
* Department of Physics and Harvard-MIT Division of Health Sciences and Technology, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139
Correspondence: Address reprint requests to Leonid A. Mirny, Tel.: 617-452-4862; E-mail: leonid{at}mit.edu.
| ABSTRACT |
|---|
| INTRODUCTION |
|---|
1–10 s) search and recognition of the specific site (referred to below as the target or cognate site) out of $10^{6}$–$10^{9}$ possible sites on the DNA; and 2) stability of the protein-DNA complex ($K_d = 10^{-15}$–$10^{-8}$ M). Despite its apparent simplicity, this mechanism is not understood in depth, either qualitatively or quantitatively. Here we focus on the simpler case of bacterial TFs recognizing their cognate (target) sites on naked DNA. Eukaryotic protein-DNA recognition is significantly complicated by chromatin packing of the DNA and by the multisubunit structure of the TFs. Interestingly, similar problems of specific binding and binding rates arise in the context of oligonucleotide-DNA binding (Lomakin and Frank-Kamenetskii, 1998).
The large body of experimental data now available provides the structures of protein-DNA complexes at atomic resolution in crystals and in solution (Luscombe et al., 2000; Bell and Lewis, 2001, 2000; Lewis et al., 1996; Schumacher et al., 1994), binding constants for dozens of native and hundreds of mutated proteins (Takeda et al., 1989; Grillo et al., 1999), calorimetry measurements (Spolar and Record, 1994), and novel single-molecule experiments (Shimamoto, 1999). Since the early work of von Hippel and co-workers, these experimental data have contributed most significantly to our present understanding of protein-DNA interaction. In a series of pioneering articles (Berg et al., 1981; Winter et al., 1989; von Hippel and Berg, 1989; Berg and von Hippel, 1987), they created a conceptual basis for describing both the kinetics and the thermodynamics of protein-DNA interaction. This basis has since become the starting point for practically every subsequent theoretical work on the subject.
We start by reviewing the history of the problem and describing the paradox of the faster-than-diffusion association rate. Next, we present the classical model of protein-DNA sliding and explain how this model can resolve the paradox. We outline the problem that the sliding mechanism faces if the energetics of protein-DNA interactions are taken into account. Next we introduce our novel quantitative formalism and undertake an in-depth exploration of possible mechanisms of protein-DNA interaction.
Faster-than-diffusion search
The problem of how a protein finds its target site on DNA has a long history. In 1970, Riggs et al. (1970a,b) measured the association rate of the LacI repressor to its operator on DNA as $\sim 10^{10}\ \mathrm{M^{-1}\,s^{-1}}$. This astonishingly high rate (compared with other biological binding rates) was shown to be much higher than the maximal rate achievable by three-dimensional diffusion.

Indeed, if a protein binds its site by three-dimensional diffusion, it has to hit the right site on the DNA within $b = 0.34$ nm. A shift of 0.34 nm would result in binding a site that differs from the native site by 1 bp, and such a site can be very different (e.g., GCGCAATT versus CGCAATTC). We use the Debye-Smoluchowski equation for the maximal rate of a bimolecular reaction (see, e.g., Richter and Eigen, 1974; Flyvbjerg et al., 2002; Bruinsma, 2002). With a protein diffusion coefficient of $D_{3d} \approx 10^{-7}\ \mathrm{cm^2\,s^{-1}}$ (Elowitz et al., 1999), we get

$$k_{3d} = 4\pi D_{3d}\, b \lesssim 10^{8}\ \mathrm{M^{-1}\,s^{-1}}. \tag{1}$$

Richter and Eigen (1974), and later Berg et al. (1981) and von Hippel and Berg (1989), sought to resolve the discrepancy between the experimentally measured rate of $10^{10}\ \mathrm{M^{-1}\,s^{-1}}$ and the maximal rate of $10^{8}\ \mathrm{M^{-1}\,s^{-1}}$ allowed by diffusion. They suggested that the dimensionality of the problem changes during the search process. They concluded that, while searching for its target site, the protein periodically scans the DNA by sliding along it.
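The diffusion-limited estimate can be checked numerically. The sketch below evaluates the standard Smoluchowski form $k = 4\pi D b$ for a reaction radius $b$, using the values quoted in the text. The function and constant names are ours. With $D_{3d} = 10^{-7}$ cm²/s and $b = 0.34$ nm it gives about $2.6 \times 10^{7}\ \mathrm{M^{-1}\,s^{-1}}$. That is within an order of magnitude of the $10^{8}$ bound used here, and more than two orders of magnitude below the measured LacI rate.

```python
import math

AVOGADRO = 6.02214076e23  # mol^-1 (exact SI value)


def smoluchowski_rate(d_cm2_per_s: float, radius_m: float) -> float:
    """Diffusion-limited rate k = 4*pi*D*b, converted to M^-1 s^-1."""
    d_m2_per_s = d_cm2_per_s * 1e-4                      # cm^2/s -> m^2/s
    k_m3_per_s = 4.0 * math.pi * d_m2_per_s * radius_m  # m^3/s per molecule pair
    return k_m3_per_s * AVOGADRO * 1e3                  # per mole, and m^3 -> litres


k_3d = smoluchowski_rate(1e-7, 0.34e-9)  # D_3d ~ 1e-7 cm^2/s, b = 0.34 nm
k_measured = 1e10                         # LacI-operator rate quoted in the text
print(f"k_3d ~ {k_3d:.1e} M^-1 s^-1")
print(f"measured / k_3d ~ {k_measured / k_3d:.0f}")
```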
Sliding along the DNA
If a protein performs both three-dimensional and one-dimensional diffusion, the total search process can be viewed as a sequence of rounds. In each round, a three-dimensional search is followed by binding to DNA and a period of one-dimensional diffusion. Upon dissociating from the DNA, the protein continues three-dimensional diffusion until it binds DNA at a different place, and so on.

Several lines of experimental evidence support this search mechanism:

- the affinity of DNA-binding proteins for any fragment of DNA (nonspecific binding);
- single-molecule experiments in which one-dimensional diffusion has been observed and visualized;
- numerous experiments in which lengthening the nonspecific DNA surrounding the target site significantly increased the rate of specific binding (Kim et al., 1987).

What are the benefits and the mechanism of one-dimensional diffusion, and what limits the search rate?
Here we address this question by considering search mechanisms that combine one-dimensional and three-dimensional diffusion, where the one-dimensional diffusion along the DNA proceeds on a rough energy landscape. Quantitative analysis of the search process led us to the following four main results:
$2\,k_BT$, the diffusion along the DNA becomes extremely slow: the protein is unable to diffuse more than a few basepairs, and the total search process is prohibitively slow.
$2\,k_BT$. Finally, we formulate this search-speed/stability paradox and suggest a search-and-fold mechanism that can resolve it.

The paradox can be resolved if the DNA-binding protein has two distinct conformational states in which it exhibits two modes of binding. In the first (search) mode, which has weaker binding and a smoother landscape, the protein searches for its site. In the second (recognition) mode, which has a rougher binding landscape, the protein binds DNA sites tightly. Three factors control how often the protein switches between the modes and provide effective preselection of low-energy sites:

- the correlation between the energy landscapes of the two modes;
- the energy difference between the two protein conformations;
- the barrier between the two protein conformations.
We suggest that these modes correspond to two distinct conformational states of the protein-DNA complex: a relatively open complex in the search mode and a tighter complex in the recognition mode. The transition between the two states can include partial folding of the protein, water extrusion, a change in DNA conformation, etc. We focus on the conformation of the protein, without loss of generality. The two conformations required by our model are then a partially unfolded (disordered) conformation and the folded conformation bound to the cognate site. Indeed, a protein in the partially unfolded conformation may have fewer and/or weaker interactions with DNA, allowing rapid sliding. The folded conformation, in turn, provides the stronger and more specific interactions required for tight binding.
We also quantify the requirements this two-mode mechanism must meet to provide both rapid search and stability. Many DNA-binding proteins are known to be flexible, and some have been reported to exhibit two or more distinct binding modes. This two-state mechanism also agrees well with the results of calorimetric experiments.
The proposed search-and-fold mechanism is not limited to the protein-DNA interaction; it also provides a general framework for protein-ligand binding and demonstrates the advantages of induced folding, a common theme in molecular recognition.
| THE MODEL |
|---|
$\tau_{1d,i}$, $i = 1 \ldots N$) separated by rounds of three-dimensional diffusion ($\tau_{3d,i}$). The total search time $t_s$ is the sum of the times of the individual search rounds,

$$t_s = \sum_{i=1}^{N} \left(\tau_{1d,i} + \tau_{3d,i}\right). \tag{2}$$
We approximate each $\tau_{3d,i}$ by its average, $\bar{\tau}_{3d}$. Each round of one-dimensional diffusion scans a region of $n$ sites, where $n$ is drawn from some distribution $p(n)$. The time $\bar{\tau}_{1d}(n)$ needed to scan $n$ sites can be obtained from the exact form of the one-dimensional diffusion law (see Appendix A). If, on average, $\bar{n}$ sites are scanned in each round, then the average number of rounds required to find the site on a DNA of length $M$ is $N \simeq M/\bar{n}$. Using average values, we get a total search time of

$$t_s = \frac{M}{\bar{n}}\left(\bar{\tau}_{1d} + \bar{\tau}_{3d}\right). \tag{3}$$
The search time $t_s$ is large for both very small and very large values of $\bar{n}$:

- If $\bar{n}$ is small, so few sites are scanned in each round of the one-dimensional search that many such rounds (alternating with rounds of three-dimensional diffusion) are required to find the site.
- If $\bar{n}$ is large, much time is spent scanning a single stretch of DNA, which makes the search very redundant and inefficient.

An optimal value, $\bar{n}_{opt}$, should therefore exist. It keeps the one-dimensional diffusion from being redundant while keeping the number of rounds small. For a given diffusion law $\bar{\tau}_{1d}(n)$, minimizing the function $t_s(\bar{n})$ yields $\bar{n}_{opt}$, the optimal length of DNA to be scanned between association and dissociation events. (Naturally, we assume here that $\bar{\tau}_{1d}$ grows with $\bar{n}$ at least as $\bar{n}^{1+\epsilon}$ with $\epsilon > 0$; otherwise $t_s$ would not increase at large $\bar{n}$.)
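A minimal sketch shows why an optimum exists. It evaluates Eq. 3 on a grid of $\bar n$ and picks the minimum. Two choices here are our assumptions, not taken from Appendix A:

- a purely diffusive scan law, $\bar\tau_{1d}(n) = (nb)^2/2D_{1d}$;
- dimensionless parameter values chosen only for illustration.

With this law the minimum sits at $\bar n_{opt} = \sqrt{2D_{1d}\bar\tau_{3d}}/b$. At that point the one-dimensional round lasts exactly as long as the three-dimensional excursion, which is the equal-time condition stated as Eq. 12.

```python
import math


def tau_1d(n: float, d_1d: float, b: float = 1.0) -> float:
    """Assumed diffusive scan law: time to cover n sites, (n b)^2 / (2 D_1d)."""
    return (n * b) ** 2 / (2.0 * d_1d)


def search_time(n: float, m: int, tau_3d: float, d_1d: float) -> float:
    """Eq. 3: t_s = (M / n) * (tau_1d(n) + tau_3d)."""
    return (m / n) * (tau_1d(n, d_1d) + tau_3d)


M, TAU_3D, D_1D = 10**6, 50.0, 1.0   # hypothetical dimensionless values (b = 1)
n_grid = range(1, 1001)
n_opt = min(n_grid, key=lambda n: search_time(n, M, TAU_3D, D_1D))
n_opt_analytic = math.sqrt(2.0 * D_1D * TAU_3D)   # from d t_s / d n = 0

print(n_opt, round(n_opt_analytic, 2))
print(tau_1d(n_opt, D_1D) / TAU_3D)   # ratio of 1D to 3D time at the optimum
```

Scanning far fewer or far more sites than $\bar n_{opt}$ raises $t_s$, as the argument above predicts.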
Protein-DNA energetics
While diffusing along DNA, a TF experiences the binding potential $E(\mathbf{s})$ of every site $\mathbf{s}$ it encounters. The energy of protein-DNA interactions is usually divided into two parts, specific and nonspecific (Berg and von Hippel, 1987; Gerland et al., 2002),

$$E(\mathbf{s}) = E_{ns} + U(\mathbf{s}), \tag{4}$$
where $\mathbf{s} = (s_1, \ldots, s_l)$ describes a binding DNA sequence of length $l$. As its name suggests, the nonspecific binding energy $E_{ns}$ arises from interactions that do not depend on the DNA sequence the TF is bound to, e.g., interactions with the phosphate backbone. The specific part of the interaction energy, $U(\mathbf{s})$, depends very strongly on the actual nucleotide sequence. Here and below we use the term energy to refer to the change in free energy upon binding, $\Delta G_b$. This free energy includes:

- the entropic loss of translational and rotational degrees of freedom of the protein and of amino acid side chains;
- the entropic contribution of water and ion extrusion from the DNA interface;
- the hydrophobic effect, etc.
The energy of specific protein-DNA interactions can be approximated by a weight matrix (also known as a PSSM, or profile) in which each nucleotide contributes independently to the binding energy (Berg and von Hippel, 1987),

$$U(\mathbf{s}) = \sum_{j=1}^{l} \varepsilon(j, s_j), \tag{5}$$
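where $\varepsilon(j, x)$ is the contribution of basepair $x$ in position $j$. Most of the known TF weight matrices, $\varepsilon(j, s_j)$, give rise to uncorrelated energies for overlapping neighboring sites, i.e., sites obtained by a one-basepair shift (Gerland et al., 2002). Since $U$ is a sum of $l$ independent contributions, the specific energies of random sites are approximately normally distributed, with mean $\bar{U}$ and variance $\sigma^2$,

$$P(U) = \frac{1}{\sqrt{2\pi\sigma^{2}}} \exp\left[-\frac{(U - \bar{U})^{2}}{2\sigma^{2}}\right]. \tag{6}$$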
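The additivity behind Eqs. 5 and 6 can be illustrated with a toy example: a weight matrix with Gaussian entries, scanned along a random sequence one basepair at a time. Both the matrix and the sequence are hypothetical, and real matrices such as PurR's are built from known binding sites instead. For i.i.d. uniform bases, the mean and variance of $U$ are sums of the per-column means and variances. This additivity is what makes the Gaussian approximation of Eq. 6 natural.

```python
import random

BASES = "ACGT"


def make_weight_matrix(length, rng):
    """Hypothetical weight matrix: eps[j][x] is the energy (in kBT) of base x at position j."""
    return [{x: rng.gauss(0.0, 1.0) for x in BASES} for _ in range(length)]


def site_energy(eps, seq, i):
    """Eq. 5: U = sum_j eps(j, s_j) for the site that starts at position i."""
    return sum(eps[j][seq[i + j]] for j in range(len(eps)))


def energy_profile(eps, seq):
    """Specific energy U_i of every site, shifting the window one basepair at a time."""
    return [site_energy(eps, seq, i) for i in range(len(seq) - len(eps) + 1)]


def mean_and_variance(values):
    values = list(values)
    mu = sum(values) / len(values)
    return mu, sum((v - mu) ** 2 for v in values) / len(values)


rng = random.Random(0)
eps = make_weight_matrix(10, rng)
seq = "".join(rng.choice(BASES) for _ in range(100_000))
profile = energy_profile(eps, seq)

# For i.i.d. uniform bases, the mean and variance of U are sums over matrix columns.
expected_mean = sum(mean_and_variance(col.values())[0] for col in eps)
expected_var = sum(mean_and_variance(col.values())[1] for col in eps)
observed_mean, observed_var = mean_and_variance(profile)
print(f"mean: observed {observed_mean:.3f}, expected {expected_mean:.3f}")
print(f"variance: observed {observed_var:.3f}, expected {expected_var:.3f}")
```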
each corresponding to a certain binding sequence comprising bases from the $i$th to the $(i + l - 1)$th, $l$ being the length of the motif (see Fig. 2). At each site, the protein hops to site $i + 1$ with probability $p_i$ and to site $i - 1$ with probability $q_i$. These probabilities depend on the specific binding energies $U_i$ at the $i$th site and $U_{i\pm1}$ at the adjacent sites. They are proportional to the corresponding transition rates, $\nu_{i,i+1}$ and $\nu_{i,i-1}$. For these rates, it is most natural to assume the regular activated transport form
![]() | (7) |
where $\nu$ is the effective attempt frequency, $\beta \equiv (k_BT)^{-1}$, $k_B$ is the Boltzmann constant, and $T$ is the ambient temperature. With these definitions, we have a one-dimensional random walk with position-dependent hopping probabilities.
for protein sliding along the DNA given the sequence-dependent binding energy (Eq. 7).
| RESULTS |
|---|
How does the search process depend on the roughness, $\sigma$, of the binding energy landscape?
Diffusion along the DNA
We state the main results here without derivation (which can be found in Appendix A). For a given set of probabilities $\{p_i\}$, the mean first-passage time (MFPT) from $i = 0$ to $i = L$ (in terms of number of steps) is (Murthy and Kehr, 1989)
![]() | (8) |
where $\rho_i \equiv q_i/p_i$. Eq. 8 gives the MFPT for one given realization of probabilities. We assume that the specific binding energies $\{U_i\}$ are normally distributed with variance $\sigma^2$ (see above). Substituting the probabilities of Eq. 7 into Eq. 8, a somewhat lengthy but straightforward calculation gives the MFPT averaged over genomic sequences for $L \gg 1$,
![]() | (9) |
where $\tau_0$ is the reciprocal of the effective attempt frequency for hopping to a neighboring site.
The main result is that one-dimensional search by hopping between neighboring sites proceeds by normal diffusion, with $\bar{t} \simeq L^2/2D_{1d}$, where the diffusion coefficient
![]() | (10) |
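depends on the roughness $\sigma$ of the landscape and drops rapidly once $\sigma$ exceeds a few $k_BT$ (Slutsky et al., 2004). Fast one-dimensional diffusion therefore requires a fairly smooth landscape ($\sigma/k_BT < 1.5$). This requirement imposes strong constraints on the allowed energy of specific binding interactions.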
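The slowdown on a rough landscape can be reproduced without any Monte Carlo simulation. The sketch below follows three steps:

1. Draw Gaussian landscapes $\{U_i\}$.
2. Convert each landscape into hopping probabilities.
3. Evaluate the mean number of steps needed to travel $L$ sites, using the standard first-passage recursion for a chain with a reflecting end at $i = 0$.

Two choices are ours rather than the paper's:

- the symmetric rate form $\propto \exp[-\beta(U_{i\pm1}-U_i)/2]$ (Eq. 7 may differ);
- counting hops rather than physical time.

The numbers are therefore only qualitative and will not match Eq. 9. For $\sigma = 0$ the MFPT is exactly $L^2$ steps, and it grows several-fold already at $\sigma = 2\,k_BT$.

```python
import math
import random


def hopping_probabilities(u, beta=1.0):
    """Return (p, q) for sites 0..L-1 of a landscape u[0..L].

    Assumed rates (ours): k(i -> i+-1) ~ exp(-beta * (U_{i+-1} - U_i) / 2).
    Site 0 is reflecting: p_0 = 1, q_0 = 0.
    """
    p, q = [1.0], [0.0]
    for i in range(1, len(u) - 1):
        right = math.exp(-beta * (u[i + 1] - u[i]) / 2.0)
        left = math.exp(-beta * (u[i - 1] - u[i]) / 2.0)
        p.append(right / (right + left))
        q.append(left / (right + left))
    return p, q


def mfpt_steps(p, q):
    """Mean number of steps to go from site 0 to site L = len(p).

    tau_k, the mean time to first reach k+1 from k, obeys tau_k = 1/p_k + (q_k/p_k) * tau_{k-1}.
    """
    total, tau = 0.0, 0.0
    for pk, qk in zip(p, q):
        tau = 1.0 / pk + (qk / pk) * tau
        total += tau
    return total


def mean_mfpt(sigma, length=200, samples=50, beta=1.0, seed=1):
    """MFPT averaged over Gaussian landscapes with standard deviation sigma (in kBT)."""
    rng = random.Random(seed)
    runs = []
    for _ in range(samples):
        u = [rng.gauss(0.0, sigma) for _ in range(length + 1)]
        runs.append(mfpt_steps(*hopping_probabilities(u, beta)))
    return sum(runs) / samples


L = 200
for sigma in (0.0, 1.0, 2.0):
    t = mean_mfpt(sigma, length=L)
    # Effective 1D diffusion coefficient from t = L^2 / (2 D), in sites^2 per step.
    print(f"sigma = {sigma} kBT: MFPT = {t:.3g} steps, D_eff = {L**2 / (2 * t):.3g}")
```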
Optimal time of three-dimensional/one-dimensional search
When one-dimensional scanning is combined with three-dimensional diffusion, what is the optimal time a protein has to spend in each of the two regimes? To answer this question we compute the optimal number of sites the protein has to scan by one-dimensional diffusion to get the fastest overall search. Results of this section are rather general and are not limited to the particular scenario of slow one-dimensional diffusion on a rough landscape discussed above.
Each time the protein binds DNA, it performs a round of one-dimensional diffusion. If the round lasts $\bar{\tau}_{1d}$, then, on average, the protein scans $\bar{n} \sim \sqrt{D_{1d}\bar{\tau}_{1d}}/b$ bp (Hughes, 1995). Substituting this relation into Eq. 3 for the search time $t_s$ and minimizing $t_s$ with respect to $\bar{\tau}_{1d}$ gives the optimal total search time and the optimal number of sites to be scanned in each round,
![]() | (11) |
First, and most importantly, we find that in the optimal regime of search

$$\bar{\tau}_{1d} = \bar{\tau}_{3d}, \tag{12}$$

i.e., the protein spends equal amounts of time sliding along the DNA and diffusing in solution. This central result can be verified experimentally, either by single-molecule techniques or by traditional methods.
Note also that $\bar{n}_{opt}$, the optimal length of DNA scanned in a single round of one-dimensional diffusion, does not depend on $M$. It is therefore the same regardless of the size of the genome being searched.
Second, the optimal one-dimensional/three-dimensional combination, reached at $\bar{\tau}_{1d} = \bar{\tau}_{3d}$, significantly speeds up the search process. Compared with searching by one mode alone, an optimal combined search is:

- $\bar{n}_{opt}/2$ times faster than a search by three-dimensional diffusion alone;
- $M/(2\bar{n}_{opt})$ times faster than a search by one-dimensional diffusion alone.

For example, suppose the protein operates in the optimal regime and scans $\bar{n}_{opt} \approx 200$ bp during each round of DNA binding. The experimentally measured rate of binding to the specific site can then be 100 times greater than the rate achievable by three-dimensional diffusion alone.
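Both speedup factors follow from Eq. 3 once two conditions hold: $\bar\tau_{1d} = \bar\tau_{3d}$ at the optimum, and the scan time grows diffusively, $\bar\tau_{1d} \propto \bar n^2$.

- A pure three-dimensional search corresponds to $\bar n = 1$ with negligible sliding time. This gives a gain of about $\bar n_{opt}/2$.
- Sliding over the whole genome takes $\bar\tau_{3d}(M/\bar n_{opt})^2$. This gives a gain of $M/(2\bar n_{opt})$.

The sketch checks this arithmetic with a hypothetical $\bar n_{opt} = 200$ bp, the value that corresponds to the 100-fold gain quoted above. Times are in units of $\bar\tau_{3d}$.

```python
M = 10**6        # genome length in bp
TAU_3D = 1.0     # time unit: one three-dimensional excursion
N_OPT = 200      # hypothetical optimal scan length (bp)


def tau_1d(n):
    """Diffusive scaling tau_1d ~ n^2, normalized so that tau_1d(N_OPT) = TAU_3D."""
    return TAU_3D * (n / N_OPT) ** 2


def total_time(n):
    """Eq. 3: t_s = (M / n) * (tau_1d(n) + tau_3d)."""
    return (M / n) * (tau_1d(n) + TAU_3D)


t_opt = total_time(N_OPT)
t_3d_only = total_time(1)      # one site probed per binding event
t_1d_only = tau_1d(M)          # slide over the whole genome in one round
print(t_3d_only / t_opt, N_OPT / 2)
print(t_1d_only / t_opt, M / (2 * N_OPT))
```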
Third, we can estimate $\bar{n}_{opt}$, the maximal number of sites a protein can scan in each round of one-dimensional search. If we set $D_{1d}$ to its maximum, i.e., $D_{1d} \approx D_{3d}$, and estimate $\bar{\tau}_{3d}$ using $l_m \approx 0.1$ µm, we get
![]() | (13) |
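For $D_{1d} \approx D_{3d}/100$, $\bar{n}_{opt}$ is 10 times smaller, since $\bar{n}_{opt} \propto \sqrt{D_{1d}}$. Again, single-molecule experiments can provide estimates of these quantities under different conditions of diffusion.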
Finally, we estimate the shortest possible total search time. Suppose $M \approx 10^{6}$ bp and one-dimensional diffusion proceeds at its fastest rate, i.e., $D_{1d} \approx D_{3d} = 10^{-7}\ \mathrm{cm^2/s}$. Using Eq. 11, we then get
![]() | (14) |
One can also estimate the search time using binding rates measured in vitro in water, $k_{on} \approx 10^{10}\ \mathrm{M^{-1}\,s^{-1}}$ (Riggs et al., 1970a,b). The diffusion coefficient of a protein in the cytoplasm is 10–100 times lower than in water, which leads to an estimated binding rate of $k_{on} \approx 10^{8}$–$10^{9}\ \mathrm{M^{-1}\,s^{-1}}$ (see Appendix D). From this we obtain the time it takes one protein to bind one site in a cell of 1 µm³ volume (i.e., $[\mathrm{TF}] \approx 10^{-9}$ M) as
![]() | (15) |
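The concentration and time scales above are simple arithmetic, sketched here under one assumption: the first of $N_p$ independent copies binds after a mean time $1/(k_{on}[\mathrm{TF}])$. The helper names are ours. One molecule in 1 µm³ corresponds to $\approx 1.7 \times 10^{-9}$ M, the $\approx 10^{-9}$ M quoted in the text. With $k_{on} \approx 10^{8}\ \mathrm{M^{-1}\,s^{-1}}$ this gives a search time of a few seconds, and 100 copies bring it down to $\approx 0.06$ s.

```python
AVOGADRO = 6.02214076e23  # mol^-1


def concentration_molar(copies: int, volume_um3: float = 1.0) -> float:
    """Molar concentration of `copies` molecules in a volume given in cubic micrometres."""
    volume_litres = volume_um3 * 1e-15   # 1 um^3 = 1e-15 L
    return copies / (AVOGADRO * volume_litres)


def binding_time(k_on: float, copies: int, volume_um3: float = 1.0) -> float:
    """Mean time until one of `copies` proteins binds the site: 1 / (k_on [TF])."""
    return 1.0 / (k_on * concentration_molar(copies, volume_um3))


print(f"[TF] for one copy: {concentration_molar(1):.2e} M")
for k_on in (1e8, 1e9):            # cytoplasmic estimates, M^-1 s^-1
    for copies in (1, 100):
        print(f"k_on = {k_on:.0e}, copies = {copies}: t = {binding_time(k_on, copies):.3g} s")
```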
As we mentioned above, there are usually several TF molecules searching in parallel for the target site. Naturally, in this case, the search is sped up proportionally to the number of molecules.
Diffusion of PurR on the Escherichia coli genome
To check the applicability of the above considerations, we simulated one-dimensional diffusion of PurR transcription factor on the E. coli chromosome.
The specific energy profile was built with a weight matrix derived from 35 PurR binding sites, following a standard procedure described elsewhere (Berg and von Hippel, 1987; Stormo and Fields, 1998). The resulting energy profile is random and uncorrelated, with a standard deviation $\sigma \approx 6.5\ k_BT$. We used this profile as input for calculating the mean first-passage time at different temperatures. (Since the magnitude of the interaction is fixed, these calculations vary the temperature rather than the binding strength.)

The results are presented in Fig. 3. Once the roughness of the landscape becomes significant, $\sigma > 2\ k_BT$, diffusion proceeds extremely slowly: at $\sigma = 2\ k_BT$ a TF can scan only $\sim 10$–$100$ bp. As before, a natural requirement for sufficiently fast diffusion is $\sigma \lesssim k_BT$.
), the nonspecific energy $E_{ns}$ makes a considerably larger contribution to the total binding energy.
For a TF at rest bound to some DNA site i, the dissociation rate, ri, would be given by the Arrhenius-type relation,
![]() | (16) |
$\bar{\tau}_{1d}$, a protein spends before dissociating from the DNA (see Appendix B). We obtain
![]() | (17) |
![]() | (18) |
The parameter space
For a given value of $\sigma$, the nonspecific binding controls the dissociation rate. The search time therefore deviates from its optimum if $E_{ns}$ moves away from the value that yields $\bar{\tau}_{1d} = \bar{\tau}_{3d}$. In Fig. 4 a we plot the search time as a function of the nonspecific binding energy for different values of $\sigma$.
We define the tolerance as the ratio between the acceptable value of the search time, $t_s$, and the optimal search time, $t_s^{opt}$. Experimental data suggest a tolerance of about 5. For the moment, however, we allow much larger values of about 10–100, which is reasonable when, for instance, many protein molecules search in parallel. As Fig. 4 a shows, for each value of $\sigma$ there is a range of $E_{ns}$ values that keeps the search time within the region of tolerance (see Appendix B). Note the dramatic increase in the search time as $E_{ns}$ deviates from its optimal value.
Once the tolerance is specified, we can define our parameter space, i.e., the values of specific and nonspecific energy that produce a total search time within the region of tolerance. In Fig. 4 b, we consider three values of the tolerance. The most relaxed requirement, a tolerance of 100, provides a search time of $t_s \approx 500$ s. If 100 proteins search for a single site, the first one will find it after $\approx 5$ s. This corresponds, however, to a fairly low binding rate:

$$k_{on} \approx \frac{1}{500\ \mathrm{s}} \cdot 10^{9}\ \mathrm{M^{-1}} = 2 \cdot 10^{6}\ \mathrm{M^{-1}\,s^{-1}},$$

compared with the experimentally measured $10^{10}\ \mathrm{M^{-1}\,s^{-1}}$ in water. Importantly, even this most relaxed search-time requirement can be met only if the characteristic strength of the specific interaction is $\sigma \lesssim 2.3\ k_BT$.
These results lead to an important conclusion: a protein cannot find its site in biologically relevant time if the roughness of the specific binding landscape is $\sigma \gtrsim 2\ k_BT$. An optimal one-dimensional/three-dimensional combination can speed up the search, but it cannot overcome the slowdown of one-dimensional diffusion. Only fairly smooth landscapes ($\sigma \lesssim 1\ k_BT$) can be navigated effectively by proteins.
Speed versus stability
Whereas rapid search requires fairly smooth landscapes ($\sigma \lesssim 1\ k_BT$), stability of the protein-DNA complex requires a low energy of the target site ($U_{min} < -15\ k_BT$ for a genome of $10^{6}$ bp).
In Fig. 5 a, we present the equilibrium probability $P_b$ of binding the strongest target site, with energy $U_{min} = U_0$ (Gerland et al., 2002), as a function of $\sigma/k_BT$. In equilibrium, $P_b$ equals the fraction of time the protein spends at the target site,
![]() | (19) |
High stability ($P_b \approx 1$) is reached only for $\sigma \gg k_BT$.
$\sigma/k_BT$. The high roughness $\sigma \gg k_BT$ required for stability of the protein-DNA complex leads to astronomically large search times. In contrast, a protein can search for the target site effectively at $\sigma < 1$–$2\ k_BT$. This brings us to a central result: the ability to translocate rapidly along the DNA cannot be reconciled with the stability requirement.
Requiring high stability at the target site, $P_b \approx 1$ (or $P_b \approx 1/N_p$ if $N_p$ copies of the protein are present), yields an estimate for the minimal $\sigma$ of
![]() | (20) |
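The trend in Fig. 5 a can be reproduced qualitatively by computing the Boltzmann occupancy of the lowest site in a random landscape, i.e., its weight divided by the sum of all weights. This sketch makes simplifications of its own, not necessarily the setup of Fig. 5 a:

- it takes the target to be the lowest-energy site of a Gaussian landscape;
- it ignores the common nonspecific term $E_{ns}$;
- it uses only $10^{5}$ sites, to keep the run short.

For $\sigma \approx k_BT$ the lowest site holds a negligible fraction of the time. A substantial occupancy appears only once $\sigma$ is many $k_BT$.

```python
import math
import random


def lowest_site_occupancy(sigma, m=100_000, beta=1.0, seed=0):
    """Boltzmann probability of finding the protein on the lowest of m Gaussian sites.

    P_b = exp(-beta U_min) / sum_i exp(-beta U_i); energies are shifted by U_min
    so that the exponentials cannot overflow.
    """
    rng = random.Random(seed)
    energies = [rng.gauss(0.0, sigma) for _ in range(m)]
    u_min = min(energies)
    z = sum(math.exp(-beta * (u - u_min)) for u in energies)
    return 1.0 / z   # the lowest site contributes exactly 1 to z


for sigma in (1.0, 3.0, 6.0, 12.0):   # roughness in units of kBT
    print(f"sigma = {sigma:>4} kBT: P_b = {lowest_site_occupancy(sigma):.3g}")
```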
An obvious conflict follows from this analysis: the same energy landscape cannot allow both rapid translocation and high stability of the states formed at the lowest-energy sites. This conflict resembles the speed-stability paradox of protein folding formulated by Gutin et al. (1998): rapid search in conformation space requires a smooth energy landscape, but then the native state is unstable. In protein folding, this conflict is resolved by a large energy gap between the native state and the rest of the conformations (Finkelstein and Ptitsyn, 2002; Pande et al., 2000).
As evident from Fig. 1, no such energy gap separates cognate sites from the bulk of other (random) sites. In fact, the energy function in the form of Eq. 5 cannot, in principle, provide a significant energy gap. Increasing the number of TFs cannot resolve the paradox either (see Appendices D and E). An alternative solution must be sought.
The two-mode model
Winter et al. (1989) already anticipated the search-speed/stability paradox qualitatively. They concluded that some conformational change should allow fast switching between the specific and nonspecific modes of binding:

- In the nonspecific mode, the protein slides over an essentially equipotential surface (in our terms, $\sigma_{\mathrm{nonspec}} = 0$).
- Site binding takes place in the specific mode ($\sigma_{\mathrm{spec}} \gg k_BT$).

A protein in the nonspecific binding mode is "unaware" of the DNA sequence it is bound to. It must therefore constantly alternate between the binding modes, probing the underlying sites for specificity.
This model naturally raises a question about the nature of the conformational change. It was originally described as microscopic binding of the protein to the DNA, accompanied by water and ion extrusion. However, numerous calorimetric measurements and calculations (Spolar and Record, 1994) show that such a transition is usually accompanied by a large heat capacity change $\Delta C$. This $\Delta C$ cannot be accounted for unless additional degrees of freedom, namely protein folding, are taken into account. On-site folding of the transcription factor may involve a significant structural change (Flyvbjerg et al., 2002; Bruinsma, 2002; Kalodimos et al., 2004). It may also take $\sim 10^{-4}$–$10^{-6}$ s (Akke, 2002), compared with a characteristic on-site time of $\tau_0 \approx 10^{-7}$–$10^{-8}$ s. We conclude that the conformational transition between the two modes involves (but is not limited to) partial folding of the TF.
If the TF had to probe every site for specificity in this fashion, locating the native site would take hours. The search time would be dramatically reduced, however, if only a very limited set of sites had to be probed, i.e., only those with a high potential for specificity. The previous section shows that a relatively weak site-specific interaction (i.e., a smooth landscape, $\sigma \lesssim k_BT$) does not significantly slow down diffusion along the DNA or the total search. Suppose this weak landscape is correlated with the actual specific binding energy landscape (with $\sigma \approx 5$–$6\ k_BT$). The specific sites will then be the strongest sites in both modes. Conformational changes of the protein should therefore occur mainly at these sites, which act as traps in the smooth landscape. Since such sites are a very small fraction of all sites, transitions between the modes are very rare.
We therefore suggest that there are two modes of protein-DNA binding, the search mode and the recognition mode (Fig. 6):

- In the search mode, the protein conformation allows only a relatively weak site-specific interaction ($\sigma_s \approx 1.0$–$2.0\ k_BT$) (Fig. 6, top).
- In the recognition mode, the protein is in its final conformation and interacts very strongly with the DNA ($\sigma_r \approx 5\ k_BT$) (Fig. 6, bottom).

If the two energy profiles are strongly correlated, the lowest-lying energy levels (i.e., traps) in the search mode, about $5\ k_BT$ deep, are likely to correspond to the strongest sites in the recognition mode (putatively, the cognate sites). Transitions between the two modes happen mainly when the protein is trapped at a low-energy site of the search landscape. In this scheme, the one-dimensional diffusion coefficient $D_{1d}$ is $\sim 10$–$100$ times smaller than the ideal limit. Because the optimal search time scales as $D_{1d}^{-1/2}$ (Eq. 11), it increases only by a factor of $\sim 3$–$10$.
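The preselection argument can be illustrated with a toy model of two correlated Gaussian landscapes. The search-mode energy is a compressed, noisier copy of the recognition-mode energy with correlation coefficient $\rho$. All parameter values are illustrative only, and so is the definition of a trap: a site more than $2.5\,\sigma_s$ below the mean of the search landscape. Fewer than 1% of the sites are traps here, yet the strongest recognition-mode site is very likely among them. Conformational transitions would then need to be attempted at only a few sites.

```python
import math
import random


def correlated_landscapes(m, sigma_r, sigma_s, rho, seed=0):
    """Recognition- and search-mode energies with correlation rho (hypothetical model)."""
    rng = random.Random(seed)
    u_r, u_s = [], []
    for _ in range(m):
        z_r, z_extra = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        u_r.append(sigma_r * z_r)
        u_s.append(sigma_s * (rho * z_r + math.sqrt(1.0 - rho**2) * z_extra))
    return u_r, u_s


M = 100_000
SIGMA_R, SIGMA_S, RHO = 5.0, 1.5, 0.95      # illustrative values only
u_r, u_s = correlated_landscapes(M, SIGMA_R, SIGMA_S, RHO)

trap_cutoff = -2.5 * SIGMA_S                 # hypothetical definition of a search-mode trap
traps = {i for i, u in enumerate(u_s) if u < trap_cutoff}
best_site = min(range(M), key=u_r.__getitem__)   # strongest recognition-mode site

print(f"fraction of sites that are search-mode traps: {len(traps) / M:.4f}")
print("strongest recognition site is a trap:", best_site in traps)
```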
The protein conformation in the recognition mode should be stabilized by additional protein-DNA interactions. If these interactions are unfavorable, the folded structure is destabilized; the search conformation is then rapidly restored and the diffusion proceeds as before. If the new interactions are favorable, however, the folded structure is stable and the protein is trapped at the site for a very long time.
For this mechanism to work, the transition between the two modes must be associated with a significant change in the free energy ($\approx 5$–$10\ k_BT$) of the protein-DNA complex (see Fig. 6 c). This energy difference makes most high-energy sites less favorable in the recognition mode than in the search mode: a protein would rather (partially) unfold than bind an unfavorable site. As a result, all sites above a certain energy cutoff have a similar, nonspecific binding energy (i.e., the protein switches into the search mode of binding). The folding of partially disordered protein loops or helices can provide the required free energy difference between the two modes.
The efficiency of the proposed search-and-fold mechanism depends on three factors:

- the energy difference between the two modes;
- the correlation between the energy profiles;
- the barrier between the two states.

The barrier determines the rate of the partial folding-unfolding transition. If the barrier is too low, the protein equilibrates while on a single site, and the barrier has no effect on search kinetics. Conversely, too high a barrier makes folding events rare, and the cognate site can be missed. It can be shown that a barrier of suitable height provides both an efficient search and stable protein-DNA complexes. Alternatively, the cognate site can lower the barrier by stabilizing the transition state, i.e., the folding nucleus (see Abkevich et al., 1994; Mirny and Shakhnovich, 2001); the site then acts as a catalyst of partial folding. (Quantitative analysis of these factors is beyond the scope of this study and will be published elsewhere.)
| DISCUSSION |
|---|
Kinetic proofreading increases specificity beyond its equilibrium value at the price of energy consumption. The search-and-fold model, in contrast, requires no additional source of energy. The two-mode model provides a faster on-rate of binding while leaving the equilibrium binding constant unchanged, so the off-rate naturally increases as well. This makes our two-mode model thermodynamically neutral.
Coupling of folding and binding in molecular recognition
Several DNA- and ligand-binding proteins are known to have partially unfolded (disordered) structures in the unbound state. The unstructured regions fold upon binding to the target. Does binding-induced folding provide any biological advantage?
The idea of coupling between local folding and site binding has been around for some time. It was recently reassessed in the much broader context of intrinsically unstructured proteins (Wright and Dyson, 1999; Dyson and Wright, 2002; Uversky, 2002). Induced folding of these proteins can have several biological advantages:

- Flexible unstructured domains have an intrinsic plasticity that allows them to accommodate targets/ligands of various sizes and shapes.
- The free energy of binding must compensate for the entropic cost of ordering the unstructured region. A poor ligand that does not provide enough binding free energy cannot induce folding and, hence, cannot form a stable complex.

Williams et al. (2001) have suggested that unstructured domains can result from evolutionary selection that acts on the bound (structured) conformation while ignoring the unbound (unstructured) conformation. Partial unfolding can also increase a protein's radius of gyration and, hence, its binding rate (Shoemaker et al., 2000; Levy et al., 2004).
Here we propose a role for induced folding in providing rapid and specific binding. Induced folding (or any other two-state conformational transition) allows a protein to search for and recognize DNA in two different conformations, which provides rapid binding to the target site. Importantly, this mechanism reconciles a rapid search for the target site with a stable bound complex (see above). The rate of induced folding may also help determine the specificity of recognition (M. Slutsky and L.A. Mirny, unpublished).
Structural and thermodynamic data argue in favor of distinct protein conformations for search along noncognate DNA and for recognition of the target site. Proteins such as λ cI, EcoRV, and GCN4 apparently do not fold their unstructured regions while bound to noncognate DNA (Winkler et al., 1993; Clarke et al., 1991; O'Neil et al., 1990), which supports our hypothesis.
Heat capacity measurements on a wide variety of protein-DNA complexes report a large negative heat capacity change upon site-specific recognition, which indicates a conformational transition coupled to binding. Spolar and Record (1994) interpreted these measurements, supplemented by x-ray crystallography and NMR structural data, mainly in terms of hydrophobic and conformational contributions to entropy. Folding-binding coupling is thus now considered a well-established effect for a large set of transcription factors.
However, real-time kinetic measurements were not performed until recently, so the question of the actual mechanism remained open. Kalodimos et al. (2001, 2002, 2004) made significant advances in this direction by observing two-step site recognition by the dimeric Lac repressor. Their H/D-exchange NMR data unambiguously demonstrate two steps. First, α-helices bound in the major groove preselect the site. Second, hinge helices fold and bind to minor-groove elements, completing specific site recognition. Although these experiments were performed with a single model system, their implications are likely to be general.
Note that no transition of this kind is observed when the protein is not bound to DNA. A possible reason is that protein-DNA association significantly lowers the free energy barrier for folding, a barrier that is essentially entropic. Lowering of this entropic barrier is a natural consequence of anchoring the various parts of the protein on the DNA scaffold. Thermal fluctuations of the DNA-bound protein are generally of the order of $k_BT$, and their main effect is to translocate the protein along the DNA. The above analysis implies that translocation actually occurs only if the barriers the protein encounters on its way are $\sim\sigma_s \lesssim k_BT$. In a large enough collection of sites ($M \gg 1$), however, potential wells several times deeper than $\sigma_s$ will be present. If the well depth exceeds the height of the folding barrier, the probability of on-site (in-well) folding increases, eventually leading to stable complex formation. (A more detailed computational analysis of the coupling between folding and binding will be published elsewhere.)
Biological implications
The mechanism of three-dimensional/one-dimensional search described above has several biological implications. Like any quantitative model, the studied model is a gross simplification of protein-DNA recognition in vivo. Despite this simplification, the proposed mechanism can be generalized to describe binding in vivo. Here we briefly discuss some of these implications.
Simultaneous search by several proteins
If several TFs are searching for their site on the DNA, the total search time is given by Eq. 15 and is obviously shorter than the time for a single TF. For example, if 100 copies of a TF are searching in parallel for the cognate site, then, assuming … and a cell volume of 1 µm³, we obtain a search time of ts ≈ 0.1 s. Increasing the number of TF molecules can further decrease the search time, but may have harmful effects due to molecular crowding in the cell. Note, however, that increasing the number of TF molecules to 100–1000 per cell cannot resolve the speed-stability paradox (see Fig. 5).
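The parallel-search speedup can be illustrated with a toy calculation. This is a minimal sketch, not Eq. 15 itself: it assumes each TF finds the site after an independent, exponentially distributed time with mean `t_single`, in which case the first of N searchers arrives after t_single/N on average. The single-TF time of 10 s used below is hypothetical, chosen because it is what the quoted 0.1 s for 100 copies implies under this assumption.

```python
import random
import statistics


def parallel_search_time(t_single, n_tfs, trials=5_000, seed=1):
    """Monte Carlo mean time until the first of n_tfs TFs finds the site.

    Assumption: each TF's search time is independent and exponential
    with mean t_single (seconds).
    """
    rng = random.Random(seed)
    first_arrivals = [
        min(rng.expovariate(1.0 / t_single) for _ in range(n_tfs))
        for _ in range(trials)
    ]
    return statistics.fmean(first_arrivals)


t_single = 10.0  # hypothetical single-TF search time, s
for n in (1, 10, 100):
    # Simulated mean vs. the analytical value t_single / n.
    print(n, round(parallel_search_time(t_single, n), 3), t_single / n)
```

Under the exponential assumption the minimum of N independent search times is itself exponential with mean t_single/N, so the simulation should track the last column closely.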
Search inside a cell: molecular crowding on DNA and chromatin
Above we assumed that a TF is free to slide along the DNA. The in vivo picture is complicated by other proteins and protein complexes (nucleosomes, polymerases, etc.) that are bound to DNA, preventing a TF from sliding freely along the DNA. What are the effects of such molecular crowding on the search time?
Our model suggests that molecular crowding on DNA will have little effect on the search time if certain conditions are satisfied. Obviously, the cognate site must not be occluded by other DNA-bound molecules or nucleosomes. DNA-bound molecules can also interfere with the search by shortening the regions of DNA scanned in each round of one-dimensional diffusion. If, however, the distance between DNA-bound molecules/nucleosomes in the vicinity of the cognate site is greater than the length of DNA scanned in a single round of one-dimensional diffusion (see Eq. 13 and Kim et al., 1987), then obstacles on the DNA do not shorten the rounds of one-dimensional diffusion and, hence, do not slow down the search process. Our analysis also suggests that sequestration of part of the genomic DNA by nucleosomes can even speed up the search process.
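The condition that obstacles spaced farther apart than the sliding length leave the search unaffected can be illustrated with a toy lattice model. This is a minimal sketch under stated assumptions, not the article's calculation: one round of sliding is modeled as an unbiased nearest-neighbor walk of fixed length `n_steps` starting at site 0, DNA-bound obstacles sit at ±`obstacle_distance`, and a step into an obstacle is rejected. All names and parameter values are hypothetical.

```python
import random


def mean_sites_scanned(n_steps, obstacle_distance=None, trials=500, seed=0):
    """Mean number of distinct sites visited in one round of 1D sliding.

    Toy model (assumption): unbiased nearest-neighbour walk from site 0.
    If obstacle_distance is set, sites with |x| >= obstacle_distance are
    occupied, and a step into them is rejected (the TF stays put).
    """
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        x = 0
        visited = {0}
        for _ in range(n_steps):
            new_x = x + (1 if rng.random() < 0.5 else -1)
            if obstacle_distance is not None and abs(new_x) >= obstacle_distance:
                continue  # blocked by a DNA-bound obstacle
            x = new_x
            visited.add(x)
        total += len(visited)
    return total / trials


free = mean_sites_scanned(400)
sparse = mean_sites_scanned(400, obstacle_distance=1_000)  # beyond reach
dense = mean_sites_scanned(400, obstacle_distance=10)      # within reach
print(free, sparse, dense)
```

Because every step consumes one random number whether or not it is blocked, the same seed yields identical walks when the obstacles are never reached, so `sparse` equals `free`. With obstacles closer than the sliding length, the scanned region is capped by the gap between them.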
If DNA-bound proteins are separated by >300–500 bp, E. coli genomic DNA can accommodate 4.6 × 10⁶ bp / 300 bp ≈ 1.5 × 10⁴ proteins. In other words, all 150 known and predicted E. coli TFs can be simultaneously present in 100 copies each and search for their cognate sites without affecting one another (in fact, they can be present in 200 copies each, since optimal search requires 50% of the proteins to be in solution at any given time). On the other hand, a short ~50-bp linker between nucleosomes in eukaryotic chromatin can increase the search time ~10-fold. Details of this analysis will be published elsewhere.
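The capacity estimate above is simple arithmetic. The sketch below reproduces it with the numbers given in the text (4.6 × 10⁶ bp genome, 300-bp spacing, 150 TFs, half of the copies in solution at the optimum); variable names are illustrative.

```python
GENOME_BP = 4.6e6       # E. coli genome length, bp
MIN_SPACING_BP = 300    # lower end of the >300-500 bp spacing
N_TFS = 150             # known and predicted E. coli TFs
FRACTION_BOUND = 0.5    # optimal search: half of the copies on DNA

capacity = GENOME_BP / MIN_SPACING_BP                 # proteins the DNA can hold
bound_copies_per_tf = capacity / N_TFS                # DNA-bound copies per TF
total_copies_per_tf = bound_copies_per_tf / FRACTION_BOUND

print(f"capacity ~ {capacity:.3g}")                        # 1.53e+04
print(f"bound copies per TF ~ {bound_copies_per_tf:.0f}")  # 102
print(f"total copies per TF ~ {total_copies_per_tf:.0f}")  # 204
```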
Funnels, local organization of sites
Several known bacterial and eukaryotic sites tend to cluster together. One may suggest that such clustering, or another local arrangement of sites, can create a funnel in the binding-energy landscape that leads to more rapid binding to cognate sites. Our model suggests that even if such funnels do exist, they would not significantly speed up the search process. The proposed search mechanism involves ~10⁴ rounds of one-dimensional/three-dimensional diffusion, so a TF spends almost all of its search time far from the cognate site. Only the last round (out of ~10⁴) will be sped up by the funnel, leading to no significant decrease in the search time.
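The funnel argument can be made quantitative with a back-of-the-envelope sketch. It assumes, as a simplification, that all R rounds of one-dimensional/three-dimensional diffusion have the same mean duration and that a funnel can at best make the final round instantaneous. The function name is hypothetical.

```python
def max_funnel_saving(n_rounds, round_time=1.0):
    """Best-case fractional reduction of the search time due to a funnel.

    Assumptions: n_rounds rounds of equal mean duration round_time;
    the funnel can at most eliminate the duration of the last round.
    """
    baseline = n_rounds * round_time
    with_funnel = (n_rounds - 1) * round_time
    return (baseline - with_funnel) / baseline


# With ~1e4 rounds, at most ~1e-4 of the search time can be saved.
print(max_funnel_saving(10_000))
```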
Local organization of sites and other sequence-dependent properties of DNA structure (flexibility of AT-rich regions, DNA curvature at poly(A) tracts, etc.) may influence the preferred localization of TFs and lead to faster on-/off-binding rates and fast equilibration on neighboring sites (see Slutsky et al., 2004, for details).
Protein hopping: intersegment transfer
Our model assumed that rounds of one-dimensional diffusion are separated by periods of three-dimensional diffusion. Intersegment transfer is another mechanism that can be involved: if two segments of DNA come close to each other, a TF sliding along one segment can hop to the other. The benefit of this mechanism is that it significantly shortens the transfer time, τ3D. Several lines of experimental evidence suggest that tetrameric LacI, which has two DNA-binding sites, travels along DNA by one-dimensional diffusion and intersegment transfer.
We did not consider this mechanism for two reasons. First, it is unclear whether TFs that have only one DNA-binding site can perform intersegment transfer; second, for this mechanism to work, distant segments of DNA need to come close to each other. Although DNA packed into a cell or nuclear volume crosses itself every ~500 bp, DNA in solution at in vitro concentrations is unlikely to have any such self-crossings. Hence intersegment transfer cannot explain the faster-than-diffusion binding rates observed in vitro. This mechanism may, however, play a role in vivo, especially for proteins that have multiple DNA-binding sites.
Proposed experiments
Our results lead to several experimentally testable predictions.
First, we predict that the maximal rate of binding is achieved when the protein spends half of the time in solution and half sliding along the DNA. This result can be readily verified experimentally by measuring the concentration of free protein in a solution that contains DNA but no cognate site. We also show how the search time depends on the energy of nonspecific binding, which, in turn, can be controlled by the ionic strength of the solution or by engineering proteins with stronger or weaker nonspecific binding. In vivo observation of the 50/50 rule would suggest that proteins have been optimized by evolution for rapid search.
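The 50/50 rule can be reproduced with a short numerical sketch. It assumes the standard facilitated-diffusion estimate t_s = (M/n)(τ1d + τ3d), with n ≈ √(D1d τ1d) sites scanned per round; this form and the parameter values below (M, D1d, τ3d) are assumptions for illustration rather than quantities taken from this section. Minimizing over τ1d on a logarithmic grid recovers τ1d = τ3d, i.e., half of the time spent in solution.

```python
import math


def search_time(tau_1d, tau_3d, n_sites=4.6e6, d_1d=1e5):
    """Assumed search-time model: (M / n) * (tau_1d + tau_3d).

    n = sqrt(d_1d * tau_1d) is the number of sites scanned per round
    (d_1d in bp^2/s, times in s). Parameter defaults are hypothetical.
    """
    n_scanned = math.sqrt(d_1d * tau_1d)
    return (n_sites / n_scanned) * (tau_1d + tau_3d)


tau_3d = 1e-3  # hypothetical mean time in solution per round, s
# Log-spaced grid of sliding times from tau_3d/100 to 100*tau_3d.
grid = [tau_3d * 10 ** (k / 1000) for k in range(-2000, 2001)]
best_tau_1d = min(grid, key=lambda t: search_time(t, tau_3d))
fraction_in_solution = tau_3d / (best_tau_1d + tau_3d)

print(best_tau_1d, fraction_in_solution)
```

Analytically, setting the derivative of (τ1d + τ3d)/√τ1d to zero gives the same optimum, τ1d = τ3d, independent of M and D1d under this model.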
Second, we show how the binding rate depends on τ3d, the average travel time between two random segments of DNA. This time depends on the DNA concentration and on the domain organization of DNA. By changing the DNA concentration and/or stretching DNA in a single-molecule experiment, one can alter τ3d and thus study the role of DNA packing in the rate of binding. This effect has implications for DNA recognition in vivo, where DNA is organized into domains. Similarly, one can experimentally measure the binding rate in the presence of other DNA-binding proteins or nucleosomes and compare it with analytical predictions.
Single-molecule experiments and AFM/SFM imaging allow direct observation of protein trajectories and measurement of the one-dimensional diffusion coefficient, D1d, on noncognate DNA. Our formalism, in turn, allows us to calculate the spectrum of specific binding energy, given D1d. Such measurements can directly test our conjecture that one-dimensional search along noncognate DNA proceeds along a smoother energy profile.
Third, using protein engineering, one can stabilize unstructured regions of DNA-binding proteins (e.g., λ cI, EcoRV, and GCN4) and study the binding rates of these engineered, rigid proteins. Such experiments can test the proposed search-and-fold mechanism and shed light on the role of unstructured regions in determining stability, specificity, and binding rates.
We also suggest that proteins bound to noncognate DNA are not fully ordered. Unfortunately, very few studies (Kalodimos et al., 2001, 2002, 2004) have addressed the mechanisms of binding to noncognate DNA. More studies of the structures, thermodynamics, and dynamics of proteins bound to noncognate DNA will deepen our understanding of specific protein-DNA recognition.
| CONCLUSIONS |
|---|
The proposed mechanism has several important biological implications for explaining how a protein can find its site on DNA in vivo, in the presence of other proteins and nucleosomes, and when several copies of a protein search simultaneously. Our model provides, for the first time, a quantitative framework for analyzing the kinetics of transcription factor binding and, hence, gene expression. Importantly, our model links molecular properties of transcription factors to the timing of transcription activation. A full understanding of the mechanism will hardly be possible without further experimental effort in these directions.
| APPENDIX A: DIFFUSIVE PROPERTIES OF THE DNA |
|---|
10⁵–10⁶), this approach is legitimate, as is also confirmed by numerical simulations.
The MFPT
To derive the diffusion law, we calculate the mean first passage time (MFPT) from site 0 to site L, defined as the mean number of steps the particle has to make to reach site L for the first time. The derivation here follows that of Murthy and Kehr (1989).
Let Pi,j(n) denote the probability of starting at site i and reaching site j in exactly n steps. Then, for example,
![]() | (21) |
![]() | (22) |
We now introduce generating functions
![]() | (23) |
One can easily show (see, e.g., Goldhirsh and Gefen, 1986) that
![]() | (24) |
Knowing the generating function, one calculates the MFPT straightforwardly as
![]() | (25) |
Using Eqs. 21 and 22, we obtain a recursion relation for the generating functions:
![]() | (26) |
To solve this recursion, we must introduce boundary conditions. Let p0 = 1, q0 = 0, which is equivalent to introducing a reflecting wall at i = 0. This boundary condition clearly influences the solution at short times and distances. However, as numerical simulations and general considerations suggest, its influence relaxes quite fast, so that at longer times the result is independent of the boundary. The benefit of setting p0 = 1 becomes clear when we observe that
![]() | (27) |
![]() | (28) |
The recursion relation for the MFPT is then readily obtained from Eq. 26,
![]() | (29) |
where the site-dependent ratio pi/qi characterizes the local bias of the walk. Thus, the MFPT is obtained in closed form as
![]() | (30) |
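A minimal numerical sketch of this calculation for one realization of disorder, under stated assumptions: p_i is taken to be the probability of hopping from site i to i + 1 (q_i = 1 − p_i to i − 1), p_0 = 1 as above, and the mean time T_i to step from i to i + 1 obeys T_i = 1/p_i + (q_i/p_i) T_{i−1}, since from site i the walker either steps right or steps left and must first return to i. This may differ in notation from the article's own Eqs. 29 and 30. The MFPT from 0 to L is the sum of the T_i, and a direct Monte Carlo simulation of the same walk serves as a check. Function names and the disorder model (p_i uniform in [0.4, 0.6]) are hypothetical.

```python
import random


def mfpt_recursion(p):
    """MFPT from site 0 to site L = len(p) for one realization of disorder.

    p[i] is the (assumed) probability to hop from site i to i + 1;
    p[0] must be 1 (reflecting wall). Uses
    T_i = 1/p_i + (q_i/p_i) * T_{i-1}  and  T_{0,L} = sum_i T_i.
    """
    total = 0.0
    t_prev = 0.0
    for p_i in p:
        q_i = 1.0 - p_i
        t_i = 1.0 / p_i + (q_i / p_i) * t_prev  # mean time from i to i + 1
        total += t_i
        t_prev = t_i
    return total


def mfpt_simulated(p, trials=10_000, seed=3):
    """Monte Carlo estimate of the same MFPT by simulating the walk."""
    rng = random.Random(seed)
    target = len(p)
    total_steps = 0
    for _ in range(trials):
        x, steps = 0, 0
        while x < target:
            x += 1 if rng.random() < p[x] else -1
            steps += 1
        total_steps += steps
    return total_steps / trials


# Unbiased walk with a reflecting wall: MFPT = L**2 exactly.
print(mfpt_recursion([1.0] + [0.5] * 9))  # 100.0

# One hypothetical realization of disorder on L = 10 sites.
rng = random.Random(42)
p = [1.0] + [rng.uniform(0.4, 0.6) for _ in range(9)]
exact = mfpt_recursion(p)
simulated = mfpt_simulated(p)
print(exact, simulated)
```

Averaging `mfpt_recursion` over many realizations of `p` is the numerical counterpart of the disorder average discussed next.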
This expression gives the MFPT for a given realization of disorder, i.e., for a certain set of probabilities {pi}, whereas we are interested in the behavior averaged over all realizations of disorder. The cumulative products in Eq. 30 reduce to two forms, …, which, after averaging over uncorrelated Gaussian disorder, produce a factor of … . After the summations are carried out, the expression for the MFPT for L >> 1 becomes
![]() |