Thermodynamic measurements of proteins indicate that the
folding to the native state takes place either through stable
intermediates or through a two-state process without intermediates. The
rather short folding times of proteins indicate that folding is guided through some sequence of contact bindings. We discuss the possibility of reconciling a two-state folding event with a sequential folding process in a schematic model of protein folding. We propose a new
dynamical transition temperature that is lower than the temperature at
which proteins in equilibrium unfold. This is in qualitative agreement
with observations of in vivo protein folding activity quantified by
chaperone concentration in Escherichia coli. Finally, we
discuss our framework in connection with the unfolding of proteins at
low temperatures.
INTRODUCTION
Proteins appear to fold into a unique native
conformation, in spite of an astronomical number of alternative
configurations. This apparent paradox, usually attributed to Levinthal
(1968), is further sharpened by experimental evidence that the folding
transition behaves nearly like a two-state system for many single-domain
proteins (Privalov and Khechinasvili, 1974; Creighton, 1992; Baldwin and
Rose, 1999a,b). This means that for these proteins, the transition from
the denatured to the native state occurs rather directly, without
observed intermediates. One would think that such two-state behavior
would exclude the possibility of guiding the protein to the native
state. The purpose of this paper is to quantify the degree of guiding
that is compatible with the observed two-state folding process. We do
this by generalizing a hierarchical protein model introduced earlier
(Hansen et al., 1998). In that model the folding process is
parameterized through an ordered series of binding events, which yields
a first-order folding-unfolding transition. However, because guiding is
associated with intermediates, the original model does not give a
two-state folding transition.
The folding of proteins can be addressed experimentally through
thermodynamic quantities such as entropy, enthalpy ($H$), and
heat capacity ($C = dH/dT$) as functions of temperature. One
characterizes the folding transition by the released energy, i.e., the
latent heat ($Q$), and the peak height of the heat capacity
($\Delta C$) at the transition temperature $T_c$. $T_c$ is defined as
the temperature at which the protein has equal free energies in the
native (folded) and denatured (unfolded) states. The van't Hoff relation
(Privalov, 1979),

$$\Delta H \, Q = \kappa \, k \, T_c^2 \, \Delta C, \tag{1}$$

provides a powerful way to quantify the sharpness of a smoothed-out
first-order phase transition taking place at $T_c$. It relates the
enthalpy difference between the two phases, $\Delta H$, to the height of
the heat capacity peak, $\Delta C$, and the latent heat of the
transition, $Q$, which is the same as $\Delta H$, i.e., $\Delta H = Q$.
$\kappa$ is a dimensionless proportionality factor and $k$ is the
Boltzmann constant. For given $\Delta H$ and $Q$, the value of $\kappa$
is inversely proportional to $\Delta C$; in this respect, a smaller
$\kappa$ corresponds to a sharper transition.
When the transition is two-state it is known that $\kappa = 4$
(Privalov, 1979). We will also show that when the transition has a large
number of equally stable intermediates, then $\kappa = 12$. For the
single-domain proteins ribonuclease, lysozyme, chymotrypsin,
cytochrome c, and myoglobin, Privalov and Khechinasvili (1974) find
experimentally

$$\kappa = 4.2 \tag{2}$$

to within 5% accuracy, demonstrating that these transitions are
very nearly two-state.
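For reference, the two-state value of 4 can be recovered in a few lines. The sketch below is ours, not the authors': it assumes a native and a denatured state separated by a temperature-independent enthalpy $\Delta H$ and entropy $\Delta S$, introduces $p$ for the denatured fraction (a symbol not used in the paper), and takes the van't Hoff relation in the form $\Delta H\,Q = \kappa k T_c^2 \Delta C$ implied by the description above. The peak is evaluated at the midpoint $p = 1/2$, which is accurate to leading order for a sharp transition.

```latex
\begin{align}
  % equilibrium fraction of denatured protein; T_c is where \Delta H = T_c \Delta S
  p(T) &= \frac{1}{1 + e^{(\Delta H - T\Delta S)/kT}}, \qquad p(T_c) = \tfrac{1}{2}, \\
  % mean enthalpy relative to the native state and its temperature derivative
  \langle H \rangle &= p\,\Delta H, \qquad
  C = \frac{d\langle H \rangle}{dT} = \Delta H\,\frac{dp}{dT}
    = \frac{\Delta H^2}{kT^2}\,p\,(1-p), \\
  % peak height at the midpoint and the resulting coefficient, using Q = \Delta H
  \Delta C &= \frac{\Delta H^2}{4kT_c^2}
  \quad\Longrightarrow\quad
  \kappa = \frac{\Delta H\,Q}{kT_c^2\,\Delta C} = 4 .
\end{align}
```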
Protein folding can be described on a number of different levels. On a
microscopic level it is governed by molecular forces between amino
acids and between amino acids and the surrounding water. On a large
scale one may characterize the folding by a number of binding events
that each limit the residual conformational entropy. The ordering of
these binding events is at present unknown, although recent
experimental studies suggest some sort of hierarchical ordering in the
folding process (Baldwin and Rose, 1999a,b; Nolting et al., 1997;
Chakraborty and Pang, 2000). This may contrast somewhat with protein
folding as a two-state process. In this paper we explore the
possibility of reconciling two-state thermodynamics with a guided
folding process. As a simple guiding principle, we adopt the sequential
zipper-like (Schellman, 1958; Dill et al., 1993) description of the
process (Hansen et al., 1998). In contrast to geometrical zipper models
implemented for, e.g., DNA melting, we here can also view the zipper as
an effective description of a unique folding pathway, i.e., a
hierarchically ordered sequence of binding events between different
parts of the protein (Hansen et al., 1998).
THE MODEL
We sketch the model and its parameterization in the following. One
may visualize each binding event as the closing of a specific pair
contact between two residues. Each of these events is characterized by a
binary variable $\phi_i$ that indicates whether it is closed
($\phi_i = 1$) or open ($\phi_i = 0$). The overall folding state of the
protein is thus characterized by the set of binary variables
$\phi_1, \phi_2, \ldots, \phi_N$, where the native state is the one
where all $\phi_i = 1$. There is experimental evidence that protein
folding happens through a fairly specific pathway, i.e., that there is
an ordering of binding events leading to the native state (Nolting et
al., 1997; Chakraborty and Pang, 2000; Huang et al., 1999).
Mathematically, the existence of a specific pathway is implemented by
the series of inequalities

$$\phi_1 \ge \phi_2 \ge \cdots \ge \phi_N. \tag{3}$$
The variables $\phi_i$ are insufficient to
describe the degrees of freedom of the protein. In order to take these
into account, we introduce a second independent set of variables,
$\psi_i$, which describes the degrees of freedom
associated with the unfolded parts of the protein. In principle, these
will have a range of possible values, analogous to the various
possible values of the dihedral angles of the protein (Creighton,
1993). However, for simplicity, we assign only two values,
$-\epsilon_0$ or $-\epsilon_0 + E$, to each $\psi_i$. The Hamiltonian is

$$H = \sum_{i=1}^{N} \phi_i \, \psi_i, \tag{4}$$
with the constraints in Eq. 3 imposed on the $\phi_i$ values, implying
that when $\phi_i = 0$ all terms with $j \ge i$ contribute no energy.
The interpretation of the terms in this Hamiltonian is that when a local
binding is intact, $\phi_i = 1$, there is an energy cost of $E$ to
change the $\psi_i$ variable from the value $-\epsilon_0$ to
$-\epsilon_0 + E$. When there is no binding, that is, $\phi_i = 0$,
there is no energy cost associated with changing $\psi_i$; it "flaps"
freely. We stress that we have simplified the conformation space here to
only two states, with energy $-\epsilon_0$ and $-\epsilon_0 + E$, per
variable $\psi_i$ of the polypeptide. In reality the individual amino
acids will already have more dihedral angles to choose from, and the
true energy spectra will presumably have one lowest energy state and a
number of higher energy states that become accessible when the structure
flaps freely.
We note that for any finite value of $E$, the protein may
change structure locally due to a change in $\psi_i$
even in the parts of the protein where $\phi_i = 1$. This would then
reflect an unfolding event inside a protein. In order to simplify the
analysis we assume $E$ to be sufficiently large compared to any other
energy scale in the system (in particular $T$, where $T$ is the
temperature) so that the $\psi_i$ variables never take the value
$-\epsilon_0 + E$ when $\phi_i = 1$. Hence, in our model no unfolding
can occur inside an already folded part of the protein. We have put the
Boltzmann constant equal to unity or absorbed it into the temperature
for simplicity.
We may define a set of binary, unconstrained variables
$\xi_i$, taking the values 0 or 1, such that

$$\phi_i = \xi_1 \xi_2 \cdots \xi_i. \tag{5}$$

In particular, $\phi_1 = \xi_1$. In the limit $E \to \infty$, the
Hamiltonian (4) becomes

$$H_{p1} = -\epsilon_0 \sum_{i=1}^{N} \xi_1 \xi_2 \cdots \xi_i, \tag{6}$$

where there are no additional constraints. The role of the
variables $\psi_i$ is now played by the degeneracy
present in Eq. 6, as one $\xi_i = 0$ implies that $H_{p1}$ is
independent of all subsequent $\xi_j$ variables ($j > i$). If $i$ labels
the first variable where $\xi_i = 0$, then
$H_{p1} = -(i - 1)\epsilon_0$, and the number of degenerate states at
this energy is $2^{N-i}$, reflecting the residual degrees of freedom.
This is because variable $i$ is open, and the remaining $N - i$
variables each access two possible degenerate states according to the
previous discussion about the dihedral angles. This allows an exact
calculation of the partition sum of the system, by summing over the
number $i$ where first $\xi_i = 0$:

$$Z = \sum_{i=1}^{N} 2^{N-i} \, e^{\beta (i-1) \epsilon_0} + e^{\beta N \epsilon_0}, \tag{7}$$

which changes rapidly, i.e., has a smoothed-out singularity, at
$\beta = 1/T = \ln 2/\epsilon_0$, corresponding to a first-order
transition at $T_c = \epsilon_0/\ln 2$.
We stress that the number 2 in the above degeneracy count is an
artifact of assuming that each variable has only two possible states in
the unfolded state. The real degeneracy count can have a different
degeneracy factor.
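The counting argument behind the partition sum can be checked numerically for small chains. The following is a minimal sketch, not the authors' code: it enumerates all configurations of the unconstrained binary variables, assigns each the energy "minus the number of contacts closed in sequence from the start" (in units where the contact energy `eps0` is 1 by default), and compares the result with the sum organized by the first open contact. The function names are hypothetical.

```python
import itertools
import math


def guided_energy(xi, eps0=1.0):
    """Energy of the guided model: -eps0 times the number of leading ones in xi."""
    closed = 0
    for value in xi:
        if value == 0:
            break
        closed += 1
    return -eps0 * closed


def z_brute_force(n_contacts, beta, eps0=1.0):
    """Sum Boltzmann weights over all 2**N configurations of the binary variables."""
    return sum(
        math.exp(-beta * guided_energy(xi, eps0))
        for xi in itertools.product((0, 1), repeat=n_contacts)
    )


def z_first_open_contact(n_contacts, beta, eps0=1.0):
    """Same sum, organized by the index i of the first open contact."""
    partial = sum(
        2 ** (n_contacts - i) * math.exp(beta * (i - 1) * eps0)
        for i in range(1, n_contacts + 1)
    )
    native = math.exp(beta * n_contacts * eps0)  # all contacts closed
    return partial + native


if __name__ == "__main__":
    beta_c = math.log(2)  # transition point for eps0 = 1
    for n in (4, 8, 12):
        print(n, z_brute_force(n, beta_c), z_first_open_contact(n, beta_c))
```

At the transition point every partially folded level carries the same weight and the native state carries twice that weight, which is why the two columns printed above grow as $(N + 2)\,2^{N-1}$.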
At the transition the ordered, fully folded state
$\{\phi_i\} = \{1111\cdots1\}$, with energy
$U = -\partial \ln Z/\partial \beta = -N\epsilon_0$, unfolds to a
disordered structure with energy $U = 0$. Thus
$\Delta H = Q = N\epsilon_0$ and
$\Delta C = \partial U/\partial T = N^2 (\ln 2)^2/12$ (at $T_c$),
leading to $\kappa = 12$ by use of Eq. 1.
This corresponds to a situation where there are many intermediate
states of the same free energy, which smooths out the
transition and results in a broader peak in the heat capacity. On the
other hand, we may consider only a rescaled last term,

$$H_{p2} = -N\epsilon_0 \, \xi_1 \xi_2 \cdots \xi_N, \tag{8}$$
such that the partition function becomes
$Z = 2^N - 1 + e^{\beta N \epsilon_0}$.
Then one also obtains a sharp phase transition at
$T_c = \epsilon_0/\ln 2$, with $\Delta H = Q = N\epsilon_0$ but with
$\Delta C = (N\epsilon_0)^2/(4T_c^2) = (N \ln 2)^2/4$.
Using Eq. 1, this leads to a value $\kappa = 4$, as expected because
this is a description of a classical two-state system (Privalov
and Khechinasvili, 1974). The Hamiltonian in Eq. 8 describes a
situation in which the system only lowers its energy when all contacts
are closed, meaning that the protein is in the unique native state.
There is no guiding in the Hamiltonian in Eq. 8, since the ground
state, $\{1111\cdots111\}$, is one out of the $2^N$ possible states,
whereas all the other $2^N - 1$ states are degenerate. Thus the time to
find the ground state for such a two-state system will be very long
when simulating by Monte Carlo, as we will now discuss.
RESULTS AND DISCUSSION
We define time in the model based on the Monte Carlo Metropolis
(MC) method (Binder, 1987). The values of $\xi_i$
are chosen or changed randomly, and acceptance of each choice depends
upon the usual Boltzmann factor due to any energy shift connected to
this. Time advances by one unit for every attempted update of one of the
$\xi_i$ variables. We note that in principle the
dynamics of an MC procedure is different from the actual dynamics of a
given Hamiltonian, although properties at thermal equilibrium are
properly represented. However, if the time scales associated with
different $\xi_i$ variables are not too different from each
other, the MC simulation may reflect the overall dynamical behavior.
We measure the average folding time as the typical number of states
visited before finding the ground state. This time differs widely
between the guided model in Eq. 6 and the two-state model in Eq. 8. For
the true two-state model the average folding time is $2^N/2$. This is
because no variable will be fixed at 1 before all variables are 1,
giving a probability of $1/2^N$ of reaching the ground state at each
time step, irrespective of what the previous state was. Thus, the
two-state system indeed takes exponential time to fold, confirming the
Levinthal paradox of astronomical folding times for unguided protein
folding. For the guided system governed by Eq. 6, the ground state is
found in a time growing as $N^2$, as in a diffusion process, when $T$ is
below $T_c$. This reflects that at each time step only one variable can
be fixed at the value $\xi_i = 1$, namely the one whose preceding
variable equals 1 (i.e., $\xi_{i-1} = 1$). Attempts to change other
variables will either be energetically disfavored (for $j < i$) or
likely be subject to reversals at later stages because these
conformational changes are not associated with any energy changes. When
each time step allows one variable to possibly change value, it
typically takes $N$ time steps to fix the next $\xi_i$ on the pathway.
Summed over all subsequent variables, this gives an overall folding time
scaling as $N^2$. The exact prefactor of this folding time depends on
temperature, as increased temperature enhances the probability that an
already folded variable unfolds again ($1 \to 0$).
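The contrast between the two folding times can be illustrated with a small Metropolis simulation. This is a sketch under our own assumptions rather than a reproduction of the authors' runs: time is counted in attempted single-variable flips, each run starts from the fully open state, the contact energy is set to 1, and the function names, chain length, temperature, and seed are all illustrative choices.

```python
import math
import random


def leading_ones(xi):
    """Number of contacts closed in sequence from the start of the pathway."""
    count = 0
    for value in xi:
        if value == 0:
            break
        count += 1
    return count


def guided_energy(xi, eps0=1.0):
    """Guided model: each contact closed in sequence lowers the energy by eps0."""
    return -eps0 * leading_ones(xi)


def two_state_energy(xi, eps0=1.0):
    """Two-state model: energy is gained only when every contact is closed."""
    return -eps0 * len(xi) if all(xi) else 0.0


def folding_time(energy, n_contacts, temperature, rng, max_steps=1_000_000):
    """Attempted single-variable updates until the all-ones state is first reached."""
    xi = [0] * n_contacts  # start fully unfolded (a choice made for this sketch)
    current = energy(xi)
    for step in range(1, max_steps + 1):
        k = rng.randrange(n_contacts)
        xi[k] ^= 1  # propose flipping one variable
        proposed = energy(xi)
        delta = proposed - current
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current = proposed  # accept the move
            if all(xi):
                return step
        else:
            xi[k] ^= 1  # reject: undo the flip
    return max_steps


def mean_folding_time(energy, n_contacts, temperature, trials=200, seed=1):
    """Average folding time over independent runs drawn from one seeded generator."""
    rng = random.Random(seed)
    total = sum(folding_time(energy, n_contacts, temperature, rng) for _ in range(trials))
    return total / trials
```

Calling `mean_folding_time(guided_energy, 10, 0.5)` and `mean_folding_time(two_state_energy, 10, 0.5)` should show the guided chain folding far faster than the unguided one; repeating this for a range of chain lengths gives a rough view of the polynomial versus exponential growth described above.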
To reconcile the fact that a large class of proteins behaves as a
two-state system with the necessity of being able to reach the ground
state in a reasonable time, we now study a combination of the two
Hamiltonians in Eqs. 6 and 8:

$$H = \lambda_p H_{p1} + (1 - \lambda_p) H_{p2}. \tag{9}$$

Here $\lambda_p \in [0, 1]$ is a dimensionless
parameter that weighs the contributions from the Hamiltonians
$H_{p1}$ and $H_{p2}$. This Hamiltonian has a transition at
$T_c = \epsilon_0/\ln 2$, as shown below. We can define a partial free
energy $F(n)$, where $n + 1 = i$ according to $i$ in Eq. 7.
Consequently, $e^{-\beta F(n)}$ is the $n$th term in the partition sum
($n \in \{0, \ldots, N\}$). The partition function becomes

$$Z = \sum_{n=0}^{N} e^{-\beta F(n)} = \sum_{n=0}^{N-1} 2^{N-n-1} \, e^{\beta \lambda_p n \epsilon_0} + e^{\beta N \epsilon_0}. \tag{10}$$

For a given temperature the partial free energy of the states is
$F(n \le N - 1) = n(T \ln 2 - \lambda_p \epsilon_0) - T(N - 1)\ln 2$,
and $F(N) = -N\epsilon_0$.
In Fig. 1 we show $F(n)$ schematically for different temperatures
$T$, where we set $\epsilon_0 = 1$ here and in
the following discussion. Each $F(n)$ exhibits a
jump at $n = N$ corresponding to the free
energy gain $N(1 - \lambda_p) + \lambda_p$ for reaching the ground
state. At low $T$, $F(n)$ is monotonically decreasing,
reflecting fast folding kinetics where the typical folding time grows
as $N^2$. At an intermediate $T = T_G = \lambda_p/\ln 2$ all $n < N$
are equally probable. Below this temperature guiding becomes important.
Also, $T_G$ is lower than the folding-unfolding transition temperature
$T_c$ where the denatured state becomes thermodynamically favored. For
$T$ in the interval between $T_G$ and $T_c$ the intermediate states are
unstable (see Fig. 1), i.e., they form a barrier between the folded and
denatured states, and the folding time scales exponentially with both
$T$ and $N$. At a higher $T = T_c = 1/\ln 2$ the folded state becomes
unstable, and the protein unfolds ($n \to 0$). The fact that the free
energy landscape changes with $T$ means effectively that two-state
folding around $T_c$ is compatible with
guiding and fast folding at low $T$.
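The three temperature regimes can be made concrete by tabulating the partial free energy directly. The sketch below evaluates the expressions quoted above for $F(n)$, with the guiding weight passed as `lam` and the contact energy as `eps0` (set to 1 by default); the chain length, guiding weight, and sample temperatures in the demo are arbitrary illustrative values, and the function names are ours.

```python
import math


def partial_free_energy(n, n_contacts, lam, temperature, eps0=1.0):
    """F(n) for n closed contacts; n = n_contacts is the native state."""
    if n == n_contacts:
        return -n_contacts * eps0
    return (n * (temperature * math.log(2) - lam * eps0)
            - temperature * (n_contacts - 1) * math.log(2))


def free_energy_profile(n_contacts, lam, temperature, eps0=1.0):
    """F(n) - F(0) for n = 0..N, i.e. rescaled so that F(0) = 0."""
    values = [partial_free_energy(n, n_contacts, lam, temperature, eps0)
              for n in range(n_contacts + 1)]
    return [v - values[0] for v in values]


if __name__ == "__main__":
    n_contacts, lam = 20, 0.7
    t_guiding = lam / math.log(2)  # temperature at which all n < N are degenerate
    for label, t in (("below T_G", 0.8),
                     ("at T_G", t_guiding),
                     ("between T_G and T_c", 1.2)):
        profile = free_energy_profile(n_contacts, lam, t)
        print(label, [round(v, 2) for v in profile[::5]])
```

Below the guiding temperature the profile slopes downhill all the way to the native state; at the guiding temperature it is flat up to the final jump; between the two temperatures the partially folded states rise into a barrier while the native state still lies lowest.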

FIGURE 1
A schematic drawing of the partial Gibbs free energy
$F(n)$ defined through Eq. 10 as a function of the level of
folding $n$ for three different temperatures
$T$. $F(0)$ is rescaled to 0.
Fig. 2 shows the van't Hoff coefficient $\kappa$ as a function of
$\lambda_p$ on the unit interval, based on direct
calculation of the partition function. One observes that increasing
$\lambda_p$, i.e., increasing the guiding, leads to
increasing $\kappa$ and thus to a softening of the transition. As
$N$ is increased, the regime where $\kappa$ is very close to 4 is
expanded toward higher values of $\lambda_p$. For
example, with the experimental observation of $\kappa = 4.2$, and
assuming $N = 10$, $\lambda_p$ is close to
zero, whereas for $N = 100$, $\lambda_p$
is approximately 0.7. Thus, in this latter case, 70% of the energy
difference between the unfolded and folded states sits in the guiding,
i.e., comes from $\lambda_p H_{p1}$, and still $\kappa$
is very close to the value 4, indicating the folding process to be
essentially a two-state process.
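A direct calculation of this kind is easy to sketch. The code below builds the energy levels and degeneracies of the combined Hamiltonian (energy $-\lambda_p n$ with degeneracy $2^{N-n-1}$ for $n < N$ closed contacts, and energy $-N$ for the native state, with the contact energy set to 1), locates the heat capacity peak on a temperature grid, and evaluates the van't Hoff coefficient from the peak height. It is our own illustration under stated assumptions: the transition temperature is taken to be the peak position, the grid bounds assume a unit contact energy, and the function names are hypothetical.

```python
import math


def energy_levels(n_contacts, lam, eps0=1.0):
    """(energy, ln degeneracy) for n = 0..N closed contacts."""
    levels = [(-lam * eps0 * n, (n_contacts - n - 1) * math.log(2))
              for n in range(n_contacts)]
    levels.append((-n_contacts * eps0, 0.0))  # native state, non-degenerate
    return levels


def heat_capacity(levels, temperature):
    """C = (<E^2> - <E>^2) / T^2 with k = 1, using shifted log-weights for stability."""
    beta = 1.0 / temperature
    log_w = [ln_g - beta * e for e, ln_g in levels]
    shift = max(log_w)
    weights = [math.exp(x - shift) for x in log_w]
    z = sum(weights)
    mean = sum(w * e for w, (e, _) in zip(weights, levels)) / z
    mean_sq = sum(w * e * e for w, (e, _) in zip(weights, levels)) / z
    return (mean_sq - mean * mean) * beta * beta


def vant_hoff_kappa(n_contacts, lam, eps0=1.0, t_min=0.8, t_max=2.5, n_grid=2001):
    """kappa = dH * Q / (T_peak^2 * C_peak), taking dH = Q = N * eps0."""
    levels = energy_levels(n_contacts, lam, eps0)
    peak_t, peak_c = t_min, 0.0
    for k in range(n_grid):  # simple grid search for the heat capacity maximum
        t = t_min + (t_max - t_min) * k / (n_grid - 1)
        c = heat_capacity(levels, t)
        if c > peak_c:
            peak_t, peak_c = t, c
    q = n_contacts * eps0
    return q * q / (peak_t ** 2 * peak_c)
```

Evaluating `vant_hoff_kappa(100, lam)` for `lam` between 0 and 1 traces out a curve of the kind described for Fig. 2, and comparing it with `vant_hoff_kappa(10, lam)` shows how a longer chain keeps the coefficient near 4 over a wider range of guiding.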
We now discuss the fact that large $N$ allows for more
guiding, i.e., larger $\lambda_p$, without destroying
the two-state nature of the transition. To understand this we note that
any $\lambda_p < 1$ in fact defines a virtual phase
transition at $T = T_G < T_c$. At $T_G$ the protein would unfold were it
not for the additional gain in binding energy when the ground state
is reached. This virtual transition is not seen directly in equilibrium
thermodynamics, but strongly influences the MC dynamic behavior in the
temperature range between $T_G$ and $T_c$. In this intermediate regime
the protein is a two-state system due to the appearance of a free
energy barrier (see Fig. 1). In order to cross this barrier, a large
thermal fluctuation is needed. Such a fluctuation is rare and hence the
folding time will be long. When the system finally is folded, it will
stay so for $T < T_c$ if $N$ is large enough, and as a consequence
$\kappa = 4$ for the real transition. However, for systems with small
$N$, it may unfold again due to thermal fluctuations that take it across
the barrier in the opposite direction. Once out of the folded state, it
will linger on the "wrong" side of the barrier, where it
essentially only sees the Hamiltonian $H_{p1}$, which gives
$\kappa = 12$.
Experimentally, if one relies on dynamics, one presumably
measures $T_G$ as the transition
temperature, whereas for experiments based on thermodynamics it would
be $T_c$. For fast-living organisms such
as Escherichia coli the overall fraction of
unfolded proteins can be monitored by the level of the chaperone DnaK
(Alberts et al., 1994; Arnvig et al., 2000). By means of energy input
from ATP, unfolded proteins are produced in vivo. In a living cell these
are thermodynamically unstable and tend to fold. The speed of the
folding process is increased, or catalyzed, by chaperones. For
temperatures between 13 and 37°C the number of DnaK per E. coli cell
rises slowly from 4000 to 6000, whereafter it rises
sharply to ~8500 at 42°C and ~18,000 at 46°C (Herendeen et al.,
1979; Pedersen et al., 1978). At 50°C E. coli dies.
This may be taken as an indication that in the temperature interval
above 37°C, the typical proteins need help in the folding process.
But as the cell is able to sustain life up to about 50°C, the typical
proteins must have some stability up to this higher temperature. This
resembles the behavior of our model, with a
$T_G$ of about 37°C, an exponentially slow folding of proteins
necessitating the help of chaperones at higher
temperatures, and a $T_c$ of the order of
50°C (Arnvig et al., 2000).
The above considerations can be extended to include a more realistic
scenario in which the protein interacts with water. Following
Hansen et al. (1998), we parameterize this through water variables
$w_1, w_2, \ldots, w_N$, taking values
$\epsilon_{\min} + s\delta$, $s = 0, 1, \ldots, g - 1$. Here,
$\delta$ is the spacing of the
energy levels of the water-protein interactions. We quantify the
coupling to the water by a combination of the Hamiltonians
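
$$H_{w1} = \sum_{i=1}^{N} (1 - \xi_1 \xi_2 \cdots \xi_i) \, w_i \tag{11}$$

and

$$H_{w2} = (1 - \xi_1 \xi_2 \cdots \xi_N) \sum_{i=1}^{N} w_i \tag{12}$$

to form the total Hamiltonian

$$H = \lambda_p H_{p1} + (1 - \lambda_p) H_{p2} + \lambda_w H_{w1} + (1 - \lambda_w) H_{w2}. \tag{13}$$
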
The dimensionless parameter $\lambda_w \in [0, 1]$ weighs the
contributions from the Hamiltonians
$H_{w1}$ and $H_{w2}$, while $\lambda_p$ is the
same parameter as defined in Eq. 9. (Here it may be noted that
$H_{w2}$ will introduce nonlocal interactions
between distant units, when the terms are interpreted using the
variables $\phi_i$ and $\psi_i$.)
When $\lambda_p = \lambda_w = 1$ we are
back to the Hamiltonian defined by Hansen et al. (1998),
whereas when $\lambda_p = \lambda_w = 0$ we are
facing a two-state Hamiltonian. In Fig. 3 we display the heat capacity
curves for these two extremes. The system is folded in its ground state
between the cold unfolding transition at $T = 1.2$ and
the hot unfolding transition at $T = 4.7$. As also
quantified by the van't Hoff coefficients, we see that the Hamiltonian
without guiding gives a phase transition that is sharper by a factor
of about 3 for both the cold and hot unfolding transitions. Also, in
terms of temperature, these transitions are much more separated than in
real systems. The present model as it stands is not able to account for this.
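The two quoted transition temperatures can be cross-checked with a per-unit estimate. The sketch below rests on assumptions that are ours and should be read as such: a folded unit contributes its binding energy while its water variable is decoupled (so all $g$ levels count equally), an unfolded unit has two chain conformations and couples to the water ladder, and the lowest water level is taken as $-3.1$ in units of the contact energy, with spacing 0.04 and 350 levels as in the Fig. 3 caption. A transition is then expected where the two per-unit weights balance.

```python
import math

EPS0 = 1.0       # contact energy, set to 1 as in the text
EPS_MIN = -3.1   # lowest water level; the negative sign is an assumption of this sketch
DELTA = 0.04     # spacing of the water levels
G = 350          # number of water levels


def log_weight_ratio(temperature):
    """ln(folded weight / unfolded weight) for a single unit."""
    beta = 1.0 / temperature
    # Folded unit: binding energy -EPS0, water decoupled, so all G levels count equally.
    log_folded = beta * EPS0 + math.log(G)
    # Unfolded unit: two chain conformations, water levels weighted by Boltzmann factors.
    water_sum = sum(math.exp(-beta * (EPS_MIN + s * DELTA)) for s in range(G))
    log_unfolded = math.log(2) + math.log(water_sum)
    return log_folded - log_unfolded


def bisect_root(f, lo, hi, tol=1e-6):
    """Plain bisection; requires f(lo) and f(hi) to have opposite signs."""
    f_lo = f(lo)
    if f_lo * f(hi) > 0:
        raise ValueError("root is not bracketed")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    t_cold = bisect_root(log_weight_ratio, 0.8, 2.0)
    t_hot = bisect_root(log_weight_ratio, 2.0, 8.0)
    print(f"cold unfolding near T = {t_cold:.2f}, hot unfolding near T = {t_hot:.2f}")
```

Because the balance condition is per unit, it does not depend on the guiding parameters, which is consistent with both extremes in Fig. 3 sharing the same transition temperatures while differing in sharpness.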
In Fig. 4 we investigate systematically the van't Hoff coefficient
$\kappa$ as a function of $\lambda_p$ and $\lambda_w$ for the
hot (Fig. 4 a) and the cold (Fig. 4 b)
transition. As is evident, $\kappa$ is similar but somewhat larger for
the hot than for the cold transition. As a consequence, the cold
transition is slightly sharper. We are not aware of any explicit
experimental measurements of the van't Hoff coefficient for the cold
transition, but Privalov et al. (1986) indicate a sharp unfolding of
metmyoglobin at the cold transition. Such a measurement will in
practice be hampered, as the cold transition is mainly seen
experimentally at pH values where it is close to the hot transition.

FIGURE 3
Heat capacity curves for a system with $N = 50$, with and without guiding, i.e., with $\lambda_p = \lambda_w = 1$ and $\lambda_p = \lambda_w = 0$, respectively. The parameters for the water variables are
$\epsilon_{\min} = -3.1$, $\delta = 0.04$, and
$g = 350$.

FIGURE 4
van't Hoff coefficient $\kappa$ for the hot (a)
and cold (b) transition for a system with $N = 100$. The other parameters are as in Fig. 3.
Finally, we note the distinct feature of the cold transition
when $(\lambda_p, \lambda_w) \to (1, 0)$,
where $\kappa$ drops to a value below 4. This artifact is incidentally
due to a merging of two neighboring cold transitions, as it can be shown
that $\kappa$ cannot be smaller than 4 for a single transition.
We summarize by noting that in this protein model, it is easy to
reconcile the thermodynamics of a two-state system with the dynamics of
a guided system, as this can be done by reducing
$\lambda_p$ and/or $\lambda_w$ from the
value one. The dynamical consequence of the guiding masked in this way
is a folding time that is dramatically reduced when the temperature
moves below the transition temperature.
We note as a final consequence of our model that good folders can be
viewed as random sequences of folding steps of which the last steps
have a particularly favorable binding energy, thereby securing two-state cooperativity.
A. H. and K. S. thank F. A. Oliveira and
H. N. Nazareno for warm hospitality and the International Center
for Condensed Matter Physics for support during our stay in Brazil. We
thank G. Zocchi for countless discussions. A. B. thanks the
Norwegian Research Council for financial support.
Address reprint requests to Audun Bakk, Norwegian University of Science
and Technology, NTNU, N-7491 Trondheim, Norway. Tel.: 47-73-59-36-98;
Fax: 47-73-59-33-72; E-mail: Audun.Bakk{at}phys.ntnu.no.
A. Hansen's permanent address: Department of Physics, Norwegian
University of Science and Technology, NTNU, N-7491 Trondheim, Norway.
K. Sneppen's permanent address: NORDITA, Blegdamsvej 17, DK-2100
Copenhagen, Denmark.