Originally published as Biophys J. BioFAST on March 13, 2006.
doi:10.1529/biophysj.105.075937
Biophysical Journal 90:3827-3841 (2006)
© 2006 The Biophysical Society
Mapping the Energy Landscape of Biomolecules Using Single Molecule Force Correlation Spectroscopy: Theory and Applications
V. Barsegov *,
D. K. Klimov
and
D. Thirumalai *
* Biophysics Program, Institute for Physical Science and Technology,
Department of Chemistry and Biochemistry, University of Maryland, College Park, Maryland; and
Bioinformatics and Computational Biology Program, School of Computational Sciences, George Mason University, Manassas, Virginia
Correspondence: Address reprint requests to D. Thirumalai, Tel.: 301-405-4303; E-mail: thirum{at}glue.umd.edu.
 |
ABSTRACT
|
|---|
We present, to our knowledge, a new theory that takes internal dynamics of proteins into account to describe forced-unfolding and force-quench refolding in single molecule experiments. In the current experimental setup (using either atomic force microscopy or laser optical tweezers) the distribution of unfolding times, P(t), is measured by applying a constant stretching force fS from which the apparent fS-dependent unfolding rate is obtained. To describe the complexity of the underlying energy landscape requires additional probes that can incorporate the dynamics of tension propagation and relaxation of the polypeptide chain upon force quench. We introduce a theory of force correlation spectroscopy to map the parameters of the energy landscape of proteins. In force correlation spectroscopy, the joint distribution P(T, t) of folding and unfolding times is constructed by repeated application of cycles of stretching at constant fS separated by release periods T during which the force is quenched to fQ < fS. During the release period, the protein can collapse to a manifold of compact states or refold. We show that P(T, t) at various fS and fQ values can be used to resolve the kinetics of unfolding as well as formation of native contacts. We also present methods to extract the parameters of the energy landscape using chain extension as the reaction coordinate and P(T, t). The theory and a wormlike chain model for the unfolded states allows us to obtain the persistence length lp and the fQ-dependent relaxation time, giving us an estimate of collapse timescale at the single molecular level, in the coil states of the polypeptide chain. Thus, a more complete description of landscape of protein native interactions can be mapped out if unfolding time data are collected at several values of fS and fQ. We illustrate the utility of the proposed formalism by analyzing simulations of unfolding-refolding trajectories of a coarse-grained protein (S1) with ß-sheet architecture for several values of fS, T, and fQ = 0. The simulations of stretch-relax trajectories are used to map many of the parameters that characterize the energy landscape of S1.
 |
INTRODUCTION
|
|---|
Several biological functions are triggered by mechanical force. These include stretching and contraction of muscle proteins such as titin (1
,2
), rolling and tethering of cell adhesion molecules (3
6
), V. Barsegov, D. Klimov, and D. Thirumalai, unpublished), translocation of proteins across membranes (7
10
), and unfoldase activity of chaperonins and proteasomes. Understanding these diverse functions requires our probing the response of biomolecules to applied external tension. Dynamical responses to mechanical force can be used to characterize in detail the free energy landscape of biomolecules. Advances in manipulating micron-sized beads attached to single biomolecules have made it possible to stretch, twist, unfold, and even unbind proteins using forces on the order of tens of picoNewtons (11
13
). Single molecule force spectroscopy on a number of different systems has allowed us to obtain a glimpse of the unbinding energy landscape of biomolecules and protein-protein complexes (14
17
). In atomic force microscopy (AFM) experiments, used to unfold proteins by force, one end of a protein is adsorbed on a template and a constant or a time-dependent pulling force is applied to the other terminus (18
24
). By measuring the distribution of forces required to completely unfold proteins and the associated unfolding times, the global parameters of the protein energy landscape can be estimated (25
30
). These insightful experiments when combined with theoretical studies (31
33
) can give an unprecedented picture of forced-unfolding pathways.
Current experiments have been designed primarily to obtain information on forced-unfolding of proteins and do not probe the reverse folding process. Although force-clamp AFM techniques have been used recently to probe (re)folding of single ubiquitin polyprotein (23
), the lack of theoretical approaches has made it difficult to interpret these pioneering experiments (34
,35
). Secondly, the resolution of multiple timescales in protein folding and refolding requires not only novel experimental tools for single molecule experiments but also new theoretical analysis methods. Minimally, unfolding of proteins by a stretching force, fS, is described by the global unfolding time
U(fS), timescales for propagation of the applied tension, and the dynamics describing the intermediates or protein-coil states. Finally, if the external conditions (loading rate or the magnitude of fS) are such that these processes can occur on similar timescales, then the analysis of the data requires new theoretical ideas.
For forced unfolding, the variable conjugate to fS, namely, the protein end-to-end distance X, is a natural reaction coordinate. However, X is not appropriate for describing protein refolding which, due to substantial variations in the duration of folding barrier crossing, may range from milliseconds to few minutes. To obtain statistically meaningful distributions of unfolding times, a large number of complete unfolding trajectories must be recorded, requiring repeated application of the pulling force. The inherent heterogeneity in the duration of folding and the lack of correlation between evolution of X and (re)folding progress creates initial state ambiguity when force is repeatedly applied to the same molecule. As a result, the interpretation of unfolding time data is complicated, especially when the conditions are such that the reverse folding process at the quenched force fQ can occur on a long timescale,
F(fQ).
Motivated by the need to assess the effect of the multiple timescales on the energy landscape of folding and unfolding, we develop a new theoretical formalism to describe correlations between the various dynamical processes. Our theory leads naturally to a new class of single molecule force experiments, namely, the force correlation spectroscopy (FCS), which can be used to study both forced unfolding as well as force-quenched (re)folding. Such studies can lead to more detailed information on both kinetic and dynamic events underlying unfolding and refolding. In the FCS, cycles of stretching (fS) are separated by periods T of quenched force fQ < fS, during which the stretched protein can relax from its unfolded state XU to coil state XC or even (re)fold to the native basin of attraction (NBA) state. The two experimental observables are X and the unfolding time t. The central quantity in the FCS is the distribution of unfolding times P(T, t) separated by recoil or refolding events of duration T. The higher order statistical measure embedded in P(T, t) is readily accessible by constructing a histogram of unfolding times for varying T and does not require additional technical developments. The crucial element in the proposed analysis is that P(T, t) is computed by averaging over final (unfolded) states, rather than initial (folded) states. This procedure removes the potential ambiguity of not precisely knowing the initial distribution of conformations in the NBA. Despite the uniqueness of the native state there are a number of conformations in the NBA that reflect the fluctuations of the folded state. The proposed formalism is a natural extension of unbinding-time data analysis. Indeed, P(T, t) reduces to the standard distribution of unfolding times P(t) when T exceeds protein (re)folding timescale
F(fQ).
The complexity of the energy landscape of proteins demands FCS and the theoretical analysis. Current single molecule experiments on poly-Ub or poly-Ig27 (performed in the T
regime) show that in these systems unfolding occurs abruptly in an apparent all-or-none manner or through a dominant intermediate (31
). On the other hand, refolding upon force-quench is complex, and surely occurs though an ensemble of collapsed coiled states (23
). A number of timescales characterize the stretch-release experiments. These include besides
F(fQ), the fS-dependent unfolding time, and the relaxation dynamics in the coiled states {C} upon force-quench
d(fQ). In addition, if we assume that X is an appropriate reaction coordinate, then the location of the NBA, {C}, the transition state ensembles, and the associated widths are required for a complete characterization of the underlying energy landscape. Most of these parameters can be extracted using the proposed FCS experiments and the theoretical analysis presented here.
In a preliminary study (36
), we reported the basics of the theory used to propose a new class of single molecule force spectroscopy methods for deciphering protein-protein interactions. This article is devoted to further developments in the theory, with application to forced-unfolding and force-quench refolding of proteins. In particular, we illustrate the efficacy of the FCS by analyzing single unfolding-refolding trajectories generated for a coarse-grained model (CGM) protein S1 with ß-sheet architecture (37
,38
). We showed previously that forced-unraveling of S1, in the limit of T
, can be described by an apparent two-state kinetics (38
,39
). The thermodynamics and kinetics observed in S1 is a characteristic of a number of proteins where folding/unfolding fits well two-state behavior (40
). Thus, S1 serves as a useful model to illustrate the efficacy of the FCS. Here, we show that by varying T and the magnitude of the stretching (fS or fQ), the entire dynamical processes, starting from the NBA to the fully stretched state, can be resolved. In the process we establish that P(T, t), which can be measured using AFM or laser optical tweezer (LOT) experiments, provides a convenient way of characterizing the energy landscape of biomolecules in detail.
 |
MODELS AND METHODS
|
|---|
Theory of force correlation spectroscopy (FCS)
In single-molecule atomic force microscopy (AFM) experiments used to unfold proteins by force, the N-terminus of a protein is anchored at the surface and the C-terminus is attached to the cantilever tip through a polymer linker. The molecule is stretched by displacing the cantilever tip and the resulting force is measured. From a theoretical perspective it is more convenient to envision applying a constant stretching force fS = fSx in the x-direction (Fig. 1). The free energy in the constant force formulation is related to the experimental setup by a Legendre transformation. More recently, it has become possible to apply a constant force in AFM or laser or optical tweezer (LOT) experiments to the ends of a protein. With this setup the unfolding time for the end-to-end distance X to reach the contour length L can be measured for each molecule. For a fixed fS, repeated application of the pulling force results in a single trajectory of unfolding times (t1, t2, t3, ..., Fig. 1) from which the histogram of unfolding times P(t) is obtained. The fS-dependent unfolding rate KU is obtained by fitting a Poissonian formula
exp [KUt] to the kinetics of population of folded states pF, which is related to P(t) as
.

View larger version (17K):
[in this window]
[in a new window]
|
FIGURE 1 (Top) A typical AFM setup: constant force f = fS = fSx is applied through the cantilever tip linker in the direction x parallel to the protein end-to-end vector X. Stretching cycles are interrupted by relaxation intervals T during which the force is quenched, f = fQ = fQx (fS > fQ). (Bottom) A single trajectory of forced unfolding times t1, t2, t3, ..., separated by fixed relaxation time T, during which the unfolded protein can either collapse into the manifold of coiled states {C} if T is short or reach the native basin of attraction (NBA) if T is long.
|
|
Because KU is a convolution of several microscopic processes, it does not describe unfolding in molecular detail. For instance, mechanical unfolding of fibronectin domains FnIII involves the intermediate aligned state (26
) with partially disrupted hydrophobic core, which cannot be resolved by knowing only KU. Even when the transition from the folded state F to the globally extended state U (26
) does not involve parallel routes as in Fig. 2, or multistate kinetics, the force-induced unfolding pathway must involve formation of intermediate coiled states {C}. The subsequent transition from {C} results in the formation of the globally unfolded state U. The incomplete time resolution prevents current experiments from probing the signature of the collapsed states. To probe the contributions from the underlying {C} states to global unfolding requires sophisticated experiments that can resolve contributions from dynamic events underlying forced unfolding. We propose a novel experimental procedure which, when supplemented with unfolding-time data analysis described below, allows us to separately probe the kinetics of native interactions and the dynamics of the protein coil (i.e., the dynamics of end-to-end distance X when the native contacts are disrupted).
Consider an experiment in which stretching cycles (triggered by applying fS) are interrupted by relaxation intervals T during which force is quenched to fQ < fS. In the time interval T, the polypeptide chain can relax into the manifold {C} or even refold to the native state F if T is long enough. If fS > fC and fQ < fC where fC is the equilibrium critical unfolding force at the specific temperature (see phase diagram for S1 in (38
)), these transformations can be controlled by T. In the simplest implementation, we set fQ = 0. The crucial element in the FCS experiment is that the same measurements are repeated for varying T. In the FCS the unfolding times are binned to obtain the joint histogram P(T, t) of unfolding events of duration t generated from the recoil manifold {C} or the native basin of attraction (NBA) or both, depending on the duration of the relaxation time T. In the current experiments, T
. As a result, the dynamics of additional states in the energy landscape that are explored during folding or unfolding are not probed.
The advantages of P(T, t) over the standard distribution of unfolding times P(t) are twofold. First, P(T, t) is computed by averaging over well-characterized fully stretched states. This eliminates the problem of not knowing the distribution of initial protein states encountered in current experiments. Indeed, due to intrinsic heterogeneity of the protein folding pathways, after the first unfolding event the protein may or may not refold into the native conformation, which creates the initial state ambiguity in the next (second, third, etc.) pulling cycle. Therefore, statistical analysis based on averaging over final (stretched) states rather than initial (folded) states allows us to overcome this difficulty. Secondly, statistical analysis of unfolding data performed for different values of T allows us to separately probe the kinetics of native interactions and the dynamics of X. In addition, the entire energy landscape of native interactions can be mapped out when stretch-quench cycles are repeated for several values of fS, fQ, and T.
Regime I (T <<
F)
In the simplest unfolding scenario, application of fS results in the disruption of the native contacts (F
{C}) followed by stretching of the manifold {C} into U (Fig. 2). When stretching cycles are separated by short T compared to the protein folding timescale
F at fQ = 0, P(T;t) is determined by the evolution of the coil state. Then the unfolded state population pU(T;t) is given by the convolution of protein relaxation (over time T) from the fully stretched state XU
L, to an intermediate coiled-state X1, and stretching X1 into final state Xf over time t. Thus, P(T;t) is obtained from pU(T;t) by taking the derivative with respect to t,
 | (1) |
where N(T) is the T-dependent normalization constant obtained by taking the last integral in the right-hand side of Eq. 1 from Xf = 0 to Xf = L, and P(XU) is the distribution of unfolded states. If X is well controlled, XU is expected to be centered around a fixed value
and
. In Eq. 1, GQ(X', t;X) and GS(X', t;X) are, respectively, the quenched and the stretching force-dependent conditional probabilities to be in the coiled state X' at time t arriving from state X at time t = 0. The integral over Xf is performed in the range [L
;L], with X = L
(Fig. 2) representing unfolding distance at which the total number of native contacts Q is at the unfolding threshold, Q
Q*. It follows that P(T;t) (Eq. 1) contains information on the dynamics of X. By assuming a model for X and fitting P(T;t), obtained by differentiating the integral expression appearing in Eq. 1, to the histogram of unfolding times, separated by short T <<
F, we can resolve the dynamics of the polypeptide chain in the coil state, which allows us to evaluate the fQ-dependent coil dynamical timescale
d using single-molecule force spectroscopy. The fit of Eq. 1 could be analytical or numerical depending on the model of X.
Regime II (T >>
F)
When stretching cycles are interrupted by long relaxation periods, T >>
F, the coiled states refold to XF (Fig. 2). In this regime, the initial conformations in forced-unfolding always reside in the NBA. In this limit, P(T;t) reduces to the standard distribution of unfolding times P(T, t)
P(t). When T >>
F, P(T;t) is given by the convolution of the kinetics of rupture of native contacts, resulting in protein extension
XF, and dynamics of X from state XF +
XF to final state Xf,
 | (2) |
where N'(T) is the normalization constant obtained as in Eq. 1, and PF(t, XF;fS) is the probability of breaking the contacts over time t that stabilize the native state XF. By assuming a model for PF(t, XF;fS) and employing information on the dynamics of X, obtained from the short T-experiment (Eq. 1), we can probe the disruption kinetics of native interactions. By repeating long T-measurements at several values of fS, we can map out the energy landscape of native interactions projected on the direction of the end-to-end distance vector.
Regime III (T
F)
In this limit, some of the molecules reach the NBA, starting from extended states (X
L), whereas others remain in the basin {C}. The fraction of folding events
F depends on T, during which X approaches the average extension
XC
facilitating the formation of native contacts. Thus, P(T
F), obtained in the intermediate T-experiment, involves contributions from both {C} and F initial conditions and is given by a superposition,
 | (3) |
where the probability to arrive to F from {C} at time T is given by
 | (4) |
and the probability to remain in {C} is
C(T) = 1
F(T). In Eq. 4, PC(T, X;fQ) is the refolding probability determined by the kinetics of formation of native contacts. Because the dynamics of X is weakly correlated with formation of native contacts, X in PC is expected to be broadly distributed. Therefore, Eqs. 3 and 4 can be used to probe kinetics of formation of native interactions.
For Eqs. 1 and 2 to be of use, one needs to know the (re)folding timescale
F. The simplest way to evaluate
F is to construct a series of histograms P(Tn, t) (n = 1, 2, ..., N) for a fixed fS and increasing relaxation time T1 < T2 < ... < TN, and compare P(Tn, t) values with the distribution P(T*, t) obtained for sufficiently long T* >>
F. If T = T*, then all the molecules are guaranteed to reach the NBA. The difference
 | (5) |
is expected to be nonzero for Tn
F and should vanish if Tn exceeds
F. Statistically, as Tn starts to exceed
F, increasingly more molecules will reach the NBA by forming native contacts. Then, more unfolding trajectories will start from folded states, and when T >>
F all unfolding events will originate from the NBA. Therefore, D(Tn) is a sensitive measure for identifying the kinetic signatures for forming native contacts. The utility of D(Tn) is that it is a simple yet accurate estimator of
F, which can be utilized in practical applications. Indeed, one can estimate
F by identifying it with the shortest Tn at which P(Tn;t)
P(T*, t), i.e., Tn
F. We should emphasize that to obtain
F from the criterion that D(
F)
0 no assumptions about the distribution of refolding times have been made. Having evaluated
F one can then use Eqs. 1 and 2 for short and long T-measurements to resolve protein coil dynamics and rupture kinetics of native contacts.
Let us summarize the major steps in the FCS. First, we estimate
F by using D(T) (Eq. 5). We next probe protein coil dynamics by analyzing P(T <<
F; t) obtained from short-T-measurements (Eq. 1). In the third step, we use information on protein coil dynamics to resolve the kinetics of rupture of native interactions contained in P(T >>
F; t) of long-T-measurements (Eq. 2). Finally, by employing the information on protein coil dynamics and kinetics of rupture of native interactions, we resolve the kinetics of formation of native contacts by analyzing P(T
F;t) from intermediate T-measurements (Eqs. 3 and 4).
The beauty of the proposed framework is that these experiments can be readily performed using available technology. In the current AFM experiments, T can be made as short as a few microseconds. Simple calculations show that the relaxation of a short 50-amino-acid protein from the stretched state, with L
19 nm, to the coiled states {C}, with, say, X
2 nm, occurs on the timescale
d
x2/D
10 µs, where
x = L X
17 nm and D
107 cm2/s is the diffusion constant. Clearly, the time of formation of native contacts, which drives the transition from {C} to the NBA, prolongs
F by a few microseconds to a few milliseconds or larger, depending on folding conditions. In the experimental studies of forced unfolding and force-quenched refolding of ubiquitin,
F was found to be of the order of 10100 ms (23
). Computer simulation studies of unzipping-rezipping transitions in short 22-nt RNA hairpin P5GA have predicted that
F is of the order of a few hundreds of microseconds (34
).
Model for the kinetics of native contacts
To interpret the data generated by FCS it is useful to have a model for the time evolution of the native contacts and X. We first present a simple kinetic model for rupture and formation of native contacts represented by probabilities PF and PC in Eqs. 2 and 4, respectively, and a model for the dynamics of X given by the propagator GS, Q(X', t;X). To describe the force-dependent evolution of native interactions we adopt the continuous-time-random-walk (CTRW) formalism (41
45
). In the CTRW model, a random walker, representing rupture (formation) of native contacts, pauses in the native (coiled) state for a time t before making a transition to the coiled (native) state. The waiting-time distribution is given by the function 
(t) (
= r or f, where r and f refer to rupture and formation of native contacts, respectively). We assume that the probabilities PF(t, XF;fS) and PC(t, XC;fQ) are separable so that
 | (6) |
where Peq(XF) is the equilibrium distribution of native states, PC(XC) is the distribution of coiled states, and Pr(t;fS) and Pf(t;fQ) are the force-dependent probabilities of rupture and formation of native contacts, respectively. Factorization in Eq. 6 implies that application of force does not result in the redistribution of states XF and XC in the NBA and in the manifold of coiled states {C}, but only changes the timescales for NBA
{C} and {C}
NBA transitions, and thus, the probabilities Pr and Pf. We expect the approximation in Eq. 6 to be valid provided the rupture of native contacts and refolding events are cooperative.
During stretching cycles, for fS well above fC, we may neglect the reverse folding process. Similarly, global unfolding is negligible during relaxation periods with fQ < fC. Then, the master equations for Pr(t) is
 | (7) |
where
r(t) is the generalized rate for the rupture and formation of native interactions. In the Laplace domain, defined by
,
r(t) is related to
r(t) as
 | (8) |
The structure of the master equation for Pf(t) is identical to Eq. 7, with the relationship between
f(t) and
f(t) being similar to Eq. 8. The general solution to Eq. 7 is
 | (9) |
where
is the initial condition and the solution in the time domain is given by the inverse Laplace transform,
. The solution for
is obtained in a similar fashion (see Eq. 9) with initial condition of
.
Model for the polypeptide chain
In the extended state, when the majority of native interactions that stabilize the folded state are disrupted, the molecule can be treated roughly as a fluctuating coil. Simulations and analysis of native structures (46
) suggest that proteins behave as wormlike chains (WLCs). For convenience we use a continuous WLC description for the coil state whose Hamiltonian is
 | (10) |
where lp is the protein coil persistence length. A large number of force-extension curves obtained using mechanical unfolding experiments in proteins, DNA, and RNA have been analyzed using a WLC model. In Eq. 10, the three-dimensional Cartesian vector r(s, t) represents the spatial location of the sth protein monomer at time t. The first two terms describe chain connectivity and bending energy, respectively. The third term represents fluctuations of the chain free ends and the fourth term corresponds to coupling of r to fS, Q. The end-to-end vector is computed as X(t) = r(L/2, t) r( L/2, t).
We need a dynamical model in which X is represented by the propagator G(X, t;X0). Although bond vectors of a WLC are correlated, the statistics of X can be represented by a large number of independent modes. It is therefore reasonable, at least in the large L limit, to describe GS, Q(X, t;X0) by a Gaussian,
 | (11) |
specified by the second moment
X2
S, Q and the normalized correlation function
(t)S, Q =
X(t)X(0)
S, Q/
X2
S, Q. Calculations of
X2
S, Q and
(t)S, Q are given in the Appendix (46
,47
). In the absence of force, we obtain
 | (12) |
where
n(X) and zn are the eigenfunctions and eigenvalues of the modes of the operator that describe the dynamics of r(s, t) (see Eq. A1). To construct the propagator GS, Q(X, t;X0) for fS, Q, Eq. A1 is integrated with fS, Q added to random force. We obtain
, where n = 1, 3, ..., 2q + 1. We analyze the distributions of unfolding times P(T, t) for the model sequence S1 (Fig. 3) obtained using simulations, the CTRW model for evolution of native interactions (Eqs. 69), and Gaussian statistics of the protein coil (Eq. 11).

View larger version (30K):
[in this window]
[in a new window]
|
FIGURE 3 Native structure of the model protein S1. The model polypeptide chain has a ß-sheet architecture of the native state. The ß-strands of the model chain are formed by native contacts between hydrophobic residues (given by blue spheres). The hydrophilic residues are shown (red spheres) and the residues forming the turn regions are given (gray spheres).
|
|
Simulations of model ß-sheet protein
The usefulness of FCS is illustrated by computing and analyzing the distribution function P(T;t) for a model polypeptide chain with ß-sheet architecture. Sequence S1, which is a variant of an off-lattice model introduced sometime ago (37
), is a coarse-grained model (CGM) of a polypeptide chain, in which each amino acid is substituted with a united atom of appropriate mass and diameter at the position of the C
-carbons (38
,39
). The S1 sequence is modeled as a chain of 46 connected beads of three typeshydrophobic B, hydrophilic L, and neutral Nwith the contour length L = 46a, where a
3.8 Å is the distance between two consecutive C
-carbon atoms. The coordinate of the jth residue is given by the vector xj with j = 1, 2, ..., N.
The potential energy U of a chain conformation is
 | (13) |
where Ubond, Ubend, and Uda are the energy terms, which determine local protein structure, and Unb corresponds to nonlocal (nonbonded) interactions. The bond-length potential Ubond, which describes the chain connectivity, is given by a harmonic function
 | (14) |
where kb = 100
h/a2 and
h (
1.25 kcal/mol) is the energy unit roughly equal to the free energy of a hydrophobic contact. The bending potential Ubend is
 | (15) |
where k
= 20
h/rad2 and
0 = 105°. The dihedral angle potential Uda, which is largely responsible for maintaining proteinlike secondary structure, is taken to be
 | (16) |
where the coefficients Ai and Bi are sequence-dependent. Along the ß-strands, trans-states are preferred and A = B = 1.2Eh. In the turn regions (i.e., in the vicinity of a cluster of N residues), A = 0, B = 0.2Eh. The nonbonded 12-6 Lennard-Jones interaction Unb between hydrophobic residues is the sum of pairwise energies
 | (17) |
where Uij depends on the nature of the residues. The double summation in Eq. 17 runs over all possible pairs excluding the nearest-neighbor residues. The potential
between a pair of hydrophobic residues B is given by
, where
is a random factor unique for each pair of B residues (39
) and r = |xi xj|. For all other pairs of residues Uij
ß is repulsive (39
).
Although an off-lattice CGM drastically simplifies the polypeptide chain structure, it does retain important characteristics of proteins, such as chain connectivity and the heterogeneity of contact interactions. The local energy terms in S1 provide accurate representation of the protein topology. The native structure of S1 is a ß-sheet protein that has a topology similar to the much-studied immunoglobulin domains (Fig. 3). When the model sequence is subject to fS or fQ, the total energy is written as Utot = U f
X (
= S or Q), where X is the protein end-to-end vector, and fS, Q = (fS,Q, 0, 0) is applied along the x-direction (Fig. 1).
The dynamics of the polypeptide chain is assumed to be given by the overdamped Langevin equationwhich, in the absence of fS or fQ, is
 | (18) |
where
is the friction coefficient and gj(t) is a Gaussian white noise, with the statistics
 | (19) |
Equation 18 is integrated with a step size
t = 0.02
L, where
L = (ma2/
h)1/2 = 3 ps is the unit of time and m
3 x 1022g is a residue mass. In Eq. 18, the value of
= 50 m/
L corresponds roughly to water viscosity.
 |
RESULTS
|
|---|
Simulations of unfolding and refolding of S1
For the model sequence S1 we have previously shown that the equilibrium critical unfolding force is fC
22.6 pN (38
) at the temperature Ts = 0.692
h/kB below the folding transition temperature TF = 0.7
h/kB. At this temperature 70% of native contacts are formed (see the phase diagram in (38
)). To simulate the stretch-relax trajectories, the initially folded structures in the NBA were equilibrated for 60 ns at Ts. To probe forced unfolding of S1 at T = Ts, constant pulling force fS = 40 pN and 80 pN was applied to both terminals of S1. For these values of fS, S1 globally unfolds in t = 90 ps and 50 ps, respectively. Cycles of stretching were interrupted by relaxation intervals during which the force is abruptly quenched to fQ = 0 for various durations of T. Unfolding-refolding trajectories of S1 have been recorded as a time-series of X and the number of native contacts Q.
In Fig. 4 we present a single unfolding-refolding trajectory of X and Q of S1, generated by stretch-relax cycles. Stretching cycles of constant force fS = 80 pN applied for 30 ns are interrupted by periods of quenched force relaxed over 90 ns. A folding event is registered if it results in the formation of 92% of the total number of native contacts QF = 106, i.e., Q
0.92 QF for the first time. An unfolding time is defined as the time of rupture of 92% of all possible contacts for the first time. With this definition, the unfolded state end-to-end distance is X
XU
36a. In Fig. 4, folded (unfolded) states correspond to minimal (maximal) X and maximal (minimal) Q. Inspection of Fig. 4 shows that refolding events are essentially stochastic. Out of 36 relaxation periods only nine attempts resulted in refolding of S1. Both X and Q show that refolding of S1 occurs though an initial collapse to a coiled state with the end-to-end distance XC/a
15 (Q
20), followed by the establishment of additional native contacts (Q
90) stabilizing the folded state with XF/a
(1
2
).

View larger version (28K):
[in this window]
[in a new window]
|
FIGURE 4 A single unfolding-refolding trajectory of the end-to-end distance X/a (solid lines) and the total number of native contacts Q (dotted lines) as a function of time t for S1. The trajectory is obtained by repeated application of stretch-quench cycles with stretching force fS = 80 pN and quenched force fQ = 0. The duration of stretching cycle and relaxation period is 30 ns and 90 ns, respectively. The first five unfolding events corresponding to large X/a and small Q are marked explicitly by numbers 1, 2, 3, 4, and 5. Force-stretch and force-quench for the stretch-quench cycles 13, 14, 15, 16, and 17 (middle panel) are denoted by solid and dash-dotted arrows.
|
|
We generated
1200 single unfolding-refolding trajectories and monitored the time-dependent behavior of X and Q. In the first set of simulations we set fS = 40 pN and used several values of T = 24, 54, 102, 150, and 240 ns. In the second set, fS = 80 pN, and T = 15, 48, 86, 120, and 180 ns. Each trajectory involves four stretching cycles separated by three relaxation intervals in which fQ = 0. Typical unfolding-refolding trajectories of X and Q for fS = 40 pN, fQ = 0, and T = 102, 150, and 240 ns are displayed in Figs. 57
, respectively. Due to finite duration of stretching cycles (90 ns), unfolding of S1 failed in few caseswhich were not included in the subsequent analysis of unfolding times. Only first stretching cycles in each trajectory are guaranteed to start from the NBA, and for T = 102 ns (Fig. 5), relatively few relaxation intervals result in refolding (with large Q). This implies that the distribution of unfolding times P(T, t) obtained from these trajectories are dominated by contributions from the coiled states, with the kinetics of formation of the native contacts playing only a minor role. Not unexpectedly, refolding events are more frequent when T is increased to 150 ns and 240 ns. At T = 150 ns, Q reaches higher values (
6575) and the failure to refold is rare (Fig. 6). This implies that as T starts to exceed the (re)folding time
F, the distribution of unfolding events, parameterized by P(T, t), is characterized by diminishing contribution from the coiled states {C} and is increasingly dominated by the folded conformations in the NBA. Note that failed refolding events are observed even at T = 240 ns (Fig. 7), which implies large heterogeneity in the duration of folding barrier crossing events. Figs. 57
suggest that the folding time
F at the temperature TU is in the range 100240 ns. Direct computations of the folding time
F from hundreds of folding trajectories starting with the fully stretched states gives
The agreement between
and
F validates our stretch-release simulations.

View larger version (26K):
[in this window]
[in a new window]
|
FIGURE 5 Typical unfolding-refolding trajectories of X/a (solid lines) and Q (dotted lines) for S1 as functions of time t, simulated by applying four stretch-quench cycles at the pulling force fS = 40 pN and quenched force fQ = 0. The duration of relaxation time T = 102 ns.
|
|

View larger version (34K):
[in this window]
[in a new window]
|
FIGURE 6 Examples of unfolding-refolding trajectories of X/a (solid lines) and Q (dotted lines) for S1 as a function of time t. The pulling force is fS = 40 pN and the quenched force is fQ = 0. The duration of relaxation time T = 150 ns.
|
|
Analysis of the distribution of unfolding times of S1
The theoretical considerations in our formalism suggest that the T-dependent heterogeneous unfolding processes occur not only from the NBA but also from the intermediate coil {C} states. The T-dependent protein dynamics can be utilized to separately probe the coil dynamics of the polypeptide chain and the kinetics of formation/rupture of native contacts (Q). We now utilize unfolding-refolding trajectories of S1, simulated for short, intermediate and long T, to build the histograms of unfolding times P(T, t). Using P(T, t) we provide quantitative description of the polypeptide chain dynamics in the coil state and the kinetics of rupture and formation of native interactions by employing CTRW model for Q and Gaussian statistics for X.
We computed P(T, t) using the distribution of unfolding times obtained for fS = 80 pN, T = 15, 48, and 86 ns (Fig. 8), and fS = 40 pN, T = 24, 54, and 102 ns (Fig. 9). In both cases fQ = 0. We excluded unfolding times corresponding to the first stretch-quench cycle of each trajectory, which were used to construct P(t) for the purposes of comparing P(t) with P(T, t) for long T. Single peaked P(T, t) obtained for T = 15 ns (Fig. 8) and T = 24 ns (Fig. 9), represent contributions to S1 unfolding from coil manifold {C} alone. When T is increased to 48 ns (Fig. 8) and 54 ns (Fig. 9), position of the peak shifts to longer times, i.e., from t
2.5 ns to t
5 ns (Fig. 8) and from t
6 ns to t
10 ns (Fig. 9). Furthermore, P(T, t) develops a shoulder at t
10 ns and t
25 ns, observed for T = 86 ns (Fig. 8) and T = 102 ns (Fig. 9), which indicates a growing (with T) contribution to unfolding from relaxation trajectories that reach the NBA. At longer T = 150 ns, when most relaxation periods result in refolding of S1, contribution from coiled states diminishes and at T = 240 ns, P(T, t) is identical to the standard distribution P(t) constructed from unfolding times of the first stretch-quench cycle of each trajectory. This implies that for fQ = 0,
F
240 ns and that P(T, t)
P(t) for T > 240 ns. The distribution P(T, t) = P(t) constructed from unfolding times separated by T = 300 ns is presented in Figs. 8 and 9 (top left panel).

View larger version (32K):
[in this window]
[in a new window]
|
FIGURE 8 Histograms of forced unfolding times P(t) and the joint distributions of unfolding times separated by relaxation periods of the quenched force P(T, t). The distribution functions are constructed from single unfolding-refolding trajectories of S1 simulated in stretch-quench cycles of fS = 80 pN and fQ = 0 for T = 15 ns, 48 ns, and 86 ns. Simulated distributions are shown by shaded bars with the contribution to global unfolding events from coiled conformations {C} indicated by an arrow for T = 86 ns. The results of the numerical fits obtained by using Eqs. 14 are represented by solid lines. The energy landscape parameters of S1 are summarized in Table 1.
|
|

View larger version (33K):
[in this window]
[in a new window]
|
FIGURE 9 Histograms of forced unfolding times P(t) and P(T, t) constructed from single unfolding-refolding trajectories for S1. The stretch-quench cycles were simulated with fS = 40 pN and fQ = 0 for T = 24 ns, 54 ns, and 102 ns. Simulated distributions are shown by shaded bars with the contribution to global unfolding events from coiled conformations {C} indicated by an arrow for T = 102 ns. The results of numerical fit obtained by using Eqs. 14 are represented by solid lines. The values of the parameters are given in Table 1.
|
|
We use the CTRW formalism to analyze the histograms of unfolding times P(T, t) from which the parameters that characterize the energy landscape of S1 can be mapped. We describe the kinetics of rupture and formation of native contacts by the waiting-time distributions
r,
f,
 | (20) |
where kr (dependent on fS) and kf (dependent on fQ) are the rates of rupture and formation of native interactions, respectively, Nr, f = kr, f/
(vr, f) are normalization constants (
(x) is
-function), and vr, f
1 are phenomenological parameters quantifying the deviations of the kinetics from a Poissonian process. For instance, vr, f = 1 implies Poissonian process and corresponds to standard chemical kinetics with constant rate kr, f. We assume that both the folded and the unfolded states are sharply distributed around the mean native and unfolded end-to-end distance
XF
and
XU
, respectively (Fig. 2),
 | (21) |
where
XU
/a = 36 residues corresponds to the definition of unfolded state. For S1 the contour length L/a = 46. Thus, S1 is unfolded if X/a exceeds
XU
, which implies
/a = 10 residues (see Fig. 2 and the lower limit of integration in Eq. 1). We describe the distribution of states {C} before the transition to the NBA by a Gaussian,
 | (22) |
with the width,
XC, centered around the average distance,
XC
.
We performed numerical fits of the histograms presented in Figs. 8 and 9 using Eqs. 14. By fitting the theoretical curves to P(T, t) constructed from short T = 15 ns and T = 48 ns simulations (Fig. 8) and T = 24 ns and T = 54 ns (Fig. 9), we first studied the dynamics of X to estimate the dynamical timescale
d, i.e., the longest relaxation time corresponding to the smallest eigenvalue zn (Eq. 12), and persistence length lp of S1 in the coil states {C}. By using the values of
d and lp, we used our theory to describe P(T, t) constructed from long T = 300 ns simulations. This analysis allows us to estimate the parameters characterizing the rupture of native contacts kr, vr,
XF
, and
XF. Finally, the parameters kf, vf,
XC
, and
XC, characterizing formation of native contacts, were estimated using
d, lp, kr, vr,
XF
, and
XF, and fitting Eqs. 3 and 4 to P(T, t) for intermediate T = 86 ns (Fig. 8) and T = 102 ns (Fig. 9).
Extracting the energy landscape parameters of S1
There are a number of parameters that characterize the energy landscape and the dynamics of the major components in the NBA
U transition. The numerical values of the model parameters are summarized in Table 1. The values of vr = 6.9 for fS = 40 pN and vr = 5.1 for fS = 80 pN indicate that rupture of native contacts is highly cooperative, especially at the lower fS = 40 pN. This agrees with the previous findings on kinetics of forced unfolding of S1 (38
), which were based solely on unfolding S1 by applying a constant force. In contrast, the formation of native contacts is characterized by vf
1, implying an almost-Poissonian distribution for the kinetics of formation of native contacts. The structural characteristics of the coil states are obtained using the relaxation of the polypeptide chain upon force-quench from stretched states. The value of the persistence length lp, which should be independent of fQ provided fQ/fC << 1, is found to be
4.8 Å (Table 1). This value is in accord with the results of the recent experimental measurements based on kinetics of loop formation in denatured states of proteins (48
).
Upon rupture of native contacts, the chain extends by
XF/a = 6.4 (for fS = 40 pN) and
XF/a = 6.7 (for fS = 80 pN). This distance separates the basins of folded states with
XF
/a = 4.5 at fS = 40 pN and
XF
/a = 4.6 at fS = 80 pN from high free-energy states when the polypeptide chain is stretched in the direction of fS (Fig. 2 a). Because these high free-energy states are never populated, we expect that forced-unfolding of S1 must occur in an apparent two-step manner when T
. Explicit simulations of S1 unfolding at constant fS (
69 pN) shows that mechanical unfolding occurs in a single step (see Fig. 2 in (38
)).
From the refolding free-energy profile upon force-quench (see Fig. 2 b) we infer that the initial stretched conformation must collapse to an ensemble of compact structures {C}. From the analysis of P(T;t) using the CTRW formalism we find that the average end-to-end distance
XC
for the manifold {C} is close to
XF
(see Table 1), which suggests that the ensemble of the {C}
NBA transition states is close to the native state. There is a broad distribution of coiled states {C}, which is manifested in the large width
XC/a = 2.2. Due to the broad conformational distribution, there is substantial heterogeneity in the refolding pathways. This feature is reflected in the long tails in P(T, t) (see Figs. 8 and 9). As a result, we expect the kinetic transition to be sharp. The estimated timescale (
1/kf) for forming native contacts for S1 is shorter than the coil dynamical timescale
d (for the values of fS used in the simulations). This indicates that the dynamical collapse of S1 from the stretched state XU
L and equilibration in the coiled manifold {C} constitutes a significant fraction of the total folding time (
). From the analysis of folding of S1 (P(T;t) at intermediate T) we also infer that the transition state ensemble for {C}
N must be narrow.
From the rates of rupture of native contacts kr at the two fS values and assuming the Bell model for the dependence of kr on fS,
 | (23) |
we estimated the force-free rupture rate
and the critical extension
, at which folded states of S1 become unstable. We found that
is negligible compared to the rate of formation of native contacts, kf = 0.25 ns1. The location of the transition state of unfolding X =
XF
+
is characterized by
= 1.5 a
0.03 L. The value of
is short compared to
XC, which is a measure of the width of the {C} manifold. Small
implies that the major barrier to unfolding is close to the native conformation. A similar values of
was obtained in the previous study of S1 by using an entirely different approach (38
). These findings are consistent with AFM experiments (49
) and computer simulations (50
), which show that native structures of proteins appear to be brittle upon application of mechanical force.
The parameter
d is an approximate estimate of the collapse time,
c, from the stretched to the coiled state. Using direct simulations of the decay of the radius of gyration, Rg, starting from a rodlike conformation, we obtained
c
80 ns (see the Supplementary Information in (51
)). The value of
d (
20 ns) is in reasonable agreement with the estimate of
c. This exercise shows that reliable estimates of timescales of conformational dynamics, which are difficult to obtain, can be made using FCS. To ascertain the extent to which the estimate of KU agrees with independent calculations, we obtained the KU by applying a constant force to unfold S1. The value of KU, obtained by averaging over 200 trajectories, is
90 ns at fS = 40 pN, which is in rough accord with
This further validates the efficacy of FCS in obtaining the energy landscape of proteins. We also estimated
from the value of KU obtained by direct simulation and the Bell model. The fS-dependent unfolding rate
increases with fS in accord with Eq. 23. The prefactor (
) is
10-fold smaller than
The difference may be either due to the failure of the assumption that
or to the breakdown of the Bell model (52
).
 |
DISCUSSION
|
|---|
In this section we summarize the main steps for practical implementation of the proposed force correlation spectroscopy (FCS) to probe the energy landscape of proteins using forced unfolding of proteins.
Step 1: Evaluating the (re)folding timescale
F
In the first phase of the FCS experiments, one needs to collect a series of histograms P(Tn, t), n = 1, 2, ..., N of unfolding times for increasing relaxation time T1 < T2 < ... < TN by repeated stretch-release experiments. This can be done by discarding the first unfolding time t1 in the sequence of recorded unfolding times {t1, t2, ..., tM} for each Tn to guarantee that all the unfolding events are generated from the stretched states with the distribution P(XU) (see Eq. 1). This is a crucial element of the FCS methodology since it enables us to perform the averaging over the final (stretched) states. It is easier to resolve experimentally the end-to-end distance X
L, rather than the initial (folded) states in which a number of conformations belong to the NBA. The histograms are compared with P(T*, t) obtained for sufficiently long T* >>
F. To ensure that T* exceeds
F, T* can be as long as a few tens of minutes. The time at which D(Tn), given by Eq. 5, is equal to zero can then be used to estimate 