Originally published as Biophys J. BioFAST on June 9, 2006.
doi:10.1529/biophysj.106.086264
Biophysical Journal 91:1868-1886 (2006)
© 2006 The Biophysical Society
Toward an Accurate Theoretical Framework for Describing Ensembles for Proteins under Strongly Denaturing Conditions
Hoang T. Tran and
Rohit V. Pappu
Department of Biomedical Engineering and Center for Computational Biology, Washington University in St. Louis, St. Louis, Missouri
Correspondence: Address reprint requests to Rohit V. Pappu, Dept. of Biomedical Engineering and Center for Computational Biology, Washington University in St. Louis, St. Louis, MO 63130-4899. Tel.: 314-362-2057; Fax: 314-362-7183; E-mail: pappu{at}biomed.wustl.edu.
ABSTRACT
Our focus is on an appropriate theoretical framework for describing highly denatured proteins. In high concentrations of denaturants, proteins behave like polymers in a good solvent and ensembles for denatured proteins can be modeled by ignoring all interactions except excluded volume (EV) effects. To assay conformational preferences of highly denatured proteins, we quantify a variety of properties for EV-limit ensembles of 23 two-state proteins. We find that modeled denatured proteins can be best described as follows. Average shapes are consistent with prolate ellipsoids. Ensembles are characterized by large correlated fluctuations. Sequence-specific conformational preferences are restricted to local length scales that span five to nine residues. Beyond local length scales, chain properties follow well-defined power laws that are expected for generic polymers in the EV limit. The average available volume is filled inefficiently, and cavities of all sizes are found within the interiors of denatured proteins. All properties characterized from simulated ensembles match predictions from rigorous field theories. We use our results to resolve between conflicting proposals for structure in ensembles for highly denatured states.
INTRODUCTION
The development of an accurate theoretical framework for describing denatured-state ensembles of proteins has been a topic of long-standing interest (1–12). Denatured states figure prominently in a variety of studies on proteins, especially as reference states for estimating protein stability (4,5,13–18). Accurate models for denatured states also impact a range of areas, including quantitative studies of in vitro folding pathways (10,19–24), protein design (25,26), studies of protein aggregation (27–29), and understanding preferential interactions in cosolute mixtures (1,3,30–36). Our focus in this work is on conformational ensembles accessible to proteins under strongly denaturing conditions. Theory and experiment make it unequivocally clear that the ensemble accessible under these harshly denaturing conditions need not bear resemblance to the nonnative states accessible to proteins under more physiological conditions. As will be discussed below, we expect our results to be valid for the strict limit of maximally denatured proteins. This limit is of interest in light of data that suggest the presence of residual structure even under strongly denaturing conditions.
As noted by Chan and Dill in their influential review (37), theories drawn from the polymer physics literature (38–44) are well-suited to describe heterogeneous conformational ensembles such as those of denatured states. For example, scaling of chain size with chain length can provide a direct probe of the nature of chain-solvent interactions (37,38,42,44,45). Flory showed that a quantity such as the average radius of gyration (Rg) will scale with chain length (N) according to a power law of the form Rg = Ro N^ν (45). Values of Ro and ν will vary with solution conditions. If ν ≈ 0.6, the chain swells to make favorable contacts with the surrounding solvent and is said to be in a good solvent. This is the case if at least one major component of the surrounding solvent is chemically equivalent to the main repeating unit of the polymer, making chain-solvent contacts preferable to chain-chain contacts (4,37). Conversely, if ν ≈ 0.34, the chain is in a poor solvent and forms a compact globule by minimizing contacts with the surrounding solvent.
Proteins in high concentrations of denaturants, such as 8 M urea or 6 M GdnCl, behave like chains in good solvents (3). This conclusion has been reached through quantitative studies of the scaling of hydrodynamic radii (46) and radii of gyration (11,47) with chain length under harshly denaturing conditions. Wilkins et al. (46) used pulse-field gradient NMR to quantify effective hydrodynamic radii for seven denatured proteins, the lengths of which varied from 16 to 247 residues. The hydrodynamic radii (Rh) for denatured proteins scale with chain length (N) as Rh = 2.21 N^0.57. Recently, Kohn et al. (11) used small-angle x-ray scattering (SAXS) to measure Rg as a function of N for 28 different chemically denatured proteins, with chain lengths varying from 8 to 549 residues. They showed that the scaling of Rg with N follows a power law of the form Rg = Ro N^(0.598 ± 0.028) with Ro = 1.983 ± 0.1. The data of Kohn et al. and those of Wilkins et al. are in general agreement with each other and reinforce Tanford's hypothesis (3) that highly denatured proteins behave like chains in a good solvent.
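The power laws quoted above can be evaluated and fitted in a few lines. The following is a minimal sketch rather than the authors' analysis code: the function names are hypothetical, and the prefactors are assumed to be in angstroms, which the text here does not state explicitly.

```python
import math

def power_law_size(n_residues, prefactor, exponent):
    """Chain size predicted by a power law of the form R = prefactor * N**exponent."""
    return prefactor * n_residues ** exponent

def fit_power_law(lengths, sizes):
    """Least-squares fit of ln(R) versus ln(N); returns (prefactor, exponent)."""
    xs = [math.log(n) for n in lengths]
    ys = [math.log(r) for r in sizes]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    exponent = sxy / sxx
    prefactor = math.exp(y_mean - exponent * x_mean)
    return prefactor, exponent

# Synthetic check: sizes generated from the quoted SAXS fit are recovered by the fit.
lengths = [8, 50, 126, 549]
sizes = [power_law_size(n, 1.983, 0.598) for n in lengths]
r0, nu = fit_power_law(lengths, sizes)
```

Fitting in log space is the usual choice because the power law becomes a straight line whose slope is the exponent and whose intercept is the logarithm of the prefactor.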
A good solvent can also be a "perfect" solvent (38). The latter refers to conditions under which the conformational ensemble can be modeled by ignoring all interactions except "two-body" repulsive (steric) interactions of the excluded volume (EV) kind. The idea is that in a perfect solvent, chain-solvent interactions exactly counterbalance all non-EV intrachain interactions (38). Hence, the limit of a perfect solvent is also referred to as the EV limit (39). In the EV limit, the scaling exponent is ν ≈ 0.59 and the intercept Ro assumes its maximum possible value. As solvent quality deviates from that of a perfect solvent toward that of a good solvent, the value of Ro will decrease without changing the scaling exponent, ν.
In the EV limit, the N^0.59 scaling law is obeyed by both short and long chains (39). If the solvent is not a perfect solvent, it takes very long chains to realize the power law behavior for quantities such as Rg. The goodness of solvent can be assessed by comparing the measured scaling of chain size with chain length (N) to that obtained by assuming the EV limit. Of particular interest is the value of Ro, which is related to the persistence length and also provides a measure of the goodness of the solvent because it reflects the average volume per residue set aside by the chain for interactions with the surrounding solvent (38,41).
Do harshly denaturing environments such as 8 M urea or 6 M GdnCl mimic perfect solvents?
In previous work, we developed a fast and accurate way to generate thermal, self-avoiding distributions for proteins with atomistic detail (48). For the 28 proteins studied by Kohn et al. (11), we obtained a scaling exponent of ν = 0.62 ± 0.01 and Ro = 2.08 ± 0.02. Deviations from the accurate field-theoretic exponent of ν = 0.5885 (49) are mainly due to the finite lengths of the proteins we studied. With this caveat in mind, we assert that both the scaling exponent ν and the intercept Ro calculated in the EV limit show statistically significant agreement with estimates from SAXS data (11). It is also noteworthy that the observed scaling behavior is valid for a range of chain lengths that includes short chains (11). The preceding discussion suggests that harshly denaturing (as opposed to mildly denaturing) conditions can be thought of as close mimics of "perfect", rather than just good, solvents (50). The implication is that EV-limit ensembles for proteins are likely to be close facsimiles of conformational ensembles in high concentrations of chemical denaturants. Accordingly, the remainder of this work focuses on a detailed characterization of protein conformational distributions in the EV limit.
What is the appropriate theoretical framework for describing conformational ensembles of proteins in the EV limit?
Two very different theories have been advanced to explain how the scaling exponent of ν ≈ 0.59 comes about for polymers in the EV limit. The widely known theory is that of Flory (44). In this model, the polymer is treated as a cloud of uncorrelated monomers in a mean field. There are two terms in the expression for the mean-field free energy, which is parameterized in terms of Rg. The first term mimics the chain's drive to swell to maximize chain-solvent interactions. The second term provides an estimate of the conformational entropy, which opposes chain swelling. Minimization of the mean-field free energy with respect to Rg yields a power law with a scaling exponent of ν = 0.6. This widely cited result provides the theoretical basis for the assertion that denatured proteins are Flory-like random coils (3,4,11,12,17,37,30). For reasons to be discussed below, this assertion is in fact inaccurate.
Modern polymer theories have established that the use of Flory's mean-field model is flawed when it comes to predicting detailed properties of conformational ensembles in the EV limit (38,39,41,42). In Flory's approach, a range of chain properties including Rg, the average end-to-end distance (Re), the hydrodynamic radius (Rh), the second virial coefficient (B2), and the osmotic pressure (Π) are calculated as series expansions in terms of a parameter z that depends on T, θ, and N (39,40,45). Here, T is the desired temperature, θ is the theta temperature, where polymers behave like ideal chains, and N is the chain length. It is assumed that the chain swells uniformly vis-à-vis its theta state. The use of theories based on perturbations around the θ state is only valid in the limit T → θ or small N. For polymers in the EV limit, z → ∞, and Flory's model is not applicable in this regime. As a consequence, several special characteristics of conformational ensembles for polymers in the EV limit (and, by extension, of highly denatured proteins) are not anticipated by Flory's theory. This observation is not new and several treatises on the subject are available in the polymer literature (39–42).
Departures from Flory's random-coil model are based on field-theoretic approaches (39–42) that explicitly account for the effects of correlations in a self-repelling chain. The goal in these theories is to explain why chain properties such as Rg, Re, Rh, Π, B2, scattering functions, and internal correlations obey nontrivial power laws in the EV limit (39). Interestingly, the scaling exponent ν ≈ 0.59 features prominently in all of these power laws. An important prediction of field theory is that all power laws are the result of correlations imparted by repulsive, steric (EV) interactions. The effects of these correlations are present on all length scales. Consequently, in the EV limit, a range of chain properties show so-called scale invariance. Simply stated, chain properties for long chains can be predicted by scaling the corresponding properties for short chains and vice versa. It is on the basis of this invariance to "spatial dilatations" (39) that polymers in the EV limit are said to be renormalizable entities. The availability of an accurate theoretical framework for explaining scale-invariant properties of polymers in the EV limit has important ramifications for developing accurate theoretical descriptions for denatured proteins.
As for specific predictions, a polymer in the EV limit is best described in terms of two distinct length scales (39). All sequence-specific effects are restricted to a single local length scale, denoted as ls. If one were to examine chain properties at length scales that go above ls, properties of denatured proteins for different sequences should become indistinguishable from each other. Scale invariance applies to a variety of chain properties that go beyond chain size (39). In the EV limit, the average shape of the chain should be that of a prolate ellipsoid. Internal distances between residues that are beyond ls should show the same power law dependence on sequence separation as does Rg (or Re) on chain length (N). The average volume occupied by the chain will be filled inefficiently, and cavities of all sizes l > ls should be found readily within the interior of a denatured protein. Finally, the ensemble-averaged topology should be invariant with sequence or chain length and the chain is best described as a fractal object of dimension 1.7 (38–40).
In this work, we show that we can simulate conformational ensembles, with atomistic detail, such that ensemble characteristics match the predictions for polymers in the EV limit. We demonstrate this by comparing static, equilibrium properties of the simulated ensembles to those predicted by rigorous field theories. The development of an accurate EV limit description for denatured proteins mirrors the use of the hard-sphere fluid as a reference state for van der Waals liquids (51,52).
Our presentation is organized as follows. First, we present a detailed description of the methods used in our work. Next, we describe six major results to show that characteristics of the simulated EV-limit ensembles are in accord with the predictions of field theories and hence inconsistent with Flory's random coil model. Finally, in the discussion section, we place our results in the context of ongoing debates regarding denatured-state ensembles.
MATERIALS AND METHODS
Potential functions
In the EV limit, only the effects of steric interactions are considered. Accordingly, interatomic interactions were modeled using purely repulsive, inverse power potentials (48,53). In our formalism, distinct conformations are specified by a unique set of backbone and side-chain torsion angles, viz., φ, ψ, and χ. The inverse power potential energy (U) for a given conformation is a sum of pairwise interactions. The sum, which runs over all nonbonded pairs of atoms, is written as
$$U = \sum_{i}\sum_{j<i} \varepsilon_{ij}\left(\frac{\sigma_{ij}}{r_{ij}}\right)^{n} \qquad (1)$$
In Eq. 1, σij is the hard-sphere contact distance (54), rij is the interatomic separation, and the dispersion parameters εij are determined by static polarizability values for individual atoms (55,56).
The values we use for σij and εij have been published previously (48). The parameters were chosen to reproduce heats-of-fusion data for model compounds (55). In our EV model, there is only one free parameter, namely the exponent n. For n → ∞, the formula in Eq. 1 resembles the traditional hard-sphere potential (57). For small n, we obtain softer repulsive potentials. A twofold advantage underlies our choice of soft-core repulsions. First, small values of n lend robustness in that our results do not become overly sensitive to the specific choices for values of σij. Second, unlike hard-sphere potentials, which stipulate that all sterically allowed conformations are isoenergetic, soft-core potentials encode the requisite conformational specificity (48,53,58). This has been demonstrated by the generation of quantitative conformational propensities for a range of peptide sequences (48). In this work, we set n = 14. This choice is based on previous work (48), where we showed that conformational propensities for a series of host-guest peptides are insensitive to the choice for n so long as it is in the range n = 9–25.
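To illustrate how the exponent n controls the softness of the repulsion, here is a minimal sketch. It assumes that each pair contributes ε(σ/r)^n with no additional prefactor, which is an assumption about the exact form of Eq. 1; the function name and the numerical values are hypothetical and are not the published parameters.

```python
def inverse_power_energy(pairs, n=14):
    """Sum of purely repulsive pair terms, assumed to be eps * (sigma / r)**n.

    pairs: iterable of (epsilon_ij, sigma_ij, r_ij) for nonbonded atom pairs.
    """
    return sum(eps * (sigma / r) ** n for eps, sigma, r in pairs)

# Illustrative pair: contact distance 3.0 and unit dispersion parameter.
at_contact = inverse_power_energy([(1.0, 3.0, 3.0)])    # r equal to sigma
overlapping = inverse_power_energy([(1.0, 3.0, 2.7)])   # r slightly below sigma
separated = inverse_power_energy([(1.0, 3.0, 6.0)])     # r equal to 2 * sigma
steeper = inverse_power_energy([(1.0, 3.0, 2.7)], n=25)
```

At r = σ the pair energy equals ε for any n, whereas below contact the energy grows faster as n increases, which is the sense in which large n approaches a hard-sphere wall.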
Degrees of freedom
Bond lengths and bond angles are fixed at equilibrium values taken from the work of Engh and Huber (59). The peptide unit is always trans with ω = 179.5°. The degrees of freedom in all of our calculations are the backbone φ, ψ, and side-chain χ-angles. All sequences are acetylated and N-methylamidated at the N- and C-termini, respectively. If the EV interactions, shown in Eq. 1, are turned off, we obtain Flory's freely rotating chain model (43), albeit with a constraint that the peptide units are all in a trans configuration.
Generation of conformational ensembles
We have adapted conventional Markov-chain Metropolis Monte Carlo simulation strategies (60,61) to generate equilibrium ensembles for each of the protein sequences in the EV limit. Our algorithm is as follows:
1. For a given sequence, N residues long, we start with a random, sterically allowed conformation for the chain and calculate the inverse power potential energy U according to the formula shown in Eq. 1.
2. We then "roll an N-sided die" to choose a residue whose torsion angles are to be altered.
3. We then "flip a two-sided coin" to decide if the trial move is going to be a backbone or side-chain move.
4. Depending on the choice in step 3, the backbone φ, ψ, or side-chain χ torsions are set to random values in the interval [−180°, 180°]. Trial moves that set backbone torsions are pivot moves because these lead to large-scale conformational changes. Conversely, side-chain moves lead to local conformational changes. The proposed torsions are used to compute new Cartesian coordinates for the molecule.
5. Given a new set of Cartesian coordinates from step 4, we calculate the energy for the new conformation. This is referred to as U'. The energy difference ΔU = U' − U is evaluated. This energy difference is used with the Metropolis criterion (61) to accept or reject the proposed move. In detail, if ΔU ≤ 0, the proposed move is accepted. Alternatively, if ΔU > 0 and a random number drawn from the interval [0,1] is less than exp[−βΔU], the proposed move is accepted. For all other cases, the move is rejected. Here, β = 1/RT, where R = 0.00199 kcal/mol-K is the ideal gas constant and T = 298 K is the simulation temperature. If the move is accepted, we set U = U'. In either case, we return to step 2 and iterate until convergence.
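The accept/reject logic of the steps above can be sketched as follows. This is a toy illustration, not the authors' implementation: it uses the standard Metropolis convention ΔU = U' − U, treats a conformation as a flat list of torsion angles, omits the backbone versus side-chain coin flip and the rebuilding of Cartesian coordinates, and takes a hypothetical user-supplied `energy_fn`.

```python
import math
import random

BETA = 1.0 / (0.00199 * 298.0)  # 1/RT in mol/kcal, using R and T quoted in the text

def metropolis_accept(u_old, u_new, beta=BETA, rng=random):
    """Standard Metropolis test with delta_u defined as new minus old energy."""
    delta_u = u_new - u_old
    if delta_u <= 0.0:
        return True  # downhill or neutral moves are always accepted
    return rng.random() < math.exp(-beta * delta_u)

def run_chain(torsions, energy_fn, n_moves, rng=random):
    """Toy sampler: each trial move resets one randomly chosen torsion angle."""
    u = energy_fn(torsions)
    for _ in range(n_moves):
        idx = rng.randrange(len(torsions))          # pick the angle to alter
        old_value = torsions[idx]
        torsions[idx] = rng.uniform(-180.0, 180.0)  # propose a random new value
        u_new = energy_fn(torsions)
        if metropolis_accept(u, u_new, rng=rng):
            u = u_new                               # keep the proposed state
        else:
            torsions[idx] = old_value               # restore the previous state
    return torsions, u

# With a flat energy surface every proposal is accepted.
final_torsions, final_u = run_chain([0.0] * 6, lambda t: 0.0, 50, rng=random.Random(0))
```

Passing a seeded `random.Random` instance keeps a run reproducible, which is convenient when checking convergence across repeated simulations.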
In the algorithm described above, steps 2–5 constitute a single trial move. For a given amino acid sequence, a complete simulation consists of 10^7 trial moves. For the longest sequence in our data set, that of villin (N = 126), generation of the desired conformational ensemble takes ≈20 h on a single 2.4-GHz Intel Xeon processor. Snapshots were saved for analysis once every 10^3 trial moves. As a result, for each sequence, we generated an ensemble consisting of 10^4 uncorrelated conformations. The large-scale motion generated by backbone pivot moves ensures a lack of correlation between saved snapshots.
For each of the amino acid sequences shown in Table 1, ensemble averages and conformational distributions were obtained from an ensemble with a sample size of 10^4 and the ensembles were generated as discussed above. We have carried out a systematic analysis to assess the quality of data obtained using the protocol described above. Details of these tests for convergence of the simulations and the sample size are presented in the Appendix.
The major bottleneck to overcome in the design of efficient Monte Carlo simulations is the O(N^2) complexity associated with computing energies for each new conformation. To speed up these calculations, we take advantage of the short range of inverse power potentials (53). Specifically, we ignored the interactions between atoms in residues whose Cα-Cα distance exceeds 15 Å because the inverse power potential energy for these distances is nearly zero. In addition, a 10-Å distance-based cutoff was applied between all nonbonded atoms. We have compared our results to those from previous work (48,58) where no cutoffs were used. We were unable to find any statistically significant differences between results with and without cutoffs. This is mainly because of the short spatial range of EV interactions.
CALCULATION OF SCATTERING PROFILES
The scattering form factor P(q) for a single chain conformation as a function of scattering wave number q is calculated as (62–64)
$$P(q) = \frac{1}{N^{2}}\sum_{i=1}^{N}\sum_{j=1}^{N}\frac{\sin(qR_{ij})}{qR_{ij}} \qquad (2)$$
In Eq. 2, N is the number of residues, and Rij is the distance between atoms i and j. To calculate the form factor we used the positions of α-carbon atoms for each residue. For each amino acid sequence, the form factor was calculated for each snapshot generated from the Monte Carlo simulations. The ensemble-averaged form factor, i.e., the average over all 10^4 conformations, was used to compute the average Kratky profile. The wave numbers used in the calculations range from q = 0 to q = 0.5 Å^-1.
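A single-conformation form factor and the corresponding Kratky value can be computed as in the sketch below. It assumes the standard Debye form, P(q) = (1/N^2) Σ sin(qRij)/(qRij) with the i = j terms equal to 1, which is a common choice and may differ in normalization from the authors' code; the function names are hypothetical.

```python
import math

def form_factor(coords, q):
    """Debye form factor for one conformation given as a list of (x, y, z) points."""
    n = len(coords)
    if q == 0.0:
        return 1.0  # limit of sin(x)/x as x goes to zero for every pair
    total = 0.0
    for i in range(n):
        for j in range(n):
            x = q * math.dist(coords[i], coords[j])
            total += 1.0 if x == 0.0 else math.sin(x) / x
    return total / n ** 2

def kratky(coords, q):
    """Kratky value q^2 * P(q) for one conformation."""
    return q * q * form_factor(coords, q)

# Two alpha-carbons separated by 3.8 angstroms, evaluated at q = 0.5 inverse angstroms.
pair = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]
p_pair = form_factor(pair, 0.5)
```

An ensemble-averaged profile would follow by averaging `form_factor` over all saved snapshots at each q before multiplying by q^2.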
Calculation of shape parameters
The shape of a polymer can be quantified in terms of eigenvalues of the radius of gyration tensor. These eigenvalues tell us whether a protein in a specific conformation is akin to a sphere, an ellipsoid, or a rod and, if it is ellipsoidal, whether it is prolate or oblate. We quantify polymer shapes in terms of two parameters, viz., asphericity (δ) and a shape parameter, S. The former quantifies the degree of sphericity and the latter quantifies the principal axis direction in which the deviation from spherical geometry occurs. We follow the prescription of Schäfer (39) and Steinhauser (65) to calculate δ and that of Dima and Thirumalai (66) to calculate S. First, we define the radius of gyration tensor T, compute the eigenvalues of this tensor, and use these eigenvalues to compute δ and S. The prescription is as follows:
 | (3) |
In Eq. 3, λi (i = 1, 2, 3) denote the eigenvalues of the radius of gyration tensor, T, for a specific conformation. The tensor is computed for each conformation in the ensemble and then diagonalized. The ensemble average in Eq. 3 is computed as an average over all 10^4 snapshots. For a given conformation in the ensemble, the gyration tensor is computed as
$$\mathbf{T} = \frac{1}{N}\sum_{i=1}^{N}\mathbf{s}_{i}\otimes\mathbf{s}_{i} \qquad (4)$$
In Eq. 4, si = (ri − rCM), where rCM is the position vector of the center of mass and ri denotes the position vector of the α-carbon for residue i. That is, the gyration tensor is the average outer product of these displacement vectors.
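The shape analysis can be sketched for a single conformation as shown below. The definitions used here are assumptions based on commonly cited forms, namely asphericity = 1 − 3(λ1λ2 + λ2λ3 + λ3λ1)/(λ1 + λ2 + λ3)^2 and S = 27 Π(λi − λmean)/(λ1 + λ2 + λ3)^3; the authors' exact expressions and their ensemble-averaging convention may differ. The function names are hypothetical, the center of mass is taken as the unweighted centroid, and the eigenvalue combinations are obtained from matrix invariants so that no diagonalization routine is needed.

```python
def gyration_tensor(coords):
    """Average outer product of displacements from the unweighted centroid."""
    n = len(coords)
    center = [sum(p[k] for p in coords) / n for k in range(3)]
    tensor = [[0.0] * 3 for _ in range(3)]
    for p in coords:
        s = [p[k] - center[k] for k in range(3)]
        for a in range(3):
            for b in range(3):
                tensor[a][b] += s[a] * s[b] / n
    return tensor

def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def shape_parameters(tensor):
    """Return (asphericity, S) for one conformation using the assumed definitions."""
    trace = tensor[0][0] + tensor[1][1] + tensor[2][2]
    trace_of_square = sum(tensor[a][b] * tensor[b][a] for a in range(3) for b in range(3))
    pair_sum = 0.5 * (trace ** 2 - trace_of_square)  # l1*l2 + l2*l3 + l3*l1
    asphericity = 1.0 - 3.0 * pair_sum / trace ** 2
    mean = trace / 3.0
    shifted = [[tensor[a][b] - (mean if a == b else 0.0) for b in range(3)] for a in range(3)]
    shape = 27.0 * det3(shifted) / trace ** 3        # det equals the product of (l_i - mean)
    return asphericity, shape

rod = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
octahedron = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
rod_shape = shape_parameters(gyration_tensor(rod))
sphere_like_shape = shape_parameters(gyration_tensor(octahedron))
```

With these definitions a perfectly straight rod gives the limiting values (1, 2) and an isotropic arrangement gives (0, 0), so positive S indicates a prolate shape.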
RESULTS
In this work, we focus on two-state proteins because the hypothesis is that only two well-defined macrostates, the native and highly denatured states, are accessible to these systems (67–70). The underlying assumption is that the highly denatured-state ensemble for two-state proteins can be mimicked using our EV model. Table 1 lists relevant information for the 23 protein sequences (68) used in this study. For each of the sequences shown in Table 1, we used Metropolis Monte Carlo simulations to generate representative conformational ensembles in the EV limit.
Identification of distinct length scales
SAXS (62,63) and small-angle neutron scattering (64) measurements are useful for quantifying the average sizes, shapes, packing densities, and presence of distinct length scales in polymeric solutions. The form factor P(q) or its close counterpart, the Kratky profile (64), q^2 P(q), provides average structural information across a range of wavelengths. Here, q is in units of inverse wavelength. For each of the sequences shown in Table 1, we computed an ensemble-averaged Kratky profile. The results are shown in Fig. 1. The Kratky profiles reveal the presence of three distinct regimes for each sequence. The first regime, 0 ≤ q < 0.08, is the long-wavelength regime typically used to quantify the average molecular weight of the polymer. The second, intermediate q regime lies in the interval 0.08 ≤ q < 0.25. The high q regime corresponds to q ≥ 0.25.
The intermediate and high q regimes provide the most information regarding average chain shape and fluctuations (62–64). Inspection of Kratky profiles in these two regimes suggests the following: in the EV limit, there are two discernible length scales. All sequence specificity is localized to the high-q, short-wavelength regime. The implication is that sequence specificity influences local rather than nonlocal conformational preferences. In the intermediate q regime, proteins in the EV limit show scale-invariant, sequence-independent behavior wherein properties such as chain size and internal distances follow well-defined power laws.
Kratky profiles for proteins in the EV limit were compared to those of folded proteins and ideal, freely rotating chains (43). An example of this comparison is shown in Fig. 2 for the protein ubiquitin. In the following sections, we show that the differences in Kratky profiles imply that in the EV limit proteins are cigar-shaped, loosely packed coils, with average topologies that are independent of amino acid sequence.

FIGURE 2 Comparison of Kratky profiles for three different models of ubiquitin, namely, the EV-limit ensemble (dashed curve), the freely rotating chain ensemble (solid curve), and the native structure (dash-dotted curve).
The average shape of a denatured protein is that of a prolate ellipsoid
We computed the ensemble-averaged asphericity values for each of the 23 sequences in the EV limit and the resultant data are shown as cross marks in Fig. 3. For comparison, the δ values calculated from native structures are also shown as open circles. The average asphericity value of 0.5 is independent of sequence in the EV limit. This suggests that proteins in the EV limit have an average ellipsoidal shape. In addition to the asphericity, we computed ensemble-averaged shape parameters (S) for each protein sequence. Again, S = 0 for a perfect sphere. If S < 0, the object is oblate and if S > 0, the object is prolate. The data show that S ≈ 0.7 for all 23 sequences in the EV limit (Fig. 3). The conclusion is that in the EV limit, the average shape for a protein is that of a prolate ellipsoid, i.e., a cigar-shaped object.

FIGURE 3 Ensemble-averaged asphericity (δ) and shape (S) parameters for all 23 sequences in the EV limit (shown by cross marks) and for folded proteins (open circles). Error bars quantify the standard error in estimation of the mean.
Interestingly, although the average shape is independent of sequence in the EV limit, it clearly depends on sequence for folded proteins. A comparison of the δ and S values of proteins in the EV limit to those of folded proteins is shown in Fig. 3. Although Rg scales with chain length as N^0.34 for folded proteins (66), the scaling law itself does not restrict folded proteins to be spherical globules. This point has been made recently by Dima and Thirumalai (66), who carried out a systematic study of asymmetry in the shapes of folded proteins.
Although the average shape in the EV limit is that of a prolate ellipsoid, the fluctuations in the Rg and δ values are large. This is shown in Fig. 4 using a contour plot of the two-dimensional distribution function ρ(Rg/N^0.6, δ) for Fyn SH3 domain. The oblong shape reflects the coupling between shape and size. It is also seen that fluctuations span the spectrum of shapes and sizes. In other words, chains in the EV limit are not hard, prolate ellipsoids. Instead, they are soft ellipsoids that show large correlated fluctuations about mean values for Rg and δ. In Fig. 5, we show backbone traces of 10 EV limit conformations each for four different protein sequences. The conformations, which are drawn at random from the ensembles, are oriented in the principal axis frames to illustrate the average prolate ellipsoidal shape as well as the large fluctuations that characterize the conformational distributions.

FIGURE 5 Ten representative conformations drawn from the EV-limit ensembles for four different protein sequences. The conformations are oriented in the principal axis frame, shown in the bottom left corner for each protein. The snapshots demonstrate both the average prolate shape and the large fluctuations.
Internal correlations show scale invariance
As noted in the introduction, the self-repelling nature of proteins in the EV limit imposes correlations on all length scales. These correlations lead to scale invariance in a variety of chain properties and direct evidence for correlations can be obtained by quantifying the scaling of internal distances. Theory predicts that ensemble-averaged internal distances will scale like ensemble-averaged end-to-end distances such that ⟨Rij^2⟩ ∼ ⟨Re^2⟩ (|j − i|/N)^(2ν) (39). Here, ⟨Re^2⟩ is the mean-squared end-to-end distance, Rij = |ri − rj|, and we choose ri and rj to be the position vectors of α-carbon atoms of residues i and j, respectively. The implication is that ⟨Rij⟩ ∼ |j − i|^ν, where ν ≈ 0.59. This behavior is expected to hold for all |i − j| > ns, where ns denotes the number of residues over which sequence context is important. Predictions for the scaling of internal distances are important because they also allow us to make direct contact with measurements of internal distances in denatured proteins. These measurements are becoming accessible to a variety of experiments that are based on the use of spin labels (71–77).
In Fig. 6, we plot ln(⟨Rij⟩) versus ln(|j − i|) for four representative sequences drawn from Table 1. Two parametric lines are used to calibrate the results. The solid lines have slopes of 0.59 and intercepts of 2.0, whereas the dashed lines have slopes of 1.0 and intercepts of 1.335. The latter were derived by assuming a fully extended, rodlike chain with distances between adjacent α-carbon atoms of 3.8 Å. Partial motivation for this reference line comes from the work of Zagrovic and Pande (78), who showed that internal distances in unfolded ensembles of several proteins follow the predictions of the ideal random-flight chain with link length of 3.8 Å. In the EV limit, we find that irrespective of amino acid sequence, internal distances follow the power law predicted by theory for chains in a good solvent. Deviations from the power-law scaling occur for internal distances between residues that are <7 residues apart in sequence.
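The log-log analysis of internal distances can be sketched as follows. This is an illustrative reconstruction with hypothetical function names, not the authors' analysis code. As a check, it is applied to a fully extended rod with 3.8 Å spacing, which should reproduce the dashed reference line (slope 1.0, intercept ln 3.8 ≈ 1.335).

```python
import math

def mean_internal_distances(ensemble):
    """Average alpha-carbon distance for every sequence separation |j - i|."""
    n = len(ensemble[0])
    profile = {}
    for sep in range(1, n):
        total, count = 0.0, 0
        for coords in ensemble:
            for i in range(n - sep):
                total += math.dist(coords[i], coords[i + sep])
                count += 1
        profile[sep] = total / count
    return profile

def loglog_fit(profile, min_sep=1):
    """Least-squares slope and intercept of ln<R_ij> versus ln|j - i|."""
    pts = [(math.log(s), math.log(d)) for s, d in profile.items() if s >= min_sep]
    x_mean = sum(x for x, _ in pts) / len(pts)
    y_mean = sum(y for _, y in pts) / len(pts)
    sxx = sum((x - x_mean) ** 2 for x, _ in pts)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in pts)
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean

# Rodlike reference chain: 12 alpha-carbons spaced 3.8 angstroms apart on a line.
rod = [(3.8 * k, 0.0, 0.0) for k in range(12)]
slope, intercept = loglog_fit(mean_internal_distances([rod]))
```

The `min_sep` argument allows short separations to be excluded from the fit, which matters for real ensembles because the universal power law is only expected beyond the local length scale.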

FIGURE 6 Scaling of internal distances as a function of sequence separation. Internal correlations are shown for four representative protein sequences. In each of the log-log plots, the solid line has a slope of 0.59 and intercept of 2.0 and the dashed line has a slope of 1.0 and intercept of 1.33. Average internal distances obey the universal power law scaling for sequence separations that go beyond five to nine residues.
For proteins in the EV limit, there are two distinct length scales. The first is a local length scale that spans seven-residue stretches. For sequence separations that go beyond this local length scale, chain properties such as Rg, Re, and internal distances scale with sequence separation according to universal power laws. Local stiffness is typically quantified in terms of a persistence length, which is the length scale over which the chain behaves like a rigid rod (40,79). The value of Ro obtained from fits to SAXS data for denatured proteins suggests that denatured proteins show rodlike behavior over very short length scales (11). This is also confirmed from our analysis in Fig. 6, which shows that deviations from rodlike behavior occur for all sequence separations greater than a single residue.
Fig. 6 also shows that there is a local length scale over which proteins in the EV limit show nonuniversal behavior. This is not a persistence length. Instead, it is the length scale over which sequence-specific spatial correlations decay. To estimate this length scale, referred to as ns, we follow the prescription of Thirumalai and Ha (79). Let li and lj be two "bond" vectors. The vector li straddles residue i, extending between the backbone nitrogen and carbonyl carbon atoms of residue i; the vector lj straddles residue j. Correlation between a pair of "bond" vectors is quantified by computing the projection, $\cos\theta_{ij}$, between the vectors. The value for ns is estimated from a plot of the ensemble average $\langle\cos\theta_{ij}\rangle$ as a function of $|j - i|$, where the latter refers to the sequence separation. If a pair of bonds is highly correlated in the ensemble, then $\langle\cos\theta_{ij}\rangle \approx 1$. This is obviously true of adjacent vectors. As the sequence separation $|j - i|$ increases, the correlations decay, and the value of $|j - i|$ for which $\langle\cos\theta_{ij}\rangle \approx 0$ is the estimated value for ns. Fig. 7 shows the calculated values of ns for all 23 sequences shown in Table 1. Values for ns range from six to nine residues and do not vary dramatically with protein sequence or chain length. The estimates for ns appear to be consistent with those from different measurements (80–84) and calculations (48,85). The calculated value of ns is largest for CI2 and smallest for cold shock protein. It is important to reiterate that the concept of a persistence length is ill defined for a highly flexible chain. It is erroneous to multiply ns by 3.6 Å (the rise per residue for a fully extended conformation) and stipulate that this is the persistence length for a chain in the EV limit. In fact, the persistence length in the EV limit (the length over which the chain behaves like a straight segment) is <4 Å, i.e., no more than one residue. This estimate agrees with SAXS data, recent atomic force microscopy measurements (86), and simulation results for different proteins (78).
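The bond-vector correlation analysis described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the input layout (one backbone N and carbonyl C coordinate pair per residue), and the tolerance used to decide that the correlation has decayed to zero are all assumptions made here for clarity.

```python
import math

def unit(v):
    """Return v scaled to unit length."""
    norm = math.sqrt(sum(c * c for c in v))
    return tuple(c / norm for c in v)

def bond_correlation(ensemble):
    """Ensemble-averaged cos(theta_ij) as a function of |j - i|.

    ensemble: list of conformations; each conformation is a list of
    (N_xyz, C_xyz) coordinate pairs, one pair per residue (assumed layout).
    Returns a dict mapping sequence separation -> <cos theta>.
    """
    sums, counts = {}, {}
    for conformation in ensemble:
        # "Bond" vector l_i runs from backbone N to carbonyl C of residue i.
        bonds = [unit(tuple(c - n for n, c in zip(n_xyz, c_xyz)))
                 for n_xyz, c_xyz in conformation]
        for i in range(len(bonds)):
            for j in range(i, len(bonds)):
                cos_ij = sum(a * b for a, b in zip(bonds[i], bonds[j]))
                sums[j - i] = sums.get(j - i, 0.0) + cos_ij
                counts[j - i] = counts.get(j - i, 0) + 1
    return {sep: sums[sep] / counts[sep] for sep in sums}

def estimate_ns(correlation, tol=0.05):
    """Smallest separation at which |<cos theta>| drops below tol.

    tol is a hypothetical threshold standing in for "<cos theta> ~ 0".
    Returns None if the correlation never decays within the chain.
    """
    for sep in sorted(correlation):
        if abs(correlation[sep]) < tol:
            return sep
    return None
```

In practice one would likely inspect the full decay curve rather than rely on a single threshold, since noisy ensemble averages can cross the tolerance early.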
Protein interiors in the EV limit reveal cavities on all length scales
Field theories predict that chains in the EV limit are characterized by interior cavities of all sizes, reflecting the inefficient way in which the chains fill the available volume (39). This is a result of correlations that exist on all length scales and the fact that interactions that give rise to these correlations are purely repulsive in nature.
Fig. 8 shows results from our quantitative analysis of cavity statistics for the EV-limit ensembles of proteins. In the interest of clarity, we show data for the sequence of ubiquitin. Similar results were obtained for all other two-state protein sequences shown in Table 1. The question we ask is, what is the probability that a sphere of radius a placed at random with respect to the center of mass of the chain will be empty? For each conformation in the EV-limit ensemble, we place a probe sphere of radius a at several random locations with respect to the center of mass and quantify the number of times a chain atom crosses the probe sphere. This procedure is repeated for all conformations within the ensemble. The resultant data are used to compute Poa(r), which is defined as the probability of finding a cavity of radius a at a distance r from the center of mass in the ensemble.
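The probe-sphere procedure just described can be sketched as follows. This is an illustrative simplification under stated assumptions: atoms are treated as points of equal mass (atomic radii are ignored), a probe counts as empty when no atom center lies within radius a of the probe center, and the function names, the number of trials, and the random seed are hypothetical choices rather than details taken from the paper.

```python
import math
import random

def center_of_mass(atoms):
    """Unweighted centroid of a list of (x, y, z) points (equal masses assumed)."""
    n = len(atoms)
    return tuple(sum(p[k] for p in atoms) / n for k in range(3))

def random_point_on_shell(center, r, rng):
    """Point at distance r from center in a uniformly random direction."""
    while True:
        d = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in d))
        if norm > 1e-12:
            break
    return tuple(center[k] + r * d[k] / norm for k in range(3))

def cavity_probability(ensemble, a, r, trials=100, seed=0):
    """Fraction of probe spheres of radius a, centered a distance r from the
    center of mass, that contain no atom center.

    ensemble: list of conformations, each a list of (x, y, z) atom positions.
    """
    rng = random.Random(seed)
    empty = 0
    total = 0
    for atoms in ensemble:
        com = center_of_mass(atoms)
        for _ in range(trials):
            probe = random_point_on_shell(com, r, rng)
            if all(math.dist(probe, p) > a for p in atoms):
                empty += 1
            total += 1
    return empty / total
```

Scanning `a` and `r` over a grid and plotting the returned fractions would give curves of the kind the text calls Poa(r).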

FIGURE 8 Analysis of cavity statistics. This is plotted as the probability Poa(r) of finding a cavity of size a at a distance r from the center of mass. The data in all of the panels are for the sequence of ubiquitin. (A) EV-limit ensemble of ubiquitin. (B) Folded structure of ubiquitin. (C) Ubiquitin modeled as a fully extended conformation. (D) An ensemble of ubiquitin modeled as a freely rotating chain.
For ubiquitin, we computed Poa(r) for probe spheres of radii ranging from 2.5 to 12.5 Å. The results are shown in Fig. 8 A. Remarkably, there is a 20% chance of finding a cavity of radius a = 12.5 Å at the average location of the center of mass. The finite probability of finding large cavities within the interior of denatured proteins emphasizes two points: First, the volume occupied by a chain is filled inefficiently when compared to either a folded protein or a freely rotating chain. Second, the cavity statistics are indicative of large-scale correlated fluctuations, which exist on all length scales in the EV limit. To illustrate these points, we compare the values of Poa(r) obtained in the EV limit to those for three different models.
Cavity statistics, Poa(r), for folded ubiquitin are shown in Fig. 8 B. In the folded form, the average packing density is high and protein interiors are thought to be either solidlike (87,88) or like "randomly packed spheres near their percolation threshold" (89). Either way, the thinking is that it ought to be difficult to locate spherical cavities of different sizes within protein interiors. Fig. 8 B shows that it is in fact impossible to find room for small or large cavities unless the cavity is located sufficiently far from the center of mass of the folded protein. Interestingly, similar results are obtained for the protein modeled as a fully extended conformation (Fig. 8 C). This erroneous model is of interest only because it has been used previously for denatured proteins in studies aimed at correlating m values and ΔCP with changes in solvent-accessible surface area (90). In the fully extended conformation, the chain is loosely packed because it is maximally stretched. Yet the probe sphere always intersects the chain unless it is centered sufficiently far away from the center of mass. The results in Fig. 8, B and C, underscore the importance of conformational fluctuations. It is impossible to capture the features of an ensemble, such as the creation of interior cavities, using a single conformation. Comparison of results in Fig. 8 A to those in Fig. 8 C suggests that the "observed" correlation of m values and ΔCP with changes in solvent-accessible surface area might in fact be serendipitous. A first-principles reassessment of the source of this empirical correlation is warranted. This is the topic of ongoing studies (H. T. Tran and R. V. Pappu, unpublished).
Are fluctuations in the EV limit correlated?
Theory predicts that the gross inefficiency with which the available volume is filled by polymers in the EV limit is in fact a manifestation of correlations between large-scale fluctuations. That this is indeed the case is shown by comparing cavity statistics in the EV limit to values obtained for freely rotating chains. The latter is a model for a soft, Gaussian coil with large-scale, albeit uncorrelated, fluctuations (43). Results for ubiquitin modeled as a freely rotating chain are shown in Fig. 8 D. Since conformational fluctuations are uncorrelated in a chain devoid of interactions, it is impossible to find large cavities (a > 5 Å) within the interior of a freely rotating chain. There is, however, a finite probability of finding small cavities (a < 5 Å) within the interior of a freely rotating chain. In our implementation of the freely rotating chain model, all nonbonded interatomic interactions were turned off and ensembles were generated by drawing the φ, ψ, and χ angles for each residue from sterically allowed regions. To implement the true spirit of a Flory model, we could have selected only those conformations that lead to reproduction of the $N^{0.59}$ scaling law. Although such an exercise yields higher probabilities for large cavities, the difference is purely quantitative and does not alter the main conclusion.
A summary of the difference between correlated fluctuations in the EV limit and uncorrelated fluctuations for a Flory-like freely rotating chain is shown in Fig. 9, which plots the probabilities, Poa(r = 0), of finding cavities of different sizes at the ensemble-averaged center-of-mass as a function of cavity radius a. Although Poa(r = 0) decreases linearly with cavity size, a, for the EV limit, it decays much more rapidly for the freely rotating chain version of ubiquitin. Of course, what we refer to as cavities will actually be filled by solvent and cosolute molecules under denaturing conditions. The main point of the foregoing discussion is that inasmuch as there is congruence between the EV-limit ensemble and highly denatured states, chain fluctuations create ample room to accommodate favorable interactions with the surrounding solvent. Standard reference models such as the fully extended chain and the Flory random coil model will grossly underestimate both the diversity and extent of chain-solvent interactions, which in turn leads to a misrepresentation of the extent and type of conformational fluctuations.

FIGURE 9 Comparison of the effect of correlated versus uncorrelated fluctuations on cavity statistics. Here we plot the probability, Poa(r = 0), of finding a cavity of radius a at the center of mass as a function of cavity radius a. The dashed curve is for the ensemble of ubiquitin modeled as a freely rotating chain (uncorrelated fluctuations) and the solid curve is for the EV-limit ensemble of ubiquitin (correlated fluctuations).
Can the differences quantified in Fig. 8 be tested experimentally?
Fluctuations for a chain of length N will lead to cavities that are large enough to allow for the free diffusion of a smaller chain of length n < N. This observation led Khokhlov and coworkers (91,92) to propose an experiment whereby a reactive group is placed at the center of a chain molecule and the rate of interpolymer reactions is followed as a function of chain length. Reaction rates will be dictated by the accessibility of the reactive group. If denatured proteins follow the Flory random-coil model, the reaction rates would drop exponentially as chain length increases because the reactive group ought to become increasingly inaccessible due to uncorrelated fluctuations. Conversely, for a chain that follows the predictions of field theories in the EV limit, the reaction rate will decrease as some power law with chain length, and there will be a finite probability of realizing a reaction with the reactive group even for very long chains. Advancements in analytical chemistry and mass spectrometry suggest that Khokhlov's proposal can be tested using novel cross-linking approaches that are being developed for quantitative studies of protein folding (93). Other experimental probes can also be used. The form factor in the high q regime provides a measure of the number of interresidue interactions that can be found within a distance $a \sim q^{-1}$ from each other and this will scale as $a^{1.7}$ (39). Finally, because of the large cavities created by a chain in the EV limit, it is expected that the second virial coefficient (B2) for highly denatured proteins will scale with chain length as $N^{0.59}$ (39,41).
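The exponent of 1.7 quoted for the high q regime can be traced back to the EV-limit scaling exponent of 0.59 used throughout this section. The following is a short sketch of that arithmetic, assuming the standard polymer-physics relation between the spatial size of a chain segment and the number of residues it contains; the symbols g (number of residues within a distance a) and ν (the scaling exponent) are introduced here for illustration only.

```latex
a \sim g^{\nu}, \qquad \nu \approx 0.59
\quad\Longrightarrow\quad
g(a) \sim a^{1/\nu} \approx a^{1.7}
```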
Contacts are hierarchical and average topologies are independent of sequence
Two residues are said to be in contact if there are at least two atoms (including hydrogen atoms), one from each residue, within a 6-Å distance of each other. The histogram of interresidue contacts can be plotted as a contact density map and the results are shown in the top row of Fig. 10 for three proteins of different lengths. Irrespective of chain length and sequence, the contact densities follow a hierarchical pattern whereby near-neighbor residues have a higher probability of being spatially proximal. The probability of finding a pair of residues in close spatial proximity decreases with increasing sequence separation. If one were to zoom into the contact density map of a long protein such as titin, one reproduces the contact density map for a shorter protein such as ubiquitin or peripheral subunit binding domain (PSBD). Conversely, zooming out or scaling up from the contact density map of a short protein like PSBD will yield the contact density maps of longer proteins such as ubiquitin or titin. This scale invariance, referred to as dilatation symmetry (39), is a hallmark of chains in the EV limit and reflects the preservation of the hierarchical nature of contact patterns irrespective of sequence or chain length.
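The contact criterion stated above translates directly into a contact density map. The sketch below is one possible reading of it, assuming each conformation is supplied as a list of residues and each residue as a list of atom coordinates; the function names and the normalization by the number of conformations are assumptions made here, not details confirmed by the text.

```python
import math

def in_contact(residue_a, residue_b, cutoff=6.0):
    """True if any atom of residue_a lies within cutoff (in Å) of any atom
    of residue_b. Each residue is a list of (x, y, z) atom positions."""
    return any(math.dist(p, q) <= cutoff
               for p in residue_a for q in residue_b)

def contact_density(ensemble, cutoff=6.0):
    """Fraction of conformations in which residues i and j are in contact.

    ensemble: list of conformations; each conformation is a list of
    residues (assumed layout). Returns a symmetric n x n list of lists.
    """
    n = len(ensemble[0])
    density = [[0.0] * n for _ in range(n)]
    for conformation in ensemble:
        for i in range(n):
            for j in range(i + 1, n):
                if in_contact(conformation[i], conformation[j], cutoff):
                    density[i][j] += 1.0
                    density[j][i] += 1.0
    count = len(ensemble)
    return [[value / count for value in row] for row in density]
```

The double loop over atom pairs is quadratic and is written for readability; a production analysis would typically use a neighbor list or a spatial grid.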

FIGURE 10 Top row shows the contact density maps for EV-limit ensembles of PSBD, ubiquitin, and titin. The color bar for all three plots is shown on the right. To provide a contrast of the folded state to the EV-limit ensemble, the middle row shows contact maps for native structures of PSBD, ubiquitin, and titin. The bottom row shows how the contact density maps in the EV limit come about. Each panel shows distance distributions for different pairs of residues that have different spacing in sequence space. Distance distributions for residues that are local in sequence space are sharply peaked around close distances, whereas distributions for residues that are far apart in sequence are broad and peaked around large distances. The broad distance distributions for distal residues lead to large-scale fluctuations in the EV limit.
The large-scale fluctuations that give rise to the contact density maps shown in Fig. 10 are best explained in terms of distributions for interatomic distances. In the bottom row of Fig. 10 we show distributions of distances obtained for different pairs of residues in the three proteins: PSBD, ubiquitin, and titin. The distributions of distances are sharply peaked for near-neighbor residues and they become increasingly broad as sequence separation increases. In addition, the distance distributions are peaked at larger distance values as sequence separation increases. This emphasizes three important points regarding protein ensembles in the EV limit: First, the dominant contacts are in fact local. Second, the magnitude of fluctuations in interresidue distances increases with increasing sequence separation. Third, these increased fluctuations could certainly lead to the occasional close approach of distal amino acids. Experiments that only detect close spatial contacts will be interpreted as providing evidence of long-range "residual" structure in the denatured state (94–98). Contrary to interpretations of many such experiments, numerous molecular simulations (7,23,99) and recent single-molecule experiments (75) provide little evidence of long-range residual structure under harshly denaturing conditions. The main conclusion is that analysis of the EV-limit ensembles does not preclude the possibility of occasional close contacts between residues that are distal in sequence. It does, however, predict that these contacts have low probabilities and are sampled in the tails of distance distributions. Conventional NMR experiments based on the nuclear Overhauser effect are incapable of resolving contacts that go beyond 5–7 Å. Hence, one must be cautious in interpreting observations of nuclear Overhauser effects as evidence for residual, long-range structure in highly denatured states.
Dilatation symmetry is preserved for all sequences in the EV limit. Conversely, the contact density maps for the folded versions of different sequences reflect differences in native-state topologies. Given access to contact density maps for the denatured state (EV limit) and contact maps for native states, one can make a qualitative judgment regarding the folding process by computing difference contact density maps between the native and denatured states. These difference maps are shown in Fig. 11. These maps show regions where contacts are either present (strong) or absent (weak) in both the native and denatured states. They also show contacts that are strongly represented in the native state and weak in the denatured state. Regions shaded in black are contacts that are pronounced in the denatured state. From the difference contact maps, we find that upon folding, specific nonnative local contacts have to be broken (weakened) to make native, nonlocal spatial contacts. The number and locations of nonnative local contacts that are to be broken determine the sets of spatial contacts that are formed upon folding.

FIGURE 11 Difference contact density maps for PSBD, ubiquitin, and titin. Contacts that are either missing or weak in the native state but are present in the EV limit are shown in black in the difference contact maps.
Formally, folding can be viewed as a symmetry-breaking operation wherein the dilatation symmetry characteristic of the denatured-state (EV limit) ensemble is broken by breaking or disrupting the requisite number of nonnative local contacts. If folding were strictly driven by the formation of local contacts (100–102), as in a helix-coil transition, then no nonnative local contacts would have to be broken upon folding. Instead, new local contacts would be added onto those that already exist in the denatured state. However, since folding requires the formation of spatial, long-range contacts, local nonnative contacts have to be broken. How the dilatation symmetry of denatured states is broken under folding conditions will depend on a variety of factors including local biases for turns and short stretches of extended or helical conformations, the drive to sequester hydrophobic amino acids, and the achievement of specificity in side-chain packing (103). These interactions will be determined by the specific sequence or, more precisely, by native-state topology.
The importance of native-state topology for folding is underscored by analysis of the average denatured-state topology. Folding rates for two-state proteins show statistically significant correlation with native-state contact order (104). In their original work, Plaxco et al. (104) ignored denatured-state topologies when quantifying the correlation between native-state topology and folding rates. The strong correlation between folding rates and contact order implies that folding rates depend only on the end point, i.e., native-state topology. A similar principle underlies the design of energy landscape theories for folding kinetics that are based on Gō models (105–107). At first glance, these results are surprising since the highly denatured state is the starting point for in vitro folding reactions and yet no consideration of the denatured-state topology is required to account for the folding rates. These results would make sense if denatured-state topologies were equivalent and invariant with sequence.
Indeed, for all 23 sequences in the EV limit we find that the absolute contact orders are independent of sequence. We calculated absolute contact order using the method of Plaxco et al. (104). The sequence independence of absolute contact orders in the EV limit is shown in Fig. 12, which plots the absolute ensemble-averaged, EV-limit contact order for all 23 sequences. For comparison, the absolute contact orders of the native-state counterparts are also shown. Since contact order is well-established as a "single value descriptor of topological complexity" (68), the data in Fig. 12 support the conclusion that EV limit ensembles are topologically equivalent. This equivalence in average topologies of denatured states explains why it has been reasonable to ignore the denatured state when assessing the contribution of topology to folding rates for small two-state proteins.
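For readers unfamiliar with the quantity, absolute contact order is commonly taken to be the mean sequence separation over all residue pairs in contact. The sketch below illustrates that definition and one way to average it over an ensemble. The function names are hypothetical, and averaging per-conformation values (rather than pooling contacts across the ensemble) is an assumption; the text does not specify which was used.

```python
def absolute_contact_order(contacts):
    """Mean sequence separation |j - i| over residue pairs in contact.

    contacts: iterable of (i, j) residue-index pairs for one conformation.
    Returns 0.0 when there are no contacts.
    """
    separations = [abs(j - i) for i, j in contacts]
    if not separations:
        return 0.0
    return sum(separations) / len(separations)

def ensemble_contact_order(contact_lists):
    """Average of per-conformation absolute contact orders (assumed scheme).

    contact_lists: list with one contact list per conformation.
    """
    values = [absolute_contact_order(contacts) for contacts in contact_lists]
    return sum(values) / len(values)
```

Dividing the absolute value by chain length would give the relative contact order, which is the form usually correlated with folding rates.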

FIGURE 12 Absolute contact orders for all 23 sequences in the EV limit (cross marks) and for native structures (open circles). Invariance of contact order with sequence in the EV limit suggests that the average topology does not depend on sequence in the denatured state.
The distribution of end-to-end distances
The distribution of end-to-end distances is a fundamental quantity for comparing predictions of different polymer theories (38–42). If $x = R_e/\langle R_e^2\rangle^{1/2}$, where $\langle R_e^2\rangle$ is the mean-squared end-to-end distance, then $4\pi x^2 P(x)\,dx$ is the probability of finding a conformation with x values between x and x + dx. For a Flory random coil, P(x) is a Gaussian distribution of the form $P(x) = (3/2\pi)^{3/2}\exp(-3x^2/2)$. The functional form for P(x) in the EV limit has been derived by des Cloizeaux (41) as an interpolation between the predicted results for P(x) for large (87–110) and small x (111). des Cloizeaux's formula is $P(x) = a_o x^{\theta}\exp(-b x^{\delta})$ (41), where $\theta$ and $\delta$ are universal exponents and b is a constant. Here, ao is a normalization constant, which ensures that $\int_0^{\infty} 4\pi x^2 P(x)\,dx = 1$.
Fig. 13 shows a comparison of P(x) predicted by theory to distributions computed for four different proteins in the EV limit. Similar data were obtained for all protein sequences shown in Table 1. Simulated data agree with theoretical predictions and the agreement is quantified in terms of residuals between the theoretical distribution and those from simulations. The dashed curve in Fig. 13 is the Gaussian distribution for P(x) that fits a Flory random coil. Comparison of the dashed curve to the other distributions reveals two features of the end-to-end distance distribution in the EV limit. For large x, entropy opposes stretching of the chain beyond its average value of Re. However, there is a diminution in the entropy in the EV limit vis-à-vis the Flory model. This is evident in the more rapid decay of P(x) for large x in the EV limit. For small x, the discrepancy is even more pronounced. In the EV limit, there exists a so-called "correlation hole" (39). Stated differently, correlated chain repulsions drastically reduce the probability that the N- and C-termini come very close together. Conversely, P(x) is maximal for small x if one assumes a Flory-style random coil model with uncorrelated fluctuations and uniform (mean-field) chain swelling.
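The two qualitative differences described above (a correlation hole at small x and a faster decay at large x) can be checked numerically with a short script. This is a sketch under explicit assumptions: the des Cloizeaux-type form x^θ exp(−b x^δ) is used with θ = 0.28, δ = 2.43, and b = 1.206, which are values commonly quoted in the polymer-physics literature and are not taken from this article; the normalization is done against the 4πx² volume element; and all function names are hypothetical.

```python
import math

def gaussian_p(x):
    """Gaussian (Flory random coil) density in reduced units, normalized so
    that the integral of 4*pi*x^2*P(x) over x equals 1."""
    return (3.0 / (2.0 * math.pi)) ** 1.5 * math.exp(-1.5 * x * x)

def ev_shape(x, theta=0.28, delta=2.43, b=1.206):
    """Unnormalized des Cloizeaux-type form; parameter values are assumed."""
    return x ** theta * math.exp(-b * x ** delta)

def normalization(f, x_max=6.0, steps=60000):
    """Constant a such that a * integral(4*pi*x^2*f(x)) = 1 on [0, x_max],
    evaluated with the trapezoid rule."""
    h = x_max / steps
    total = 0.0
    for k in range(steps + 1):
        x = k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * 4.0 * math.pi * x * x * f(x)
    return 1.0 / (total * h)

a_o = normalization(ev_shape)

def ev_p(x):
    """Normalized EV-limit density under the assumed parameters."""
    return a_o * ev_shape(x)

if __name__ == "__main__":
    for x in (0.0, 0.5, 1.0, 1.5, 2.0, 2.5):
        print(f"x={x:.1f}  gaussian={gaussian_p(x):.3e}  ev={ev_p(x):.3e}")
```

Running the script prints both densities on a small grid of x values; the EV-limit density vanishes at x = 0, whereas the Gaussian density is largest there.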

FIGURE 13 End-to-end distance distribution for five representative sequences in the EV limit. The parameter $x = R_e/\langle R_e^2\rangle^{1/2}$. The solid curve shows the distribution predicted by des Cloizeaux and the dashed curve is the Gaussian distribution that applies for the Flory random coil. The bottom panel shows residuals between data from EV-limit simulations and the theoretical (solid) curve.
The existence of a correlation hole in the EV limit has been demonstrated in Monte Carlo simulations for a variety of polymeric systems (112,113). Recently, Zhou (114) computed the functional form of $x^2P(x)$ from EV simulations of proteins studied by Wilkins et al. (46). The fit obtained by Zhou for $x^2P(x)$ (114) is consistent with predictions made by field theory, although Zhou has pursued an alternative interpretation (114–116) of the des Cloizeaux functional form. His interpretation is anchored in refinements (45) of the Flory random coil model (44). Refinements of Flory's mean-field theory have to be used with caution and tailored for each application because they are not designed to capture renormalizable features of polymers in the EV limit (38).
 |
CONCLUSIONS
|
|---|
We have used an atomistic EV model, developed in previous work, to show that it is computationally tractable to generate accurate conformational ensembles for proteins in the EV limit. The accuracy of these ensembles is judged by matching the structural characteristics of the simulated ensembles to those predicted by field theories. Given the equivalence between the EV limit and highly denatured states, our ability to simulate conformational ensembles in the EV limit, with full atomic detail, has direct bearing on the development of an accurate physical picture of conformations accessible to denatured proteins. A summary of our results from analysis of EV-limit ensembles for 23 different two-state prote