| HOME | HELP | FEEDBACK | SUBSCRIPTIONS | ARCHIVE | SEARCH | TABLE OF CONTENTS |
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

* Biochemisches Institut der Universität Zürich, Zürich, Switzerland; and
Department of Chemistry, University of Cambridge, Cambridge, United Kingdom
Correspondence: Address reprint requests to Dr. Emanuele Paci, University of Zurich, Dept. of Biochemistry, Winterthurestrasse 190, Zurich, 8057, Switzerland. Tel.: 41-1-635-5559; E-mail: paci{at}bioc.unizh.ch.
| ABSTRACT |
|---|
|
|
|---|
| INTRODUCTION |
|---|
|
|
|---|
ali et al., 1994
models (Taketomi et al., 1975
models have been used to study the stability of the native state of proteins (Kaya and Chan, 2002
Computer simulations are capable, at least in principle, of reconstructing with accuracy the complete free-energy landscape of proteins. This task is an ambitious one, however, since it requires the generation of very long trajectories from which the equilibrium properties of a system can be determined. Therefore, free-energy landscapes have, most often, been determined by the use of simple protein models (Dinner et al., 2000
; Karanicolas and Brooks, 2003
). More recently, advances in computer technology have made it possible to use all-atom molecular dynamics simulations to characterize the free-energy landscape of several polypeptide chains, including the two 20-residue-designed peptides GSGS (Ferrara and Caflisch, 2000
) and Betanova (Bursulaya and Brooks, 2000
), the C-terminal ß-hairpin of protein G (Dinner et al., 1999
; Zhou and Zhou, 2002
), protein A (Ghosh et al., 2002
), and the src-SH3 domain (Shea et al., 2002
).
The study presented here compares in detail the thermodynamic properties of the GSGS peptide (TWIQNGSTKWYQNGSTKIYT) (de Alba et al., 1999
), as determined by using a sequence-based (transferable) energy function (TEF) and various structure-based (nontransferable) G
-like energy functions (GEFs). One of the GEF models that we studied is an all-atom model that has the same degrees of freedom of the TEF; the other GEF models are frequently used models based on coarse-grained descriptions of the polypeptide chain that use only C
atoms. Reversible folding of the GSGS peptide, a necessary condition for determining equilibrium properties, can be achieved by using all such models.
In structure-based models the parameters are chosen so that the native state of the particular polypeptide chain under study corresponds to the overall minimum of the energy. In most cases, however, the parameters are not optimized to reproduce also the experimental measurements on the thermodynamics and kinetics of the system. Indeed, in several studies where this type of model was used, the folding mechanism was shown to depend strongly on the values of the parameters (Zhou and Karplus, 1999
; Zhou and Linhananta, 2002
; Zhou et al., 2003
). Similarly, it is a very difficult task to define the values of the parameters of sequence-based models to simultaneously reproduce all the structural, thermodynamic, and kinetic observations made experimentally. This problem was illustrated clearly in a recent study by Alan Fersht and co-workers that showed that the folding behavior of protein A was predicted in different ways by the various models used (Sato et al., 2004
). The observation that models having the correct native state may not describe the kinetic properties correctly is complemented in the present study by a quantitative comparison between the thermodynamic properties of the GSGS peptide as obtained by using several different foldable models.
| METHODS |
|---|
|
|
|---|
1 = 0.7. For two atoms, A and B, at a distance R, the energy E(A, B) was computed as (Shimada et al., 2001
![]() | (1) |
=
1(rA + rB) is the hard-core distance,
2 = 1.65 a scaling factor that controls the width of the well, and
(A, B) = En = 1, if A and B are in contact in the native reference structure and Enn otherwise. We considered three values for non-native energy Enn (0, 0.5, and 1). The total energy was computed as
where the sum was extended to all-atoms pairs, excluding those of successive residues along the chain and all backbone-backbone contacts. We used two types of MC moves. The first was a rotation of a side-chain rotatable bond by an angle drawn from a Gaussian distribution with zero mean and standard deviation of 0.1 rad. The second type of move was a concerted rotation of the backbone
angles of four successive residues, for which we used a variant of the algorithm of Favrin et al. (2001)
C
GEF simulations
We used several different G
models based on a C
-only description of the polypeptide chain. In these models, interactions between C
pairs are attractive for native contacts. The models differ in the functional form of the interactions, which are always attractive for native contacts, i.e., pairs of C
atoms that are less than 8 Å in the native structure. In one model, the interaction between pairs making a native contact has a square-well form (C
-SW-GEF): the interaction is 1 for distances between 0.9 dN and 1.2 dN, where dN is the distance in the native structure, zero for distances more than 1.2 dN, and infinity for distances less than 0.9 dN. In this case, we also studied the effect of different types of non-native interactionsrepulsive, neutral, or mildly attractive. For non-native contacts we considered three different values for the interaction Enn (0.1, 0.5, 1), defined for distances between 0.9 dNN and 1.2 dNN. The distance for a pair of C
atoms is computed as follows. For each atom i, we define its repulsion radius dR(i) to be the distance to the first atom j not making a native contact with it; the non-native distance is then
![]() | (2) |
Simulations were performed with the Monte Carlo (MC) package, almost.
We also considered the improved C
G
model, C
-KB-GEF, proposed by Karanicolas and Brooks (2002)
. In this model, residues separated in sequence by three or more bonds, and which are in contact in the native reference structure, are subject to an interaction energy of the form
![]() | (3) |
An important feature of this model, which is different from all the other G
models considered here, is that both the strength (
ij) and the range on native interactions (
ij) are determined from the all-atom native structure using detailed interactions (hydrogen bonds, etc.). Residues not defined as native contacts were subject to repulsive potential of the form
![]() | (4) |
Another original feature of the C
-KB-GEF force field is a sequence-specific term related to the backbone dihedral angles of the protein (Karanicolas and Brooks, 2002
). C
-KB-GEF simulations were performed using Langevin molecular dynamics using the program CHARMM (Brooks et al., 1983
); the appropriate topology and parameter files were generated using the web server http://mmtsb.scripps.edu/.
Molecular dynamics simulations
Molecular dynamics (MD) TEF simulations were performed with the program CHARMM (Brooks et al., 1983
), using an implicit solvation model based on the solvent-accessible surface (Ferrara et al., 2002
). This model has not been optimized using the structure of this specific peptide, and it has been shown to reversibly fold various peptides to their respective experimental structures (
-helical or ß-sheet; see Ferrara et al., 2002
; Ferrara and Caflisch, 2000
; Hiltpold et al., 2000
). In the cases of the GSGS peptide, during long MD simulations a triple-stranded ß-sheet conformation satisfying the experimental NOE-derived distances (de Alba et al., 1999
) is highly populated. A reference native structure was generated by extracting a low energy structure from a low temperature simulation started from the native state. This structure satisfies all the 26 experimental NOE-derived distances and was subsequently energy-minimized. The resulting structure was also assumed as the native reference structure for all the GEF models used in this work.
Thermodynamic properties
Since we compare how different models (C
and all-atom) describe the conformational properties of the GSGS peptide, only those properties depending on the C
positions are used here. We thus define the radius of gyration and the root mean-square deviation (RMSD) from the native structure in terms of C
atoms. We also monitor the number of contacts; all the pairs of C
atoms less than 8 Å apart and separated by more than 3 residues in the sequence are considered to be in contact. The total number of contacts in the reference structure of the GSGS peptide is 40; 20 of these contacts are between strands 1 and 2, and 19 between strands 2 and 3.
The number of non-native contacts is the total number of contacts minus the number of native ones. To estimate the number of folding events, we count the number of times the peptide satisfies the specific conditions for being folded, starting from any conformation that satisfies the specific condition for being unfolded. We assume that when more than 30 native contacts are present and the RMSD from the native reference structure is less than 2.5 Å, the peptide is native; when less than 12 native contacts are present and the RMSD from the native structure is more than 5 Å, the peptide is assumed to be unfolded. These criteria are derived from the typical bimodal distributions of both the number of native contacts and RMSD from the native structure observed for the models considered in this work (see below).
| RESULTS |
|---|
|
|
|---|
All-atom GEF simulations
The reference native structure of the peptide contains 360 interatomic native contacts. In the aa-GEF case we performed five simulations, each of 109 MC steps, for three different values of Enn (0, 0.5, and 1).
As in the TEF simulations, reversible transitions between completely folded and completely unfolded conformations are observed. For example, we observed 35 folding/unfolding events over the total 5 x 109 MC steps for the case Enn = 1 at T = 1.52.
C
-GEF simulations
C
-KB-GEF simulations, using Langevin molecular dynamics, were performed at T = 344 K; we observed about 4000 folding/unfolding events during a 20-µs simulation. All the other C
-GEF models were sampled with MC using the program almost. The C
-SW-GEF was sampled with 109 MC steps; for example, for the case Enn = 1, we counted 317 folding/unfolding events.
Determination of the melting temperature
All simulations were performed close to the melting temperature Tm. We used two different methods to compute Tm. For all the G
models, several simulations in a broad range of temperatures were performed. The specific heat was then computed by combining the simulations with the weighted histogram algorithm method (Ferrenberg et al., 1995
). For the TEF model, the density of states
(E) was computed using the Wang-Landau method (Wang and Landau, 2001a
,b
). Then, the specific heat was obtained from
![]() | (5) |
![]() | (6) |
The specific heat as a function of the temperature, for the various models, is shown in Fig. 1.
|
|
-RMSD. This type of behavior becomes less well-pronounced for lower values of Enn and almost disappears for Enn = 0. In the C
-KB-GEF case, the distributions of native contacts and of the RMSD from the native state have two well-defined peaks that correspond to the native and unfolded states, respectively. The C
-SW-GEF model is characterized, in the case of Enn = 1, by a rather narrow native state, in terms of both the distributions of N and RMSD; whereas in the unfolded state, it has a peak at about 20 native contacts and a broad distribution of RMSD between 3 and 10 Å. Interestingly, the typical two-state behavior and a well-defined native state with a population of about 50% is also observed when non-native interactions are set to zero or even to a slightly negative value (Enn = 0.1).
|
-GEF models, and particularly for the C
-KB model, the distribution of Rg is narrower and closer to the TEF case. Interestingly, for the C
-SW model, the effect of slightly attractive non-native interactions does not dramatically change the thermodynamical properties of the system, as shown by the distributions in Fig. 3 d.
A closer view of the conformations populated in the various models studied is provided by the two-dimensional histogram of N12 and N23, which provides information about the degree of formation of the two native ß-hairpins. The probability of conformations with a given number of contacts N12 and N23 (Fig. 4) shows remarkable differences between the TEF and the GEF cases and between the C
and the all-atom GEF cases. In the TEF case, the simultaneous formation of the two hairpins is rather unlikely, whereas conformations in which only one of the two hairpins is entirely formed correspond to local minima on the free-energy surface; the free-energy surface is not symmetric and the formation of contacts N23 is slightly more favorable. This is not the case for the aa-GEF model; here symmetric conformations, in which the same number of native contacts is present in each hairpin, are more favorable than asymmetric ones. We also observe that the native minimum in the GEF case is broad and not very close to the minimum energy conformation (i.e., the conformation where all the native contacts are formed). In the C
-GEF models some of the features of the TEF are preserved; for example, at variance with the aa-GEF case, the folded conformation corresponds to a narrow minimum in free energy. All C
-GEF models, and the KB model in particular, have an asymmetric free-energy landscape due to a high probability of finding conformations where the hairpin 12 is formed and the hairpin 23 is not. Vice versa, in the TEF model a well-populated metastable state consists of conformations where only the hairpin 23 is formed.
|
models the energy is symmetric and has a deep minimum in correspondence of N12 = 19 and N23 = 20, i.e., when both hairpins are formed. For the TEF model (Fig. 5 f), the energy surface is slightly different, and shows that conformations where strands 23 are formed are energetically very favorable. It is also relevant that, in the case of the TEF model, the largest energy difference is about 30 (in kBTm units), whereas it is considerably larger for the GEF models; it is of about 70 for the C
-KB-GEF model, 180 for the aa-GEF model (Enn = 1), and 45 for the C
-SW-GEF. In the TEF model the energy difference between the native and the most unfolded conformations is thus considerably lower than that of any GEF model.
|
|
-KB-GEF indicate that the use of G
-like models defined at a more coarse-grained level is less affected by this problem, and possibly better suited to represent the native free-energy minimum (see e.g., Zhou and Karplus, 1999
The definition of the native contact map is an important aspect of any GEF simulation. Owing to thermal fluctuations, the native state is best represented by an ensemble of contact maps. In a GEF model, however, a single reference contact map should be chosen. We explored the dependence of the results on this choice by repeating the calculations for two additional contact maps, for the all-atom G
models studied here. The variations in contact maps are particularly significant in the case of the small peptide considered here for which the number of all-atom contacts ranges between 260 and 360 for all the structures with N = 40 characteristic contacts in the TEF native simulation. Since some properties of GEF models may be sensitive to the choice of the reference contact map, we repeated the calculations for two additional aa-GEF models that differ by the choice of the native contact map. Alternative contact maps were derived from the structure with the largest and the smallest number of contacts among the structures explored in a native simulation. Although the changes in the total number of contacts in the native state, and in the position of the maxima in the distributions of RMSD and Q are not negligible (data not shown), the general features of the free-energy landscape of a GEF model, such as the broadness of the native state and the high degree of disorder of the unfolded state, do not depend on the choice of the reference contact map.
| CONCLUSIONS |
|---|
|
|
|---|
-like, structure-dependent models (GEF). Our results indicate that the TEF and GEF models we studied have different free-energy landscapes. These differences are particularly significant for the all-atom GEF model where interactions between pairs of atom which are in contact in the native state are equal for all pairs. In addition, in this case, a repulsive interaction between pairs that are not in contact in the native state seems to be a requirement to obtain a two-state behavior. Simpler models, where only C
atoms are considered, give results which are, at least in certain respects, closer to the transferable, sequence-dependent potential. This is particularly the case for the C
-KB-GEF model, where interactions are weighed according to the detailed interatomic interactions in the native state. Favorable non-native interactions are also absent in this model, but, possibly because of the presence of some long-range interactions, non-native contacts are observed in the unfolded state. As a consequence, the unfolded state is more compact than for other G
-like models, even if not as compact as in the TEF case.
In this article we have compared the thermodynamic properties of various sequence-based and structure-based models, and only briefly discussed how these models describe the folding behavior of this peptide. Kinetic properties, however, could, in general, be expected to be different for different free-energy surfaces. A recent experimental study of the folding protein A (Sato et al., 2004
) has discussed several theoretical predictions of the folding mechanism obtained with different models, and showed that they lead to different descriptions of the folding pathway. Taken together, these results suggest that a given native fold may be encoded by models with very different thermodynamic and kinetic behaviors.
| ACKNOWLEDGEMENTS |
|---|
|
|
|---|
We acknowledge the financial support from the Royal Society and the Leverhulme Trust (to M.V.) and the Forschungskredit der Universität Zürich (to E.P.).
| FOOTNOTES |
|---|
Submitted on October 29, 2004; accepted for publication February 8, 2005.
| REFERENCES |
|---|
|
|
|---|
Bursulaya, B. D., and C. L. Brooks III. 2000. Comparative study of the folding free-energy landscape of a three-stranded ß-sheet protein with explicit and implicit solvent models. J. Phys. Chem. B. 104:1237812383.
Cavalli, A., U. Haberthür, E. Paci, and A. Caflisch. 2003. Fast protein folding on downhill energy landscape. Protein Sci. 12:18011803.
Clementi, C., A. E. Garcia, and J. N. Onuchic. 2003. Interplay among tertiary contacts, secondary structure formation and side-chain packing in the protein folding mechanism: all-atom representation study of protein L. J. Mol. Biol. 326:933954.[CrossRef][Medline]
de Alba, E., J. Santoro, M. Rico, and M. A. Jimenez. 1999. De novo design of a monomeric three-stranded antiparallel ß-sheet. Protein Sci. 8:854865.[Abstract]
Ding, F., N. V. Dokholyan, S. V. Buldyrev, H. E. Stanley, and E. I. Shakhnovich. 2002a. Direct molecular dynamics observation of protein folding transition state ensemble. Biophys. J. 83:35253532.
Ding, F., N. V. Dokholyan, S. V. Buldyrev, H. E. Stanley, and E. I. Shakhnovich. 2002b. Molecular dynamics simulation of the SH3 domain aggregation suggests a generic amyloidogenesis mechanism. J. Mol. Biol. 324:851857.[CrossRef][Medline]
Dinner, A. R., V. I. Abkevich, E. I. Shakhnovich, and M. Karplus. 1999. Factors that affect the folding ability of proteins. Proteins. 35:3440.[Medline]
Dinner, A. R., A.
ali, L. J. Smith, C. M. Dobson, and M. Karplus. 2000. Understanding protein folding via free-energy surfaces from theory and experiment. Trends Biochem. Sci. 25:331339.[CrossRef][Medline]
Favrin, G., A. Irbäck, and F. Sjunnesson. 2001. Monte Carlo update for chain molecules: biased Gaussian steps in torsional space. J. Chem. Phys. 114:81548158.[CrossRef]
Ferrara, P., and A. Caflisch. 2000. Folding simulations of a three-stranded antiparallel ß-sheet peptide. Proc. Natl. Acad. Sci. USA. 97:1078010785.
Ferrara, P., J. Apostolakis, and A. Caflisch. 2002. Evaluation of a fast implicit solvent model for molecular dynamics simulations. Proteins. 46:2433.[CrossRef][Medline]
Ferrenberg, A. M., D. P. Landau, and R. H. Swendsen. 1995. Statistical errors in histogram reweighting. Phys. Rev. E. 51:50925100.[CrossRef]
Fersht, A. R., and V. Daggett. 2002. Protein folding and unfolding at atomic resolution. Cell. 108:573582.[CrossRef][Medline]
Ghosh, A., R. Elber, and H. A. Scheraga. 2002. An atomically detailed study of the folding pathways of protein A with the stochastic difference equation. Proc. Natl. Acad. Sci. USA. 99:1039410398.
Hiltpold, A., P. Ferrara, J. Gsponer, and A. Caflisch. 2000. Free-energy surface of the helical peptide Y(MEARA)(6). J. Phys. Chem. B. 104:1008010086.
Karanicolas, J., and C. L. Brooks III. 2002. The origins of asymmetry in the folding transition states of protein L and protein G. Protein Sci. 11:23512361.
Karanicolas, J., and C. L. Brooks III. 2003. Improved Go-like models demonstrate the robustness of protein folding mechanisms towards non-native interactions. J. Mol. Biol. 334:309325.[CrossRef][Medline]
Kaya, H., and H. S. Chan. 2002. Towards a consistent modeling of protein thermodynamic and kinetic cooperativity: how applicable is the transition state picture to folding and unfolding? J. Mol. Biol. 315:899909.[CrossRef][Medline]
Micheletti, C., J. R. Banavar, A. Maritan, and F. Seno. 1999. Protein structures and optimal folding from a geometrical variational principle. Phys. Rev. Lett. 82:33723376.[CrossRef]
Onuchic, J. N., Z. Luthey-Schulten, and P. G. Wolynes. 1997. Theory of protein folding: the energy landscape perspective. Annu. Rev. Phys. Chem. 48:545600.[CrossRef][Medline]
Paci, E., A. Cavalli, M. Vendruscolo, and A. Caflisch. 2003. Analysis of the distributed computing approach applied to the folding of a small ß-peptide. Proc. Natl. Acad. Sci. USA. 100:82178222.
Pande, V. S., A. Y. Grosberg, and T. Tanaka. 2000. Heteropolymer freezing and design: toward physical models of protein folding. Rev. Mod. Phys. 72:259314.[CrossRef]
Sato, S., T. L. Religa, V. Daggett, and A. R. Fersht. 2004. Testing protein-folding simulations by experiment: B domain of protein A. Proc. Natl. Acad. Sci. USA. 101:69526956.
Shea, J. E., J. N. Onuchic, and C. L. Brooks III. 2002. Probing the folding free-energy landscape of the Src-SH3 protein domain. Proc. Natl. Acad. Sci. USA. 99:1606416068.
Shimada, J., E. L. Kussell, and E. I. Shakhnovich. 2001. The folding thermodynamics and kinetics of crambin using an all-atom Monte Carlo simulation. J. Mol. Biol. 308:7995.[CrossRef][Medline]
Taketomi, H., Y. Ueda, and N. G
. 1975. Studies on protein folding, unfolding and fluctuations by computer simulation. I. The effect of specific amino acid sequence represented by specific inter-unit interactions. Int. J. Pept. Protein Res. 7:445459.[Medline]
Thirumalai, D., D. K. Klimov, and R. Dima. 2002. Insights into specific problems in protein folding using simple concepts. Adv. Chem. Phys. 120:3676.
ali, A., E. Shakhnovich, and M. Karplus. 1994. How does a protein fold? Nature. 369:248251.[CrossRef][Medline]
Wang, F. G., and D. P. Landau. 2001a. Determining the density of states for classical statistical models: a random walk algorithm to produce a flat histogram. Phys. Rev. E. 64:056101.[CrossRef]
Wang, F. G., and D. P. Landau. 2001b. Efficient, multiple-range random walk algorithm to calculate the density of states. Phys. Rev. Lett. 86:20502053.[CrossRef][Medline]
Zhou, H., and Y. Zhou. 2002. Folding rate prediction using total contact distance. Biophys. J. 82:458463.
Zhou, Y., and M. Karplus. 1999. Interpreting the folding kinetics of helical proteins. Nature. 401:400403.[CrossRef][Medline]
Zhou, Y., and A. Linhananta. 2002. Role of hydrophilic and hydrophobic contacts in folding of the second ß-hairpin fragment of protein G: molecular dynamics simulation studies of an all-atom model. Proteins. 47:154162.[CrossRef][Medline]
Zhou, Y., C. Zhang, G. Stell, and J. Wang. 2003. Temperature dependence of the distribution of the first passage time: results from discontinuous molecular dynamics simulations of an all-atom model of the second ß-hairpin fragment of protein G. J. Am. Chem. Soc. 125:63006305.[CrossRef][Medline]
This article has been cited by other articles:
![]() |
F. Cecconi, C. Guardiani, and R. Livi Testing Simplified Proteins Models of the hPin1 WW Domain Biophys. J., July 15, 2006; 91(2): 694 - 704. [Abstract] [Full Text] [PDF] |
||||
![]() |
D. K. West, D. J. Brockwell, P. D. Olmsted, S. E. Radford, and E. Paci Mechanical Resistance of Proteins Explained Using Simple Molecular Models Biophys. J., January 1, 2006; 90(1): 287 - 297. [Abstract] [Full Text] [PDF] |
||||
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| HOME | HELP | FEEDBACK | SUBSCRIPTIONS | ARCHIVE | SEARCH | TABLE OF CONTENTS |