| HOME | HELP | FEEDBACK | SUBSCRIPTIONS | ARCHIVE | SEARCH | TABLE OF CONTENTS |
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Forschungszentrum Karlsruhe, Institut für Nanotechnologie, 76021 Karlsruhe, Germany
Correspondence: Address reprint requests to W. Wenzel, E-mail: wenzel{at}int.fzk.de.
| ABSTRACT |
|---|
|
|
|---|
| INTRODUCTION |
|---|
|
|
|---|
Our approach to all-atom structure prediction is based on the thermodynamic hypothesis (14
), which postulates that many proteins are in thermodynamic equilibrium with their environment. For these systems the native conformation corresponds to the global minimum of their free-energy landscape (15
,16
). Using the prevailing funnel paradigm (17
,18
) for protein folding, thermodynamically inspired optimization methods can locate the global optimum of the free-energy surface as the native conformation. In contrast to Levinthal's folding-path scenario (19
), an optimization method need not follow the dynamics of the system, which makes this approach potentially much faster. We developed an all-atom protein force field (PFF01) (7
,20
,21
) with an area-based implicit solvent model that approximates the free energy of peptide conformations under physiological conditions. Using this free-energy force field we were able to predict the tertiary structure of several two- and three-helix proteins: the 20-amino-acid Trp-cage protein (7
,22
24
), the 36 amino-acid villin headpiece (25
), and the 40-amino-acid headgroup of the HIV accessory protein (9
,26
). Our method requires an accurate force field and an efficient stochastic optimization method to reliably locate the global optimum of the free-energy surface. Little is presently known about the increase of computational complexity with system size or the relative efficiency of different optimization strategies. Stochastic methods (27
) map the search for the global optimum to a fictitious dynamical process that explores the free-energy surface with a bias toward low-energy conformations. The performance of methods with just one such dynamical process (7
,20
) is obviously limited by the speed of the energy/force evaluation for each proposed conformation. Efficient parallel methods could speed up the search process significantly, but their efficiency often saturates quickly with the number of dynamical processes (replicas) (26
). Here we investigate a simple evolutionary strategy and demonstrate that it can overcome these limitations. Evolutionary strategies evolve an active population of many replicas. Their selection rules generate a dynamic memory of the overall process, which should speed up its convergence. We report two different folding simulations for the 60-amino-acid bacterial ribosomal protein L20 (28
,29
), both of which converge to low-resolution models of the native conformation. In the final population of these simulations, the energetically lowest conformation had approached the native state to 4.5 Å and 4.3 Å backbone root-mean-square deviation (RMSB), respectively (see Fig. 1). We find that six and five of the energetically lowest 10 conformations converge to near-native conformations within the constraints of the algorithm. The "native content" of the simulated ensemble, calculated as a suitably defined weighted average of the RMSB deviations of the population, increased as much as 60-fold during the simulations (see Fig. 2). These results demonstrate the feasibility of de novo protein structure prediction for a four-helix protein and transferability of our force field to this previously inaccessible structure class. As for the two- and three-helix proteins, we find that the entire low-energy landscape is dominated by conformations with nearly native secondary structure.
|
|
| METHODS |
|---|
|
|
|---|
![]() | (1) |
During the folding process, we consider only variations of the dihedral angles {
i} of the backbone and the side chains, keeping all other angles and bond-lengths fixed. In our simulation, we therefore propose only moves around the side-chain and backbone dihedral angles, which are attempted with 30% and 70% probability, respectively. The moves for the side-chain angles are drawn from an equidistributed interval with a maximal change of 5°. Half of the backbone moves are generated in the same fashion, and the remainder are generated from a move library that was designed to reflect the natural amino-acid-dependent bias toward the formation of
-helices or ß-sheets. The probability distribution of the move library was fitted to experimental probabilities observed in the PDB database (34
). Although it drives the simulation toward the formation of secondary structure, the move library contains no bias toward helical or sheet structures beyond that encountered in nature.
Optimization methods
The low-energy region of the free-energy landscape of proteins is extremely rugged because the packing of atoms in collapsed conformations is quite dense. Efficient optimization methods must speed up the simulation by avoiding high-energy transition states, adapt large-scale moves, or accept unphysical intermediates. The basin-hopping technique has proved to be a reliable workhorse for many complex optimization problems (35
), including protein folding (9
,36
38
), but it employs only one dynamical process.
Here we generalize the basin-hopping method to a population of size N, which is iteratively improved by P concurrent dynamical processes (we used P = 50100). The whole population is guided toward the optimum of the free-energy surface with a simple evolutionary strategy. This strategy must balance energy improvement and diversity of the population. Conformations are drawn from the population and subjected to an annealing cycle. At the end of each cycle the resulting conformation is either integrated into the active population or discarded. Similar strategies, employing a conformation stack, were explored in simulations of the 23-amino-acid BBA5 protein (32
,36
).
This algorithm was implemented on a distributed master-client model in which idle clients request a task from the master. The master maintains the active conformations of the population and distributes the work to the clients. Each step in the evolutionary algorithm has three phases:
Selection
In this step, a conformation is drawn randomly from the active population. We have used two different probability distributions for the simulations of bacterial ribosomal protein L20. Simulation A used a uniform distribution. In simulation B, the selection probability fell linearly with the energetic rank of the conformation, the energetically best conformation was N times as likely to be chosen as the worst replica.
Annealing cycle
We used a geometric cooling schedule with Tstart = 600 K, Tend = 2 K. The number of steps per cycle increased as
where Ncycle is the total number of cycles of all simulations (maintained by the master process). Toward the end of the simulation, a single annealing cycle comprised as much as 2.3 x 106 steps.
Population update
The acceptance criterion for newly generated conformations must balance the diversity of the population against the enrichment with low-energy decoys.
The new conformation replaces a member of the active population in the following cases:
New conformations that do not fit these selection rules are discarded. The decision tree for this process is shown in detail (see Fig. 6). Two conformations are considered similar if their mutual RMSB is below RC. We used RC = 3 Å. The selection rules insure the diversity of the population, whereas the replacement criteria insure that structurally different low-energy conformations are always accepted.
|
We performed these simulations with a 20% reduced strength of the solvent interactions (VS) to facilitate the rapid formation of secondary structure. It has been argued that hydrophobic collapse competes with secondary-structure formation in protein folding. In the collapsed conformational ensemble, large-scale conformational changes, such as those required for secondary-structure formation, occur only rarely.
The decoy set was ranked according to total energy as well as the individual energy terms (VS, VLJ, VHB, and VC (side chain) and VC (backbone)). For each criterion, we selected the best 50 conformations and eliminated duplicates to arrive at a population of 266 distinct conformations (by the criterion defined above), which seeded the population for the evolutionary algorithm.
Performance measure: native score
To judge the performance of the algorithm it is important to note that it is not possible for the entire population to converge to the native conformation. We measured the progress of the simulation by monitoring the quality of the lowest conformation and the number and rank of near-native conformations (within a threshold 3 Å RMSB with respect to the native conformation). To quantify the latter we defined a native score
of the population. The sum runs over all near-native conformations in the population. N is the size of the population and R designates the rank of the conformation in the population. A score of 100 corresponds to a native decoy placed at the top position, whereas a near-native decoy at the bottom of the population contributes nothing to the score.
| RESULTS |
|---|
|
|
|---|
the best conformations are only drawn relatively seldom. An equidistributed selection probability concentrates much effort on comparatively high-energy conformations. To overcome this difficulty, we pruned the simulation to the best N = 50 decoys (by energy) and continued simulation A for another 5500 annealing cycles. Simulation B was started from the same subpopulation, but the computational effort was further biased toward the improvement of the "best" structures using a selection probability that fell linearly with the energetic rank of the conformation.
At the end of the simulations, the respective lowest-energy conformations had converged to 4.6 and 4.3 Å RMSB with respect to the native conformation. Simulation B had reached a slightly lower energy than simulation A. Table 1 demonstrates that of the 10 lowest structures in each simulation, 5 and 6, respectively, had independently converged to near-native conformations of the protein. The first nonnative decoy appears in position 2, with an energy deviation of only 1.8 kcal/mol (in our model) and a significant RMSB deviation.
|
60% (75%) coincidence of the Cß-Cß distances to within 1 (1.5) standard deviation of the experimental resolution for both simulations. The dark diagonal blocks indicate intrahelical contacts. These are, perhaps not too surprisingly, resolved to very good accuracy. The off-diagonal dark blocks indicate the formation of a large fraction of correct long-range native contacts.
|
|
|
Convergence
Fig. 2 demonstrates the convergence of both the energy and the average RMSB deviation as the function of the number of total iterations (basin-hopping cycles). Both simulations had an acceptance ratio of
30%. The simulation using equidistributed probabilities converged faster regarding the overall energy, but not regarding the average RMSB of the population. Both simulations smoothly approach the native conformation, as is also indicated by the plot of the native score (see Methods), which increases over 60-fold in the course of these simulations. Fig. 3 traces the development of energy and RMSB deviation of the best decoy in the final population of simulation B. Most of the helical structure forms early in the simulations. Because our dynamics is artificial, this does not necessarily imply early helix formation in the physical folding process. We note that there is a sudden rapid drop in RMSB deviation to the native conformation, which is accompanied by a rapid drop in energy.
Method comparison
To date, we have failed to make significant progress for de novo folding of the bacterial ribosomal protein L20 with other stochastic optimization methods. To rationalize the success of the evolutionary algorithm, we compare simulations with the evolutionary algorithm and the basin-hopping technique for the Trp-cage protein. In a recent comparison of stochastic optimization methods (24
), our version of the basin-hopping technique emerged as the most efficient technique for folding the Trp-cage protein. We first performed 20 independent basin-hopping simulations with 100 cycles per replica. We then ran an evolutionary algorithm using a population size of N = 20 and P = 50 processors that used the same total computational effort. The evolutionary algorithm used a uniform distribution for conformational selection (as simulation A for bacterial ribosomal protein L20). Both methods used exactly the same parameterization of the annealing cycle. Both simulations folded the protein, although the evolutionary algorithm obtained a marginally better energy at the end. The best and mean energies of both "populations" are shown in Fig. 7 as a function of the numerical effort.
|
; its best energy thus trails the basin-hopping method. This observation explains the difficulties of the evolutionary algorithm right after starting. The long-term superiority of the evolutionary technique becomes obvious when comparing the average energies of the population: The average energy of the evolutionary algorithm is much lower than for the basin-hopping method. In basin-hopping a certain fraction of the simulations go completely astray: they never find low-energy or near-native conformations. The selection process in the evolutionary algorithm efficiently eliminates such conformations from the active population, generating nonlocal, large-step moves that lead to improved overall performance.
| DISCUSSION |
|---|
|
|
|---|
|
The free-energy approach exploits the 30-year-old thermodynamic hypothesis (14
), according to which the native structure of many proteins can be predicted using stochastic optimization methods. Our results demonstrate that the important influence of the solvent can be modeled with a relatively simple solvent-accessible surface approach, at least for some proteins. The stochastic exploration of the free-energy surface is much faster than direct simulation, because nonphysical moves can be attempted and nonphysical intermediates tolerated. Its natural drawback is the loss of kinetic and thermodynamic information. Recent years have seen the emergence of computational methods to explore the native conformation and the transition-state ensemble with great kinetic detail (13
,41
). Free-energy optimization methods and replica-exchange explicit water molecular dynamics thus offer complementary views of the protein folding process.
| ACKNOWLEDGEMENTS |
|---|
|
|
|---|
This work was supported by the Deutsche Forschungsgemeinschaft, the Bode Foundation, and the Bundesministerium für Wissenschaft und Forschung. We acknowledge support of the Supercomputational Materials Laboratory of Korea Institute of Science and Technology (Seoul, South Korea), where some of these simulations were performed.
Submitted on July 11, 2005; accepted for publication January 10, 2006.
| REFERENCES |
|---|
|
|
|---|
2. Schonbrunn, J., W. J. Wedemeyer, and D. Baker. 2002. Protein structure prediction in 2002. Curr. Opin. Struct. Biol. 12:348352.[CrossRef][Medline]
3. Liwo, A., P. Arlukowicz, C. Czaplewski, S. Oldizeij, J. Pillardy, and H. Scheraga. 2002. A method for optimising potential energy functions by a hierarchichal design of the potential energy landscape. Proc. Natl. Acad. Sci. U.S.A. 99:19371942.
4. Moult, J., K. Fidelis, A. Zemia, and T. Hubbard. 2001. Critical assessment of methods of protein structure (CASP): round IV. Proteins. 45:27.[Medline]
5. Snow, C. D., H. Nguyen, V. S. Pande, and M. Gruebele. 2002. Absolute comparison of simulated and experimental protein folding dynamics. Nature. 420:102106.[CrossRef][Medline]
6. Simmerling, C., B. Strockbine, and A. Roitberg. 2002. All-atom strucutre prediction and folding simulations of a stable protein. J. Am. Chem. Soc. 124:1125811259.[CrossRef][Medline]
7. Schug, A., T. Herges, and W. Wenzel. 2003. Reproducible protein folding with the stochastic tunneling method. Phys. Rev. Lett. 91:158102.[CrossRef][Medline]
8. Vila, J., D. Ripoll, and H. Scheraga. 2004. Atomically detailed folding simulation of the B domain of staphylococcal protein A from random structures. Proc. Natl. Acad. Sci. U.S.A. 100:1481214816.
9. Herges, T., and W. Wenzel. 2005. Reproducible in-silico folding of a three-helix protein and characterization of its free energy landscape in a transferable all-atom forcefield. Phys. Rev. Lett. 94:018101.[CrossRef][Medline]
10. Hansmann, U. H. E. 2002. Global optimization by energy landscape paving. Phys. Rev. Lett. 88:068105.[CrossRef][Medline]
11. Lin, C., C. Hu, and U. Hansmann. 2003. Parallel tempering simulations of hp-36. Proteins. 52:436445.[CrossRef][Medline]
12. Duan, Y., and P. A. Kollman. 1998. Pathways to a protein folding intermediate observed in a 1-microsecond simulation in aqueous solution. Science. 282:740744.
13. Garcia, A. E., and N. Onuchic. 2003. Folding a protein in a computer: an atomic description of the folding/unfolding of protein A. Proc. Natl. Acad. Sci. U.S.A. 100:1389813903.
14. Anfinsen, C. B. 1973. Principles that govern the folding of protein chains. Science. 181:223230.
15. Onuchic, J. N., Z. Luthey-Schulten, and P. Wolynes. 1997a. Theory of protein folding: the energy landscape perspective. Annu. Rev. Phys. Chem. 48:545600.[CrossRef][Medline]
16. Hardin, C., M. Eastwood, M. Prentiss, Z. Luthey-Schulten, and P. Wolynes. 2003. Folding funnels: the key to robust protein structure prediction. J. Comput. Chem. 23:138146.[CrossRef]
17. Onuchic, J. N., Z. Luthey-Schulten, and P. G. Wolynes. 1997b. Theory of protein folding: the energy landscape perspective. Annu. Rev. Phys. Chem. 48:545600.[CrossRef][Medline]
18. Dill, K., and H. Chan. 1997. From Levinthal to pathways to funnels: The "new view" of protein folding kinetics. Nat. Struct. Biol. 4:1019.[Medline]
19. Levinthal, C. 1968. Are there pathways for protein folding? J. Chem. Phys. 65:4445.
20. Herges, T., A. Schug, H. Merlitz, and W. Wenzel. 2003. Stochastic optimization methods for structure prediction of biomolecular nanoscale systems. Nanotechnology. 14:11611167.[CrossRef]
21. Herges, T., and W. Wenzel. 2004. An all-atom force field for tertiary structure prediction of helical proteins. Biophys. J. 87:31003109.
22. Schug, A., and W. Wenzel. 2004. All-atom folding of the Trp-cage protein in an all-atom forcefield. Europhys. Lett. 67:307313.[CrossRef]
23. Schug, A., W. Wenzel, and U. Hansmann. 2005. Energy landscape paving simulations of the Trp-cage protein. J. Chem. Phys. 122:194711.[CrossRef][Medline]
24. Verma, A., A. Schug, K. H. Lee, and W. Wenzel. 2006. Basin hopping simulations for all-atom protein folding. J. Chem. Phys. 124:044515.[CrossRef][Medline]
25. Herges, T., and W. Wenzel. 2005b. Free energy landscape of the villin headpiece in an all-atom forcefield. Structure. 13:661668.[Medline]
26. Schug, A., T. Herges, and W. Wenzel. 2004. All-atom folding of the three-helix HIV accessory protein with an adaptive parallel tempering method. Proteins. 57:792798.[CrossRef][Medline]
27. Kirkpatrick, S., C. Gelatt, and M. Vecchi. 1983. Optimization by simulated annealing. Science. 220:671680.
28. Raibaud, S., I. Lebars, M. Guillier, C. Chiaruttini, F. Bontems, A. Rak, M. Garber, F. Allemand, M. Springer, and F. Dardel. 2002. NMR structure of bacterial ribosomal protein l20: implications for ribosome assembly and translational control. J. Mol. Biol. 323:143151.[CrossRef][Medline]
29. Schug, A., and W. Wenzel. 2004b. Predictive in-silico all-atom folding of a four helix protein with a free-energy model. J. Am. Chem. Soc. 126:1673616737.[CrossRef][Medline]
30. Withers-Ward, E. S., T. Mueller, I. Chen, and J. Feigon. 2000. Biochemical and structural analysis of the interaction between the UBA(2) domain of the DNA repair protein HHR23A and HIV-1 Vpr. Biochemistry. 39:1410314112.[CrossRef][Medline]
31. Herges, T., A. Schug, and W. Wenzel. 2004. Exploration of the free energy surface of a three helix peptide with stochastic optimization methods. Int. J. Quantum Chem. 99:854893.[CrossRef]
32. Abagyan, R. A., and M. Totrov. 1994. Biased probability Monte Carlo conformation searches and electrostatic calculations for peptides and proteins. J. Mol. Biol. 235:9831002.[CrossRef][Medline]
33. Avbelj, F., and J. Moult. 1995. Role of electrostatic screening in determining protein main chain conformational preferences. Biochemistry. 34:755764.[CrossRef][Medline]
34. Pedersen, J. T., and J. Moult. 1997. Protein folding simulations with genetic algorithms and a detailed molecular description. J. Mol. Biol. 269:240259.[CrossRef][Medline]
35. Wales, D. J., and J. P. Doye. 1997. Global optimization by basin-hopping and the lowest energy structures of Lennard-Jones clusters containing up to 110 atoms. J. Phys. Chem. 101:51115116.
36. Abagyan, R. A., and M. Totrov. 1999. Ab initio folding of peptides by the optimal bias Monte Carlo minimization procedure. J. Comp. Phys. 151:402421.[CrossRef]
37. Mortenson, P. N., and D. J. Wales. 2004. Energy landscapes, global optimization and dynamics of poly-alanine Ac(ala)8NHMe. J. Chem. Phys. 114:64436454.[CrossRef]
38. Mortenson, P. N., D. A. Evans, and D. J. Wales. 2002. Energy landscapes of model polyalanines. J. Chem. Phys. 117:13631376.[CrossRef]
39. Levitt, M., and C. Chothia. 1976. Structural patterns in globular proteins. Nature. 261:552558.[CrossRef][Medline]
40. Garcia, A., and J. Onuchic. 2005. Folding a protein on a computer: hope or reality. Structure. 13:497503.[Medline]
41. Mayor, U., N. R. Guydosh, C. M. Johnson, J. G. Grossmann, S. Sato, G. S. Jas, S. M. V. Freund, D. O. V. Alonso, V. Daggett, and A. R. Fersht. 2003. The complete folding pathway of a protein from nanoseconds to microseconds. Nature. 421:863867.[CrossRef][Medline]
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| HOME | HELP | FEEDBACK | SUBSCRIPTIONS | ARCHIVE | SEARCH | TABLE OF CONTENTS |