



* Department of Chemistry,
Digital Life Laboratory;
Division of Chemistry and Chemical Engineering; and
Jet Propulsion Laboratory, California Institute of Technology, Pasadena, California 91125
Correspondence: Address reprint requests to Jesse D. Bloom, California Institute of Technology, 210-41, Pasadena, CA 91125. Tel.: 626-354-2565; Fax: 626-568-8743; E-mail: bloom{at}caltech.edu.
| ABSTRACT |
|---|
|
|
|---|
| INTRODUCTION |
|---|
|
|
|---|
Although natural proteins possess stable native structures, the evolutionary fitness of a protein depends not on the stability of the native structure per se, but rather on the stability of this structure being appropriate to allow the protein to perform a function such as catalyzing a chemical reaction or binding to a ligand. Stability is therefore under selection only insofar as it is necessary for biochemical function, and most natural proteins are only marginally stable at their physiologically relevant temperatures (Fersht, 2002).
In protein mutagenesis studies, stability and function can appear to be competing properties, with mutations that increase stability often reducing function (Shoichet et al., 1995; Schreiber et al., 1994), and mutations that improve or alter function often decreasing stability (Wang et al., 2002). However, several lines of evidence demonstrate that high stability and high functionality are not inherently incompatible. In nature, there is a strong correlation between the temperature of an organism's environment and the stability of its proteins, indicating that natural evolution is able to create functional and highly stable proteins if there is sufficiently strong selection pressure (Somero, 1995; Rees and Adams, 1995).
In the laboratory, protein engineers have also demonstrated that natural proteins are not maximally stable by using directed evolution to find mutations that make proteins more stable without sacrificing enzymatic function (Giver et al., 1998; Arnold, 1998; Serrano et al., 1993; Arnold et al., 2001). These results show that high functionality and high stability can coexist, suggesting that the marginal stabilities of natural proteins are due primarily to the simple fact that highly stable sequences are rare (Taverna and Goldstein, 2002), and therefore that most mutations to an evolved protein will decrease its stability. For this reason, proteins will tend to be no more stable than is required by their environment, because any extra stability that confers no further selective advantage will be eliminated by mutations.
Comprehensive experimental examinations of protein evolution are limited by the vast number of possible sequences and the difficulties in rapidly assaying protein properties. However, simple protein models originally developed to study protein folding (Dill et al., 1995; Hinds and Levitt, 1994; Shakhnovich and Gutin, 1993; Socci et al., 1998) provide a useful tool for studying protein evolutionary dynamics (Chan and Bornberg-Bauer, 2002). Although these models are gross oversimplifications of real proteins, their tractability allows for a far more extensive exploration of sequence space than can be done experimentally. Previous studies using model proteins have focused on the evolution of stable structures (Xia and Levitt, 2002; Cui et al., 2002; Bastolla et al., 2000; Taverna and Goldstein, 2000; Tiana et al., 2000; Bornberg-Bauer and Chan, 1999) or fast-folding (Gutin et al., 1995; Mirny et al., 1998) proteins, whereas with few exceptions (Williams et al., 2001; Hirst, 1999) the interplay between the evolution of stability and function has gone unexamined. Here we use a model protein to investigate how selection for stability affects the evolution of function. In our model, we describe the function of a protein as its ability to bind to a rigid ligand molecule. The fitness of a protein depends on its ability to perform its function of binding to a ligand, which in turn depends on its ability to fold to a native structure with some minimal stability. We can increase the minimal stability requirement by increasing the temperature parameter, allowing us to explore the relationship between stability and the evolvability of function.
| METHODS |
|---|
|
|
|---|
The monomers can be of 20 types, corresponding to the 20 amino acids. Each monomer on the lattice has four nearest neighbor sites, of which as many as two can be occupied by nonbonded neighboring residues (three in the case of terminal residues). The energy $E(\mathcal{C})$ of a protein conformation $\mathcal{C}$ is the sum of the nearest-neighbor interactions of nonbonded residues,

$$E(\mathcal{C}) = \sum_{i} \sum_{j > i+1} C_{ij}(\mathcal{C})\, \epsilon(\mathcal{A}_i, \mathcal{A}_j),$$

where $C_{ij}(\mathcal{C})$ equals one if residues $i$ and $j$ are nearest neighbors in conformation $\mathcal{C}$ and zero otherwise, and $\epsilon(\mathcal{A}_i, \mathcal{A}_j)$ is the interaction energy between residue types $\mathcal{A}_i$ and $\mathcal{A}_j$. The interaction energies $\epsilon(\mathcal{A}_i, \mathcal{A}_j)$ are based on a widely used statistical analysis of real proteins by Miyazawa and Jernigan (1985).
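As a minimal sketch of the contact-energy sum described above, the following Python function adds up the interaction energies of all nonbonded nearest-neighbor pairs in a two-dimensional lattice conformation. The function names, the coordinate representation, and in particular `toy_contact_energy` are illustrative assumptions; the toy energy table is a hypothetical stand-in and is not the Miyazawa-Jernigan matrix.

```python
from itertools import combinations


def conformation_energy(sequence, coords, contact_energy):
    """Sum contact energies over nonbonded nearest-neighbor residue pairs.

    sequence: string of residue types, one letter per monomer.
    coords: list of (x, y) lattice sites, one per monomer, in chain order.
    contact_energy: function (type_a, type_b) -> float.
    """
    if len(sequence) != len(coords):
        raise ValueError("sequence and coords must have the same length")
    energy = 0.0
    for i, j in combinations(range(len(sequence)), 2):
        if j - i < 2:
            continue  # residues adjacent along the chain are bonded, not in contact
        (xi, yi), (xj, yj) = coords[i], coords[j]
        if abs(xi - xj) + abs(yi - yj) == 1:  # nearest neighbors on the lattice
            energy += contact_energy(sequence[i], sequence[j])
    return energy


def toy_contact_energy(a, b):
    """Hypothetical two-level energy table, used only for illustration."""
    hydrophobic = set("FILMVWC")
    return -1.0 if a in hydrophobic and b in hydrophobic else 0.0


# Four residues folded around a unit square: residues 0 and 3 form one contact.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(conformation_energy("FKKF", square, toy_contact_energy))
```

Passing the energy table in as a function keeps the sum itself independent of which interaction matrix is used.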
Folding the proteins
The native structure and stability of the protein can be determined by finding the lowest energy conformation, $\mathcal{C}_f$, and the partition function. Computation of the partition function requires defining a temperature parameter $T$. This temperature parameter represents the thermodynamic temperature; however, because the model protein interaction energies are independent of temperature, the temperature parameter does not capture behaviors of real proteins that are caused by the temperature dependence of the interaction energies (for example, cold denaturation). To avoid confusion, we refer to $T$ as the temperature parameter rather than as the temperature.

The partition function at a temperature parameter of $T$ is:

$$Q(T) = \sum_{\mathcal{C}_i} \exp\left[-\frac{E(\mathcal{C}_i)}{T}\right].$$

The free energy of folding $\Delta G_f(T)$ to $\mathcal{C}_f$ is then the difference between $E(\mathcal{C}_f)$ and the free energy of the ensemble of all other conformations,

$$\Delta G_f(T) = E(\mathcal{C}_f) + T \ln\left\{Q(T) - \exp\left[-\frac{E(\mathcal{C}_f)}{T}\right]\right\}.$$

The fraction of proteins $f(T)$ that are expected to be folded to $\mathcal{C}_f$ at equilibrium is given by

$$f(T) = \frac{\exp\left[-E(\mathcal{C}_f)/T\right]}{Q(T)} = \frac{1}{1 + \exp\left[\Delta G_f(T)/T\right]}.$$
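The three quantities defined above (the partition function, the free energy of folding, and the fraction folded) can be sketched for a small, explicitly listed set of conformation energies. The function name and the example energies are hypothetical. Energies are assumed to be expressed in units in which the Boltzmann constant is one, so that $E/T$ is dimensionless.

```python
import math


def folding_thermodynamics(energies, T):
    """Return (Q, dG_f, f) for a list of conformation energies at temperature parameter T.

    The native state is taken to be the first lowest-energy conformation in the list.
    """
    if len(energies) < 2:
        raise ValueError("need the native state plus at least one other conformation")
    native = min(range(len(energies)), key=energies.__getitem__)
    e_native = energies[native]
    # Boltzmann sum over every conformation except the native one.
    q_rest = sum(math.exp(-e / T) for k, e in enumerate(energies) if k != native)
    q = q_rest + math.exp(-e_native / T)
    # Native energy minus the free energy (-T ln q_rest) of all other conformations.
    dG_f = e_native + T * math.log(q_rest)
    f = math.exp(-e_native / T) / q
    return q, dG_f, f


example_energies = [-3.0, -1.0, -1.0, 0.0]  # made-up values for illustration
for T in (0.5, 1.0, 2.0):
    q, dG_f, f = folding_thermodynamics(example_energies, T)
    print(f"T={T}: Q={q:.3f}, dG_f={dG_f:.3f}, f={f:.3f}")
```

Summing the non-native terms directly, rather than subtracting the native term from $Q(T)$, avoids a loss of precision when the native state dominates the partition function.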
Exact calculation of $Q(T)$ requires enumeration of all $5.81 \times 10^{6}$ unique conformations corresponding to all of the self-avoiding walks that are not related by symmetry (Rapaport, 1987). Many of these walks have very few contacts, and so make only a small contribution to the partition function. We only explicitly considered the $7.95 \times 10^{5}$ conformations with more than four contacts. The remaining $5.01 \times 10^{6}$ conformations were treated by a crude mean-field model, estimating the partition sum contribution of all conformations with $n$ contacts ($0 \le n \le 4$) as

$$Q_n(T) \approx N_n \exp\left(-\frac{n\bar{\epsilon}}{T} + \frac{n\sigma^{2}}{2T^{2}}\right),$$

where $\bar{\epsilon}$ is the average residue-residue contact energy for the given protein sequence assuming any residue is equally likely to be in contact with any other nonadjacent residue, $\sigma^{2}$ is the variance in the residue-residue contact energy, and
$N_n$ is the number of conformations with $n$ contacts. This approximation introduces only a very small error: a test of $10^{3}$ random sequences at $T = 1.0$ showed that the root-mean-square and maximum differences between the approximate and exact values were $1.6 \times 10^{-4}$ and $2.8 \times 10^{-3}$, respectively. This error had no effect on the evolutionary trajectories, because running a sample trajectory with and without the approximation led to identical results. Folding a protein took roughly 0.3 s on a 2-GHz processor.
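To illustrate what enumerating self-avoiding walks that are not related by symmetry involves, the sketch below generates one walk per symmetry class on the square lattice and counts the contacts in each. It is one possible approach, not necessarily the algorithm used in the study: rotations are removed by fixing the first bond, and mirror images are removed by requiring the first turn to go in one direction. The function names are hypothetical.

```python
from collections import Counter


def unique_walks(n_monomers):
    """Yield self-avoiding walks of n_monomers lattice sites, one per symmetry class."""
    if n_monomers < 2:
        raise ValueError("need at least two monomers")
    moves = ((1, 0), (0, 1), (-1, 0), (0, -1))
    path = [(0, 0), (1, 0)]  # fixing the first bond removes the four rotations
    occupied = set(path)

    def extend(turned):
        if len(path) == n_monomers:
            yield tuple(path)
            return
        x, y = path[-1]
        for dx, dy in moves:
            if not turned and dy == -1:
                continue  # the first turn must go "up", which removes mirror images
            site = (x + dx, y + dy)
            if site in occupied:
                continue  # self-avoidance
            path.append(site)
            occupied.add(site)
            yield from extend(turned or dy != 0)
            path.pop()
            occupied.remove(site)

    yield from extend(False)


def count_contacts(path):
    """Count nonbonded nearest-neighbor pairs in a walk."""
    index = {site: k for k, site in enumerate(path)}
    contacts = 0
    for k, (x, y) in enumerate(path):
        for dx, dy in ((1, 0), (0, 1)):  # look right and up so each pair is seen once
            j = index.get((x + dx, y + dy))
            if j is not None and abs(j - k) > 1:
                contacts += 1
    return contacts


# Number of conformations with each contact count for a short chain.
histogram = Counter(count_contacts(walk) for walk in unique_walks(8))
print(sorted(histogram.items()))
```

The same enumeration applies to longer chains, although pure Python becomes slow as the number of walks grows into the millions.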
Modeling the protein function
We introduce the concept of function by considering the binding of a ligand to the protein, an idea that to our knowledge was first introduced by Miller and Dill (1997). We define the function of a protein as its ability to bind to a rigid ligand when the protein is in its lowest energy conformation $\mathcal{C}_f$. The binding energy $\mathrm{BE}$ is the interaction energy between the folded protein and the rigid ligand in the lowest energy binding position, found by searching all translational and rotational positions of the ligand relative to the protein. Fig. 1 shows the ligand used in the simulations bound to a protein in its lowest energy conformation.
The fitness $F(T)$ of a protein at a temperature parameter of $T$ is defined as the negative product of the fraction of the proteins that are folded, $f(T)$, times the binding energy $\mathrm{BE}$ of the folded protein to the ligand, so that

$$F(T) = -f(T) \times \mathrm{BE}.$$

Because $f(T)$ has a sigmoidal dependence on $\Delta G_f(T)$, the fitness of an unstructured protein is essentially zero, and a protein gains fitness as it achieves some minimal stability determined by the temperature parameter. Once a protein has achieved the minimal stability, it can only substantially improve its fitness by improving its ligand binding function. The stringency of the stability requirement depends on the temperature parameter $T$ at which the fitness is computed, with higher temperature parameters favoring greater stabilities.
Strictly speaking, the fraction of proteins bound to the ligand also displays a sigmoidal dependence on the product of the ligand binding energy and the fraction of folded proteins. However, because we are only interested in differences in fitnesses rather than their magnitudes, any function that monotonically increases with this product will give the same results, and so we choose the simpler functional form defined above.
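The sigmoidal dependence of fitness on stability can be seen in a short numerical sketch. The function names and the numbers are hypothetical. For simplicity the free energy of folding is passed in as a fixed value, whereas in the model it is itself computed from the partition function at the same temperature parameter.

```python
import math


def fraction_folded(dG_f, T):
    """Two-state fraction folded for a free energy of folding dG_f."""
    return 1.0 / (1.0 + math.exp(dG_f / T))


def fitness(dG_f, binding_energy, T):
    """Negative product of fraction folded and binding energy.

    A favorable (negative) binding energy gives a positive fitness.
    """
    return -fraction_folded(dG_f, T) * binding_energy


# Fitness saturates once the protein is stable; further gains must come from binding.
for dG_f in (4.0, 2.0, 0.0, -2.0, -4.0, -8.0):
    print(dG_f, round(fitness(dG_f, binding_energy=-10.0, T=1.0), 3))
```

With the binding energy held fixed, moving from marginal to high stability changes the fitness very little, which is why further improvement has to come from the binding term.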
Evolving the proteins
Each evolutionary replicate began with a population of 99 random sequences. At each generation, the 33 most-fit sequences were selected, and each was used to generate two identical offspring. Random point mutations were made in all 99 resulting proteins with a per-site mutation rate of $3.3 \times 10^{-2}$, which corresponds to a per-protein mutation rate of 0.6. The mutated proteins were then refolded, and their fitnesses were calculated.
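One generation of this selection and mutation scheme might be sketched as follows. The sketch makes several assumptions that are not stated in the text: a chain length of 18 (taken from the length of the evolved sequences listed in this section, for which 18 sites at the quoted per-site rate give roughly 0.6 mutations per protein), a mutation that always changes the residue to a different type, and a placeholder `toy_fitness` function standing in for the fold-and-bind fitness calculation.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
POPULATION_SIZE = 99
N_SELECTED = 33
PER_SITE_MUTATION_RATE = 3.3e-2
CHAIN_LENGTH = 18  # assumption: matches the evolved sequences listed in the text


def mutate(sequence, rng, rate=PER_SITE_MUTATION_RATE):
    """Mutate each site independently with probability `rate`."""
    residues = []
    for residue in sequence:
        if rng.random() < rate:
            # Assumption: a mutation always changes the residue type.
            residue = rng.choice([a for a in AMINO_ACIDS if a != residue])
        residues.append(residue)
    return "".join(residues)


def next_generation(population, fitness_fn, rng):
    """Keep the most-fit third, triplicate each survivor, then mutate all copies."""
    ranked = sorted(population, key=fitness_fn, reverse=True)
    selected = ranked[:N_SELECTED]
    copies_each = POPULATION_SIZE // N_SELECTED  # the parent plus two identical offspring
    copies = [seq for seq in selected for _ in range(copies_each)]
    return [mutate(seq, rng) for seq in copies]


def toy_fitness(sequence):
    """Hypothetical placeholder for the fold-and-bind fitness calculation."""
    return sequence.count("F")


rng = random.Random(0)
population = ["".join(rng.choice(AMINO_ACIDS) for _ in range(CHAIN_LENGTH))
              for _ in range(POPULATION_SIZE)]
for _ in range(10):
    population = next_generation(population, toy_fitness, rng)
print(max(toy_fitness(seq) for seq in population))
```

Replacing `toy_fitness` with a function that folds each sequence and evaluates ligand binding would turn the loop into the kind of evolutionary run described here.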
For the evolutionary trajectories that began with evolved proteins, we first evolved random populations for 250 generations to bind to each of the ligands shown in Fig. 6. The best binding sequences for ligands one, two, and three were FFKFKKFKIFMLKWMKMF, FMGFMIIFFLKFKKFGWF, and MFHVFCHFEWPKPMKCFM, respectively. These sequences were then used for the initial identical populations of 99 proteins for the evolutionary runs, which were otherwise carried out as before.
| RESULTS |
|---|
|
|
|---|
Proteins evolve ligand binding function more efficiently at lower stability
To determine how selection for stability affected the evolution of ligand binding function, we examined the binding energies achieved after 500 generations of evolution at all five temperature parameters. Fig. 3 shows the distribution of binding energies for all cases. At least a few replicates evolved strong binding proteins at all of the temperature parameters. However, the frequency of evolution of strong binders was much higher at lower temperature parameters, whereas at higher temperature parameters many of the evolutionary trajectories became stuck at weak binding proteins. The binding energy distributions are statistically different for temperature parameters that differ by ≥0.2, with confidences of >0.95 (Kolmogorov-Smirnov test; the D and P values are 0.64 and $7.8 \times 10^{-10}$ for T = 0.8 and T = 1.0, 0.58 and $3.7 \times 10^{-8}$ for T = 0.8 and T = 1.1, 0.68 and $4.8 \times 10^{-11}$ for T = 0.8 and T = 1.2, 0.34 and $4.4 \times 10^{-2}$ for T = 0.9 and T = 1.1, 0.60 and $1.08 \times 10^{-8}$ for T = 0.9 and T = 1.2, and 0.52 and $1.2 \times 10^{-6}$ for T = 1.0 and T = 1.2; Press et al., 2002).
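For readers unfamiliar with the two-sample Kolmogorov-Smirnov test, the sketch below computes the D statistic (the largest gap between the two empirical cumulative distributions) and an approximate P value from the standard asymptotic series with a finite-sample correction. The function names and the sample data are hypothetical, and the series is truncated at a fixed number of terms.

```python
import math


def ks_statistic(sample_a, sample_b):
    """Largest absolute difference between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(a[i], b[j])
        while i < n and a[i] <= x:  # advance past every value <= x, ties included
            i += 1
        while j < m and b[j] <= x:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d


def ks_p_value(d, n, m, terms=100):
    """Approximate significance of an observed D for sample sizes n and m."""
    n_eff = n * m / (n + m)
    root = math.sqrt(n_eff)
    lam = (root + 0.12 + 0.11 / root) * d
    if lam < 0.2:
        return 1.0  # the alternating series converges poorly here; P is essentially one
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * total))


# Made-up binding energies for two conditions, for illustration only.
low_T = [-14.1, -13.8, -13.5, -12.9, -12.2, -11.7]
high_T = [-12.5, -11.0, -10.4, -9.8, -9.1, -8.6]
d = ks_statistic(low_T, high_T)
print(d, ks_p_value(d, len(low_T), len(high_T)))
```

Because the test compares whole distributions rather than means, it is sensitive to the kind of shift seen when many trajectories become stuck at weak binders.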
All of the trends we found by beginning with random populations were preserved when we started from these evolved proteins. We again found that evolutionary trajectories at lower temperature parameters yielded stronger binding final proteins, and that the gradient was more effective at evolving high-temperature fitness than a constant high temperature parameter (Fig. 7). The different initial populations do lead to different final binding energies and fitnesses, with random populations tending to lead to better values. However, the trends of lower temperature parameter leading to stronger binding and a gradient approach leading to higher fitness hold for all four initial populations.
| DISCUSSION |
|---|
|
|
|---|
We present a strategy to overcome this problem of the evolutionary trajectories becoming trapped at high stability but weak binding proteins. Performing the initial rounds of evolution at a low temperature parameter decreases the selection for stability, and so allows the proteins to more easily find strong binding regions of sequence space. The temperature parameter can then be increased, which leads to the selection of more stable sequences. Our results indicate that this approach is more effective for evolving highly stable and strong binding proteins than constant selection for both high stability and strong binding. This strategy takes advantage of the fact that it is easier to maintain strong ligand binding while improving stability than to maintain high stability while improving binding.
Our results fit into the framework of current theories about the distributions of proteins in sequence space that have emerged from other lattice protein studies. These studies have shown that protein structures are coded for by structurally neutral networks spanning many diverse sequences, and that these networks are structured as superfunnels, with the most stable sequences also possessing the most connections in the networks (Bornberg-Bauer and Chan, 1999; Broglia et al., 1999; Bastolla et al., 2000, 1999). Our work suggests that a protein evolves function most effectively when it can freely explore its structurally neutral network, rather than when it is trapped in a small number of highly stable sequences. Our initial relaxation of the stability requirement facilitates exploration of the structurally neutral network, and once highly functional sequences are found, they can be optimized for stability. Although we do not consider recombination in our current study, other work (Cui et al., 2002; Xia and Levitt, 2002) has shown that whereas structurally neutral networks can easily be explored locally by point mutations, moves between networks or to distant regions of the same network are facilitated by crossover-induced sequence space jumps. Therefore, we suggest that addition of recombination to our evolutionary protocol may further assist in the evolution of function.
The evolution of our model proteins also has strong parallels with real protein evolution. As with real proteins, our model proteins evolve primarily by structurally conservative mutations that tinker with the contacts in a preserved structural scaffold, rather than by mutations that cause wholesale structural changes. The interplay between the evolution of stability and function in our model is also reminiscent of real protein evolution; for example, in the evolution of new function in TEM-1 β-lactamase, gains in function were correlated with drops in stability, followed by gradual regaining of the lost stability (Wang et al., 2002).
Our model points to general trends that are important in both natural and experimental protein evolution, where different structural and functional properties are under different selection pressures. Protein evolution involves concurrent selection for stability and function, and productive mutations must improve one of these properties without excessively damaging the other. Because most mutations to evolved proteins will be deleterious to at least one of these properties, strong selection for both stability and function will limit the number of productive mutations, and so lead to trapping at local fitness optima. Protein evolution therefore occurs most efficiently when the temporary drops in stability associated with gains in function are buffered by mild selection for stability.
| ACKNOWLEDGEMENTS |
|---|
|
|
|---|
J.D.B. is supported by a Howard Hughes Medical Institute predoctoral fellowship. C.O.W. and C.A. were supported by the National Science Foundation under contract No. DEB-9981397. Part of this work was carried out at the Jet Propulsion Laboratory, California Institute of Technology, under a contract with the National Aeronautics and Space Administration.
Submitted on November 7, 2003; accepted for publication January 12, 2004.
| REFERENCES |
|---|
|
|
|---|
Arnold, F. H. 1998. Enzyme engineering reaches the boiling point. Proc. Natl. Acad. Sci. USA. 95:2035–2036.
Arnold, F. H., P. L. Wintrode, K. Miyazaki, and A. Gershenson. 2001. How enzymes adapt: lessons from directed evolution. Trends Biochem. Sci. 25:100–106.
Bastolla, U., H. E. Roman, and M. Vendruscolo. 1999. Neutral evolution of model proteins: diffusion in sequence space and overdispersion. J. Theor. Biol. 200:49–64.
Bastolla, U., M. Vendruscolo, and H. E. Roman. 2000. Structurally constrained protein evolution: results from a lattice simulation. Eur. Phys. J. B. 15:385–397.
Bornberg-Bauer, E., and H. S. Chan. 1999. Modeling evolutionary landscapes: mutational stability, topology, and superfunnels in sequence space. Proc. Natl. Acad. Sci. USA. 96:10689–10694.
Broglia, R., G. Tiana, H. Roman, E. Vigezzi, and E. Shakhnovich. 1999. Stability of designed proteins against mutations. Phys. Rev. Lett. 82:4727–4730.
Chan, H. S., and E. Bornberg-Bauer. 2002. Perspectives on protein evolution from simple exact models. Applied Bioinformatics. 1:121–144.
Cui, Y., W. H. Wong, E. Bornberg-Bauer, and H. S. Chan. 2002. Recombinatoric exploration of novel folded structures: a heteropolymer-based model of protein evolutionary landscapes. Proc. Natl. Acad. Sci. USA. 99:809–814.
Davidson, A. R., K. J. Lumb, and R. T. Sauer. 1995. Cooperatively folded proteins in random sequence libraries. Nat. Struct. Biol. 2:856–864.
Dill, K. A., S. Bromberg, K. Yue, K. M. Fiebig, D. P. Yee, P. D. Thomas, and H. S. Chan. 1995. Principles of protein folding: a perspective from simple exact models. Protein Sci. 4:561–602.
Fersht, A. R. 2002. Structure and Mechanism in Protein Science. W. H. Freeman and Company, New York.
Giver, L., A. Gershenson, P. O. Freskgard, and F. H. Arnold. 1998. Directed evolution of a thermostable esterase. Proc. Natl. Acad. Sci. USA. 95:12809–12813.
Gutin, A. M., V. I. Abkevich, and E. I. Shakhnovich. 1995. Evolution-like selection of fast-folding model proteins. Proc. Natl. Acad. Sci. USA. 92:1281–1286.
Hinds, D. A., and M. Levitt. 1994. Exploring conformational space with a simple lattice model for protein-structure. J. Mol. Biol. 243:668–682.
Hirst, J. D. 1999. The evolutionary landscape of functional model proteins. Protein Eng. 12:721–726.
Keefe, A. D., and J. W. Szostak. 2001. Functional proteins from a random-sequence library. Nature. 410:715–718.
Koehl, P., and M. Levitt. 2002. Protein topology and stability define the space of allowed sequences. Proc. Natl. Acad. Sci. USA. 99:1280–1285.
Miller, D. W., and K. A. Dill. 1997. Ligand binding to proteins: the binding landscape model. Protein Sci. 6:2166–2179.
Mirny, L. A., V. I. Abkevich, and E. I. Shakhnovich. 1998. How evolution makes proteins fold quickly. Proc. Natl. Acad. Sci. USA. 95:4976–4981.
Miyazawa, S., and R. L. Jernigan. 1985. Estimation of effective interresidue contact energies from protein crystal structures: quasi-chemical approximation. Macromolecules. 18:534–552.
Press, W. H., S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery, editors. 2002. Are two distributions different? In Numerical Recipes in C. Cambridge University Press, Cambridge, UK. 620–628.
Rapaport, D. C. 1987. Algorithms for lattice statistics. Comput. Phys. Rep. 5:265–350.
Rees, D. C., and M. W. W. Adams. 1995. Hyperthermophiles: taking the heat and loving it. Structure. 3:251–254.
Schreiber, G., A. M. Buckle, and A. R. Fersht. 1994. Stability and function: two constraints in the evolution of barstar and other proteins. Structure. 2:945–951.
Serrano, L., A. G. Day, and A. R. Fersht. 1993. Step-wise mutation of barnase to binase: a procedure for engineering increased stability of proteins and an experimental analysis of the evolution of protein stability. J. Mol. Biol. 233:305–312.
Shakhnovich, E. I. 1998. Protein design: a perspective from simple tractable models. Fold. Des. 3:R45–R58.
Shakhnovich, E. I., and A. M. Gutin. 1993. Engineering of stable and fast-folding sequences of model proteins. Proc. Natl. Acad. Sci. USA. 90:7195–7199.
Shoichet, B. K., W. A. Baase, R. Kuroki, and B. W. Matthews. 1995. A relationship between protein stability and protein function. Proc. Natl. Acad. Sci. USA. 92:452–456.
Socci, N. D., J. N. Onuchic, and P. G. Wolynes. 1998. Protein folding mechanisms and the multidimensional folding funnel. Proteins. 32:136–158.
Somero, G. N. 1995. Proteins and temperature. Annu. Rev. Physiol. 57:43–68.
Taverna, D. M., and R. A. Goldstein. 2000. The distribution of structures in evolving protein populations. Biopolymers. 53:1–8.
Taverna, D. M., and R. A. Goldstein. 2002. Why are proteins marginally stable? Proteins. 46:105–109.
Tiana, G., R. Broglia, and E. Shakhnovich. 2000. Hiking in the energy landscape in sequence space: a bumpy road to good folders. Proteins. 39:244–251.
Wang, X., G. Minasov, and B. K. Shoichet. 2002. Evolution of an antibiotic resistance enzyme constrained by stability and activity trade-offs. J. Mol. Biol. 320:85–95.
Williams, P. D., D. D. Pollock, and R. A. Goldstein. 2001. Evolution of functionality in lattice proteins. J. Mol. Graph. Model. 19:150–156.
Xia, Y., and M. Levitt. 2002. Roles of mutation and recombination in the evolution of protein thermodynamics. Proc. Natl. Acad. Sci. USA. 99:10382–10387.