
* Institute of Applied Mathematics and Mechanics, Warsaw University, Banacha 2, 02-097 Warsaw, Poland;
Laboratory of Theory of Biopolymers, Faculty of Chemistry, Warsaw University, Pasteura 1, 02-093 Warsaw, Poland; and
Donald Danforth Plant Science Center, Bioinformatics and Computational Genomics, 975 N. Warson Rd., Saint Louis, Missouri 63141 USA
Correspondence: Address reprint requests to Andrzej Kolinski, E-mail: Kolinski{at}chem.uw.edu.pl or Skolnick{at}danforthcenter.org.
| ABSTRACT |
|---|
| INTRODUCTION |
|---|
The protein model we adopt is a face-centered cubic lattice chain, with the chain beads representing the polypeptide amino acid units. Each amino acid residue is characterized by two fundamental properties: its hydrophobicity (which dictates the character of the binary interactions) and its secondary structure propensity (which encodes the tendency to adopt a specific rotational-isomeric state of a chain fragment). As demonstrated in many earlier studies, such an interplay between the short- and long-range interactions leads to cooperative collapse transitions in a finite-length polymer (Kolinski and Skolnick, 1996; Kolinski et al., 1986; Kolinski et al., 1996; Post and Zimm, 1979). Here, for the first time, we provide quantitative arguments that the existence of both types of interactions is actually a necessary condition for protein-like behavior.
| PROTEIN MODEL |
|---|
Representation of protein conformation
The model polypeptide is restricted to a face-centered cubic (fcc) lattice. The 12 fcc lattice vectors form the base set of the lattice, denoted BASE. This set can be written as:
![]() | (1) |
The fcc lattice, FCC, may be defined by induction as follows:
![]() | (2a) |
![]() | (2b) |
Points x1, x2 ∈ FCC are neighbors on the lattice if there exists e ∈ BASE such that x2 = x1 + e. In this case we write x1 ~ x2.
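The neighbor relation can be illustrated with a minimal Python sketch. It assumes the conventional integer representation of the fcc lattice, in which the 12 base vectors are all permutations of (±1, ±1, 0); the function names are hypothetical, and no particular numbering of the vectors is implied.

```python
from itertools import permutations, product

# Assumed form of BASE: all distinct permutations of (+-1, +-1, 0), 12 vectors in total.
BASE = {p for a, b in product((1, -1), repeat=2) for p in permutations((a, b, 0))}

def are_neighbors(x1, x2):
    """True if x2 = x1 + e for some e in BASE."""
    diff = tuple(q - p for p, q in zip(x1, x2))
    return diff in BASE

def on_fcc(x):
    """Points reachable from the origin by BASE steps have an even coordinate sum."""
    return sum(x) % 2 == 0

print(len(BASE), are_neighbors((0, 0, 0), (1, 0, -1)), on_fcc((1, 0, 0)))
```

With this representation every bond has squared length 2, which is convenient for the integer dot products used later in the model.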
Let CHAIN = {1,...,N} be the set of residues in a polypeptide chain. A structure of a polypeptide is represented on the lattice by a function s: CHAIN → FCC that satisfies the following three conditions:
![]() | (3a) |
![]() | (3b) |
![]() | (3c) |
We will identify a structure with its representation on the lattice, and we denote by S the set of all structures. S will be called the conformational space. It is easily seen that
![]() | (4) |
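As a rough illustration of what a lattice structure must satisfy, the sketch below checks two requirements that any such chain representation is assumed here to include: consecutive residues occupy neighboring lattice sites, and no site is occupied twice. This is an assumption about the content of conditions 3a to 3c, not a transcription of them; the helper name `is_valid_structure` is hypothetical.

```python
from itertools import permutations, product

# Assumed fcc base set: permutations of (+-1, +-1, 0).
BASE = {p for a, b in product((1, -1), repeat=2) for p in permutations((a, b, 0))}

def is_valid_structure(coords):
    """coords[i] is the lattice position of residue i+1 (a tuple of three ints)."""
    # Assumed connectivity condition: s(i) and s(i+1) are lattice neighbors.
    bonded = all(
        tuple(q - p for p, q in zip(a, b)) in BASE
        for a, b in zip(coords, coords[1:])
    )
    # Assumed excluded-volume condition: no two residues share a site.
    self_avoiding = len(set(coords)) == len(coords)
    return bonded and self_avoiding

print(is_valid_structure([(0, 0, 0), (1, 1, 0), (2, 1, 1)]))
```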
Representation of the polypeptide sequence
A sequence of the chain is defined by its hydrophobic pattern Pat: CHAIN → {H,P} and its secondary structure Sec: CHAIN → {β,C}. This means that from the point of view of the long-range pairwise interactions, there are two types of residues (Dill et al., 1995): nonpolar, hydrophobic (H) and polar (P). Moreover, on the level of secondary structure, or chain stiffness, β stands for extended, β-type short-range interactions, and C denotes the flexible coil, or loop, regions. Thus, the model employs a four-letter sequence code.
Interaction scheme
The definition of a model polypeptide sequence implies two main types of molecular interactions. First, the long-range interactions depend on the number of contacts between residues. Let vi(s) be the vector from s(i) to s(i + 1); we write it simply as vi. Two pairs of consecutive bond vectors, (vi-1,vi) and (vj-1,vj), are called parallel (notation: vi-1,vi || vj-1,vj) if either vi-1 = vj-1 and vi = vj, or vi-1 = -vj and vi = -vj-1. For a given structure s, we define functions counting three types of long-range contacts between residues:
![]() | (5a) |
![]() | (5b) |
![]() | (5c) |
Note that PP interactions are counted only for residues that contact in a parallel fashion, reflecting the tendency toward parallel packing of polar side chains on the surface of a protein (Ilkowski et al., 2000).
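The parallelism criterion and the contact counting can be sketched in Python as follows. The parallel test follows the definition given in the text. The contact definition itself is an assumption made for illustration: two residues are taken to be in contact when they are not bonded (|i - j| > 1) and occupy neighboring lattice sites, and PP contacts involving a chain end (which lacks one flanking bond) are simply skipped. Function names and 0-based indexing are hypothetical.

```python
from itertools import permutations, product

# Assumed fcc base set: permutations of (+-1, +-1, 0).
BASE = {p for a, b in product((1, -1), repeat=2) for p in permutations((a, b, 0))}

def bond_vectors(coords):
    """v[k] points from residue k to residue k+1 (0-based)."""
    return [tuple(q - p for p, q in zip(a, b)) for a, b in zip(coords, coords[1:])]

def neg(v):
    return tuple(-c for c in v)

def pairs_parallel(v, i, j):
    """Criterion from the text for (v[i-1], v[i]) || (v[j-1], v[j])."""
    same_direction = v[i - 1] == v[j - 1] and v[i] == v[j]
    opposite_direction = v[i - 1] == neg(v[j]) and v[i] == neg(v[j - 1])
    return same_direction or opposite_direction

def count_contacts(coords, pat):
    """Count HH, HP and (parallel-only) PP contacts; pat is a string over 'H' and 'P'."""
    v = bond_vectors(coords)
    n = len(coords)
    counts = {"HH": 0, "HP": 0, "PP": 0}
    for i in range(n):
        for j in range(i + 2, n):  # assumed: bonded neighbors are not contacts
            diff = tuple(q - p for p, q in zip(coords[i], coords[j]))
            if diff not in BASE:
                continue
            kind = "".join(sorted(pat[i] + pat[j]))
            if kind == "PP":
                interior = i >= 1 and j <= n - 2  # both flanking bonds must exist
                if not (interior and pairs_parallel(v, i, j)):
                    continue
            counts[kind] += 1
    return counts

print(count_contacts([(0, 0, 0), (1, 1, 0), (1, 0, 1)], "HPH"))
```

Both branches of `pairs_parallel` are needed because two strands running in opposite directions traverse the same local geometry with reversed bond order.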
The short-range interactions simulate the local conformational stiffness of the polypeptide chains. Here, for illustration, we limit ourselves to the case of β-type proteins. Let us denote by x · y the dot product of vectors x, y. The number of residues with preferences to be in β-strands is defined as follows:
![]() | (6) |
The geometric conditions mean that a given three-bond fragment has the most expanded conformation with its planar angles equal to 120°.
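The geometric statement can be checked numerically. The sketch below assumes integer fcc bond vectors of squared length 2, so that a planar angle of 120° between consecutive bonds corresponds to a dot product of 1, and it interprets "most expanded" as the first and third bonds of the fragment being identical (a planar zigzag). Both the dot-product form and the residue indexing in `count_beta` are assumptions for illustration, not the paper's Eq. 6; the letter "B" stands in for β in the secondary-structure string.

```python
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def planar_angle_deg(v_prev, v_next):
    """Angle at the shared bead between two consecutive fcc bonds (|v|^2 = 2)."""
    return math.degrees(math.acos(-dot(v_prev, v_next) / 2))

def is_extended_fragment(v_prev, v_mid, v_next):
    """Both planar angles are 120 degrees and the three-bond fragment is fully expanded."""
    return dot(v_prev, v_mid) == 1 and dot(v_mid, v_next) == 1 and v_prev == v_next

def count_beta(bonds, sec):
    """Count residues marked 'B' that sit in the middle of an extended three-bond fragment.

    bonds[k] joins residues k and k+1 (0-based); residue i is flanked by
    bonds[i-1], bonds[i], bonds[i+1] in this hypothetical indexing.
    """
    return sum(
        1
        for i in range(1, len(bonds) - 1)
        if sec[i] == "B" and is_extended_fragment(bonds[i - 1], bonds[i], bonds[i + 1])
    )

a, b = (1, 1, 0), (1, 0, 1)
print(planar_angle_deg(a, b), is_extended_fragment(a, b, a))
```

For such a fragment the squared end-to-end distance is 6 + 2(1 + 1 + 2) = 14, the largest value compatible with two 120° angles.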
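Let K(s) = (KHH(s), KHP(s), KPP(s), Kβ(s)) be a vector defining the numbers of various interactions and ε = (εHH, εHP, εPP, εβ) be a vector of weights, or the force-field parameters. The conformational energy of a structure s is, by definition, a linear combination of its contacts:
$E(s) = \varepsilon \cdot K(s) = \varepsilon_{HH} K_{HH}(s) + \varepsilon_{HP} K_{HP}(s) + \varepsilon_{PP} K_{PP}(s) + \varepsilon_{\beta} K_{\beta}(s)$ | (7) |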
Recently, we have shown that this model exhibits a highly cooperative all-or-none collapse transition (Gront et al., 2001) into a three-dimensional structure of unique Greek-key topology (Branden and Tooze, 1991). Here, we would like to show that a very similar model is indeed minimal, i.e., that the design of the force field is not accidental and that one needs nonzero values of all the proposed interactions to obtain a protein-like folding transition. The same, quite complex topology of the native state is assumed, which is an antiparallel six-stranded Greek-key β-barrel typical for a significant fraction of real β-type proteins. The force field has been simplified with respect to the previously studied model (Gront et al., 2001). The present model constitutes a highly simplified version of our older studies of a Greek-key folding motif (Kolinski et al., 1995), where the effect of multibody potentials on protein dynamics and thermodynamics was investigated in the framework of a high-coordination lattice model of a polypeptide chain (Kolinski et al., 1996).
Definition of the target native structure
The target structure is an "ideal," six-stranded, antiparallel β-barrel motif with a Greek-key topology, assumed to be a lattice representation of the "native structure" (see Fig. 1 A). Using the numbers representing the BASE vectors, this structure can be abbreviated as follows:
![]() | (8) |
![]() | (9a) |
![]() | (9b) |
| RESULTS |
|---|
We found all 20 different forms of our native structure and a collection of regular, nonnative structures, which (depending on the model parameters) were the global minimum of energy. These competitive structures are schematically shown in Fig. 2, and their lattice representations are listed in Table 1. The interaction patterns in these structures were analyzed in detail, and the results are given in Table 2. While the competitive structures (and the native structures) were found many times in various simulations, no other lower-energy structure was ever recorded, despite the very broad range of interaction parameters explored.
![]() | (10) |
According to the interaction patterns provided in Table 2, for i = 1, 2, ..., 8, the above inequalities imply the following set of relations between the parameters of the model:
![]() | (11) |
A simple consequence of the above inequalities is that our force field is minimal. It is easy to see that εHP, -εHH, -εPP, -εβ > 0. Indeed, inequalities (11.3) and (11.7) trivially mean that -εPP > 0 and -εHH > 0, respectively. The last condition, together with (11.6), gives εHP > 0. Similarly, from (11.1) one obtains -εβ > 0. Let us also note that the requirement of parallel contacts of the polar residues in the definition of KPP is a necessary one. Without such an assumption, the competitive structure M3 would have exactly the same pattern of interactions K as the native ones. Consequently, the force field seems to be the simplest one able to satisfy the thermodynamic hypothesis in the context of our lattice model. Thus we have shown that the model force field is minimal, i.e., to have a protein-like model, one needs all types of interactions considered in this work.
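The logic of the argument, that the native structure must have strictly lower energy than every competitive structure and that this fails when one interaction is switched off, can be sketched as below. The energy is taken as the linear combination of weights and contact counts described in the text. All numerical K vectors and weights in the example are made-up placeholders chosen for illustration only; they are not the values from Table 2 or Table 3.

```python
def energy(eps, k):
    """Linear energy: sum of weight * contact count over (HH, HP, PP, beta)."""
    return sum(e * c for e, c in zip(eps, k))

def native_is_unique_minimum(eps, k_native, k_competitors):
    """True if the native pattern has strictly lower energy than every competitor."""
    e_native = energy(eps, k_native)
    return all(e_native < energy(eps, k) for k in k_competitors)

# Hypothetical interaction patterns (K_HH, K_HP, K_PP, K_beta); NOT data from the paper.
k_native = (10, 2, 4, 12)
k_competitors = [(11, 6, 4, 12), (10, 2, 4, 8)]

eps_full = (-1.0, 0.5, -0.5, -1.0)    # signs follow eps_HP > 0 and the rest < 0
eps_no_beta = (-1.0, 0.5, -0.5, 0.0)  # short-range term switched off

print(native_is_unique_minimum(eps_full, k_native, k_competitors))
print(native_is_unique_minimum(eps_no_beta, k_native, k_competitors))
```

In this toy example the second competitor differs from the native pattern only in its β count, so setting the β weight to zero produces a degenerate ground state, which mirrors the role of the short-range term in the argument above.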
The above statement about the minimal character of the interaction scheme relies on our definition of the native state and the assumption that the other low-energy states found in a broad range of interaction parameters are nonnative, misfolded structures. Let us discuss these misfolds in more detail, pointing out their non-protein-like features. They are abbreviated with the symbols M1–M8 in Fig. 2. M1 differs from the native structure only in the orientation of a single residue at the C-terminus. Instead of pointing along the barrel, it points sideways. It was assumed that these kinds of conformations deviate from the regular Greek-key topology and are less "protein-like" than the native target structure. Structure M2, although compact and regular, has a different topology. Two loops placed on top of each other are not typical of globular proteins. The topology of M3 is wrong, and some of its P-P contacts are not parallel and therefore are not counted. The M4 and M6 structures have a poorly defined hydrophobic core. Interestingly, structures M4–M6 are more compact than the native one, simply because a number of polar residues are buried. Additionally, M5 lacks most of the extended β-type secondary structure. Structures M7 and M8 do not have a well-defined hydrophobic core and are not completely folded. As one might intuitively expect, and as is apparent from the quantitative analysis of the following sections, these misfolds result from a wrongly balanced strength of the various (short-range and long-range) interactions.
An upper bound for a set of good parameters
Let us denote by E a set of good parameters, i.e., the set of those ε for which the native structure corresponds to the global minimum of conformational energy. Obviously, for every pair of structures s1, s2 ∈ S and a positive number a, conditions E(s1) < E(s2) and aE(s1) < aE(s2) are equivalent. Therefore, without loss of generality we can assume that -εβ = 1 and identify ε ∈ E with the restrictions on (εHH, εHP, εPP). It is easy to see that the system of inequalities (11) is satisfied if and only if ε ∈ EU, where EU is the convex polyhedron given by the vertices listed in Table 3. Obviously EU is an upper bound for E, which means that E ⊂ EU. The shape of EU is schematically drawn in Fig. 3. Of course, the specific shape of EU depends on the choice of the set of competitive structures, which define the set of inequalities such as the one given in Eq. 11. On the other hand, the competitive structures were selected very carefully and they appear to be representative. We searched for the lowest-energy conformations over a broad range of interaction parameters and found no other low-energy structures. Therefore it is very unlikely that adding more structures to the set of competitive structures would significantly change the estimate of the upper bound for the set of good parameters of the model.
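The normalization step relies only on the fact that multiplying all weights by a positive constant preserves the energy ordering of structures. A minimal sketch, with a hypothetical helper `normalize` and made-up parameter values:

```python
def normalize(eps):
    """Rescale (eps_HH, eps_HP, eps_PP, eps_beta) so that -eps_beta = 1.

    Requires eps_beta < 0, so the scale factor is positive and the
    energy ordering of structures is unchanged.
    """
    eps_beta = eps[3]
    if eps_beta >= 0:
        raise ValueError("eps_beta must be negative for this normalization")
    scale = -eps_beta
    return tuple(e / scale for e in eps)

def energy(eps, k):
    return sum(e * c for e, c in zip(eps, k))

eps = (-2.0, 1.0, -1.0, -2.0)                 # illustrative values only
k1, k2 = (10, 2, 4, 12), (11, 6, 4, 12)       # illustrative contact patterns
print(normalize(eps))
print(energy(eps, k1) < energy(eps, k2),
      energy(normalize(eps), k1) < energy(normalize(eps), k2))
```

After this rescaling only three free parameters remain, which is why the set of good parameters can be drawn as a three-dimensional polyhedron.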
Let ε0 be the center of mass of EU. By definition, ε0 = (ε1 + ε2 + ... + ε10)/10. Let us also define the set of points εi(α) = α εi + (1 - α) ε0, where i = 1,2,...,10 (enumerating the vertices of the polyhedron EU as shown in Fig. 3) and 0 < α < 1. Suppose, for a moment, that the native structure is the unique global minimum of E at εi(α), i = 1,2,...,10. Then, because a convex combination of interaction parameters does not change the energy order of structures, the native structure is the unique global minimum of E in the convex polyhedron EU(α) defined by the vertices εi(α), i = 1,2,...,10. Obviously EU(α) ⊂ E ⊂ EU, and EU(α) → EU as α → 1.
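The construction of the shrunken polyhedron can be sketched as follows: compute the mean of the vertices and move each vertex toward it by a convex combination. The mixing parameter is called `alpha` here as an assumed name, and the four vertices in the example are arbitrary placeholders, not the ten vertices of Table 3.

```python
def centroid(vertices):
    """Arithmetic mean of the vertex coordinates."""
    n = len(vertices)
    dim = len(vertices[0])
    return tuple(sum(v[d] for v in vertices) / n for d in range(dim))

def shrink_vertex(vertex, center, alpha):
    """Convex combination alpha * vertex + (1 - alpha) * center, with 0 < alpha < 1."""
    return tuple(alpha * x + (1 - alpha) * c for x, c in zip(vertex, center))

# Placeholder vertices in (eps_HH, eps_HP, eps_PP) space; NOT the values of Table 3.
vertices = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (2.0, 2.0, 0.0), (0.0, 2.0, 0.0)]
center = centroid(vertices)
shrunk = [shrink_vertex(v, center, 0.95) for v in vertices]
print(center, shrunk[1])
```

Because the energy is linear in the weights, a strict energy inequality that holds at every shrunken vertex also holds at every convex combination of those vertices, which is the step used in the argument above.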
Unfortunately, we are not able to prove that EU(α) ⊂ E for some value(s) of α. However, a credible estimate of the lower bound of the parameter space can be obtained from computer experiments. In many Monte Carlo simulations, with Replica Exchange Monte Carlo employed as the sampling scheme, we obtained the native structure as the unique global minimum of E over the large sets of structures visited during the simulations.
Thermodynamics of the model
Replica Exchange Monte Carlo sampling combined with the histogram method provided data for analysis of the thermodynamic properties of the model. Each computational experiment consisted of two parts. The first stage employed 16 replicas with 10^6 attempts at replica exchange (per replica) and 10^3 local moves (also per replica) between the exchanges. In the next stage we employed 35 replicas with 3 × 10^6 replica exchanges and 10^3 micromodifications between exchanges. The temperatures of the replicas were linearly distributed around the transition temperature estimated in preliminary simulations. A modified multihistogram method of Ferrenberg and Swendsen was employed for analysis of the system thermodynamics (Ferrenberg and Swendsen, 1988; Ferrenberg and Swendsen, 1989; Newman and Barkema, 1999).
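For orientation, the two generic ingredients of such a sampling scheme are a ladder of temperatures and a rule for swapping conformations between replicas. The sketch below uses the standard replica-exchange acceptance probability min(1, exp[(1/kBTi - 1/kBTj)(Ei - Ej)]); it is not the authors' implementation, and the ladder width and all function names are assumptions.

```python
import math
import random

def temperature_ladder(t_center, half_width, n_replicas):
    """Linearly spaced temperatures centered on an estimated transition temperature."""
    step = 2 * half_width / (n_replicas - 1)
    return [t_center - half_width + k * step for k in range(n_replicas)]

def swap_probability(e_i, t_i, e_j, t_j, k_b=1.0):
    """Standard replica-exchange (parallel tempering) acceptance probability."""
    delta = (1.0 / (k_b * t_i) - 1.0 / (k_b * t_j)) * (e_i - e_j)
    return 1.0 if delta >= 0 else math.exp(delta)

def attempt_swap(energies, temps, i, j, rng=random):
    """Swap the energies (standing in for conformations) of replicas i and j if accepted."""
    if rng.random() < swap_probability(energies[i], temps[i], energies[j], temps[j]):
        energies[i], energies[j] = energies[j], energies[i]
        return True
    return False

temps = temperature_ladder(0.42, 0.08, 16)  # hypothetical center and width
print(temps[0], temps[-1], swap_probability(-20.0, 0.40, -10.0, 0.50))
```

A swap that moves the lower-energy conformation to the colder replica is always accepted; the reverse move is accepted with a Boltzmann-like probability, which preserves detailed balance across the ladder.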
The thermodynamics of the model system is analyzed in terms of the density of states; this enables us to define the distribution of states for the model system.
![]() | (12) |
![]() | (13) |
where the normalizing sum, $\sum_{E'} w(E')\exp(-E'/k_BT)$, runs over all energy levels E'.
This allows the entropy and free energy of the system to be defined as:
![]() | (14) |
![]() | (15) |
At an infinite temperature the system energy can be estimated as:
![]() | (16) |
This enables a definition of an equivalent of the system calorimetric enthalpy:
![]() | (17) |
The ratio of van't Hoff and calorimetric enthalpy is a conventional way to measure the transition cooperativity:
![]() | (18) |
The cooperativity coefficient κ assumes the value 1 for a strictly two-state, all-or-none folding transition. Tmax is the temperature corresponding to the maximum Cv(Tmax) of the heat capacity curve. The heat capacity is measured in a standard way from the fluctuations of the system conformational energy. This analysis follows the approach employed previously by Chan and co-workers (Chan, 2000; Kaya and Chan, 2000a,b).
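The quantities discussed in this section can all be computed from a density of states by canonical averaging. The sketch below assumes that w(E) denotes the density of states, uses the standard fluctuation formula Cv = (<E^2> - <E>^2)/(kB T^2), and takes the van't Hoff enthalpy in the form 2 Tmax sqrt(kB Cv(Tmax)) used in the Kaya and Chan style of analysis. The exact expressions of Eqs. 12 to 18 are not reproduced, so these forms and all function names should be read as assumptions.

```python
import math

def canonical(log_w, t, k_b=1.0):
    """Return (<E>, Cv, P) at temperature t from log_w: {energy: ln w(energy)}."""
    beta = 1.0 / (k_b * t)
    exponents = {e: lw - beta * e for e, lw in log_w.items()}
    shift = max(exponents.values())  # subtract the maximum for numerical stability
    weights = {e: math.exp(x - shift) for e, x in exponents.items()}
    z = sum(weights.values())
    p = {e: w / z for e, w in weights.items()}  # Boltzmann distribution over energies
    mean_e = sum(e * pe for e, pe in p.items())
    mean_e2 = sum(e * e * pe for e, pe in p.items())
    cv = (mean_e2 - mean_e ** 2) / (k_b * t * t)
    return mean_e, cv, p

def infinite_temperature_energy(log_w):
    """Average energy when every state is equally likely (T -> infinity)."""
    shift = max(log_w.values())
    w = {e: math.exp(lw - shift) for e, lw in log_w.items()}
    return sum(e * we for e, we in w.items()) / sum(w.values())

def cooperativity(t_max, cv_max, dh_cal, k_b=1.0):
    """Assumed form of the van't Hoff to calorimetric enthalpy ratio."""
    return 2.0 * t_max * math.sqrt(k_b * cv_max) / dh_cal

# Toy two-level system with one state per level (illustrative only).
log_w = {0.0: 0.0, 1.0: 0.0}
mean_e, cv, p = canonical(log_w, t=1.0)
print(mean_e, cv, infinite_temperature_energy(log_w))
```

Working with ln w(E) rather than w(E) avoids overflow when the number of states per energy level spans many orders of magnitude, as is typical for chain models.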
Fig. 4 shows the plots of the average system energy (A) and the average heat capacity (B) as functions of the dimensionless absolute temperature for the central point of the EU set. These quantities were calculated via canonical averaging (with the free energy given in Eq. 15). The transition temperature T = 0.4246 is very well defined by the maximum of the plot of the heat capacity at constant volume, Cv. The very narrow temperature range of the transition indicates a very abrupt folding transition. Fig. 4 C shows the Boltzmann distribution of states (Eq. 3) at the transition temperature. Clearly, the highest density of states is observed at the low-energy end of the spectrum and in the high-energy region. There is a gap in the intermediate energy range, suggesting a cooperative two-state transition. The free energy plot at the transition temperature for the central point of the EU set is shown in more detail in Fig. 5. Several interesting features can be seen in this plot. First, due to the discrete character of the model, there is no single line; instead, for almost the entire range of system energies, the free energy can assume various scattered values. Interestingly, near the native state the free energy becomes very well defined and reaches a deep minimum at the native state. In spite of the scattered character of the plot, the free energy barrier between the low- and high-energy states is well pronounced and provides the signature of an all-or-none, protein-like folding transition. The large symbols in the plot indicate the native conformation (star) and the selected competitive structures. Those similar to the native structure are marked by black symbols (structure M2 was not observed in this trajectory), and the open symbols indicate more exotic misfolds. Interestingly, these appear in the higher energy range; however, they are on the native side of the free energy barrier.
ε0, ε1(0.85), ε2(0.95), ε3(0.95), ε4(0.95), ε5(0.95), where ε5(0.95) means that the values of the parameters correspond to the combination (0.05 ε0 + 0.95 ε5) of the vectors ε0 and ε5. These parameters are close to the corners of the EU set, confirming that our selection of competitive structures led to a reasonable estimation of the range of good parameters of the model interaction scheme. Near the other vertices of the set the folding is slow and the native structure appears less frequently in the MC trajectories. For easy comparison the Boltzmann distribution was subjected to a smoothing procedure. The averaged quantity is defined below:
![]() | (19) |
ΔEi = 1.0 is a small energy interval that nevertheless contains a large number of states. The results are compared in Fig. 6. The values of the cooperativity parameter κ are included in all panels. First, it is easy to note that stronger interactions between polar groups lead to a wider gap in the distributions of states. Indeed, the most clearly manifested cooperative folding transitions are for ε3(0.95) and ε4(0.95), where the -εPP parameters describing the polar contacts have the largest values. It can also be noted that the all-or-none transition is well pronounced in those systems where the contribution from all types of long-range interactions is relatively large. The κ values given in Fig. 6 are obtained without empirical baseline subtractions. As demonstrated by Kaya and Chan (Kaya and Chan, 2000a), when the κ value obtained without baseline subtraction approaches 0.7, the true van't Hoff to calorimetric enthalpy ratio should be close to one. This applies to the examples given in Fig. 6, D and E. For these systems the transition is clearly very close to ideal two-state folding. In these cases, the height of the free-energy barrier is in the range of 5–10 kBT, which implies a negligible population of folding intermediates. As shown in the previous sections, some contribution from the short-range interactions is necessary for the uniqueness of the native state; however, systems dominated by these short-range interactions are very poor folders. In such cases, the transition is slow and the energy gap (or the free energy barrier) is low.
| CONCLUSIONS |
|---|
The present work focused on β-type systems. Studies of minimal α-type and α/β-type model polypeptides are now in progress.
| ACKNOWLEDGEMENTS |
|---|
This research was supported in part by the Division of General Medical Sciences of the National Institutes of Health (GM 37408). Piotr Pokarowski acknowledges partial support from the Polish Research Council KBN (7-T11F-016-21).
Submitted on June 28, 2002; accepted for publication October 30, 2002.
| REFERENCES |
|---|
Anfinsen, C. B. 1973. Principles that govern the folding of protein chains. Science. 181:223–230.
Baker, D. 2000. A surprising simplicity to protein folding. Nature. 405:39–42.
Branden, C., and J. Tooze. 1991. Introduction to Protein Structure. Garland Publishing, Inc., New York and London.
Bryngelson, J. D., J. N. Onuchic, N. D. Socci, and P. G. Wolynes. 1995. Funnels, pathways and the energy landscape of protein folding: a synthesis. Proteins. 21:167–195.
Chan, H. S. 2000. Modeling protein density of states: additive hydrophobic effects are insufficient for calorimetric two-state cooperativity. Proteins. 40:543–571.
Dill, K. A., S. Bromberg, K. Yue, K. M. Fiebig, D. P. Yee, P. D. Thomas, and H. S. Chan. 1995. Principles of protein folding – a perspective from simple exact models. Prot. Sci. 4:561–602.
Dinner, A. R., A. Sali, and M. Karplus. 1996. The folding mechanism of larger proteins: role of native structure. Proc. Natl. Acad. Sci. USA. 93:8356–8361.
Ferrenberg, A. M., and R. H. Swendsen. 1988. New Monte Carlo technique for studying phase transitions. Phys. Rev. Lett. 61:2635–2637.
Ferrenberg, A. M., and R. H. Swendsen. 1989. Optimized Monte Carlo data analysis. Phys. Rev. Lett. 63:1195–1198.
Gront, D., A. Kolinski, and J. Skolnick. 2000. Comparison of three Monte Carlo search strategies for a proteinlike homopolymer model: Folding thermodynamics and identification of low-energy structures. J. Chem. Phys. 113:5065–5071.
Gront, D., A. Kolinski, and J. Skolnick. 2001. A new combination of replica exchange Monte Carlo and histogram analysis for protein folding and thermodynamics. J. Chem. Phys. 115:1569–1574.
Hansmann, U. H. E. 1997. Parallel tempering algorithm for conformational studies of biological molecules. Chem. Phys. Lett. 281:140–150.
Hansmann, U. H. E., and Y. Okamoto. 1997. Numerical comparison of three recently proposed algorithms in the protein folding problem. J. Comput. Chem. 18:920–933.
Hansmann, U. H. E., and Y. Okamoto. 1999. New Monte Carlo algorithms for protein folding. Curr. Opin. Struct. Biol. 9:177–181.
Hukushima, K., and K. Nemoto. 1996. Exchange Monte Carlo method and application to spin glass simulations. J. Phys. Soc. Jap. 65:1604–1608.
Ilkowski, B., J. Skolnick, and A. Kolinski. 2000. Helix-coil and sheet-coil transitions in a simplified yet realistic protein model. Macromol. Theory Simul. 9:523–533.
Jackson, S. E. 1998. How do small single-domain proteins fold? Fold. Des. 3:R81–R91.
Jang, H., C. K. Hall, and Y. Zhou. 2002. Folding thermodynamics of model four-strand antiparallel β-sheet proteins. Biophys. J. 82:646–659.
Karplus, M., and A. Sali. 1995. Theoretical studies of protein folding and unfolding. Curr. Opin. Struct. Biol. 5:58–73.
Kaya, H., and H. S. Chan. 2000a. Polymer principles of protein calorimetric two-state cooperativity. Proteins. 40:637–661.
Kaya, H., and H. S. Chan. 2000b. Energetic components of cooperative protein folding. Phys. Rev. Lett. 85:4823–4826.
Kaya, H., and H. S. Chan. 2002. Towards a consistent modeling of protein thermodynamic and kinetic cooperativity: how applicable is the transition state picture to folding and unfolding? J. Mol. Biol. 315:899–909.
Kolinski, A., W. Galazka, and J. Skolnick. 1995. Computer design of idealized β-motifs. J. Chem. Phys. 103:10286–10297.
Kolinski, A., W. Galazka, and J. Skolnick. 1996. On the origin of the cooperativity of protein folding. Implications from model simulations. Proteins. 26:271–287.
Kolinski, A., and J. Skolnick. 1996. Lattice Models of Protein Folding, Dynamics and Thermodynamics. R. G. Landes, Austin.
Kolinski, A., J. Skolnick, and R. Yaris. 1986. The collapse transition of semiflexible polymers. A Monte Carlo simulation of a model system. J. Chem. Phys. 85:3585–3597.
Newman, M. E. J., and G. T. Barkema. 1999. Monte Carlo Methods in Statistical Physics. Clarendon Press, Oxford.
Onuchic, J. N., Z. Luthey-Schulten, and P. G. Wolynes. 1997. Theory of protein folding: the energy landscape perspective. Annu. Rev. Phys. Chem. 48:545–600.
Post, C. B., and B. H. Zimm. 1979. Internal condensation of a single DNA molecule. Biopolymers. 18:1487–1501.
Ptitsyn, O. B. 1987. Protein folding: Hypotheses and experiments. J. Protein Chem. 6:273–293.
Scheraga, H. A., M.-H. Hao, and J. Kostrowicki. 1995. Theoretical studies of protein folding. In Methods in Protein Structure Analysis. M. Z. Atassi and E. Appela, editors. Plenum Press, New York.
Shakhnovich, E. I., and A. V. Finkelstein. 1989a. Theory of cooperative transitions in protein molecules. II. Phase diagram for a protein molecule in solution. Biopolymers. 26:1681–1694.
Shakhnovich, E. I., and A. V. Finkelstein. 1989b. Theory of cooperative transitions in protein molecules. I. Why denaturation of globular protein is a first-order phase transition. Biopolymers. 28:1667–1680.
Sugita, Y., and Y. Okamoto. 1999. Replica-exchange molecular dynamics method for protein folding. Chem. Phys. Lett. 314:141–151.
Swendsen, R. H., and J. S. Wang. 1986. Replica Monte Carlo simulations of spin glasses. Phys. Rev. Lett. 57:2607–2609.