| HOME | HELP | FEEDBACK | SUBSCRIPTIONS | ARCHIVE | SEARCH | TABLE OF CONTENTS |
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||


* Faculty of Life Sciences, University of Manchester, United Kingdom;
Bioinformatics Division, School of Biological Sciences, University of Münster, Münster, Germany; and
Protein Engineering Network of Centres of Excellence, Department of Biochemistry, and Department of Medical Genetics and Microbiology, Faculty of Medicine, University of Toronto, Ontario, Canada
Correspondence: Address reprint requests to Hue Sun Chan, Protein Engineering Network Centres of Excellence, Dept. of Biochemistry and Dept. of Medical Genetics and Microbiology, Faculty of Medicine, University of Toronto, Toronto, Ontario M5S 1A8, Canada. Tel.: 1-416-978-2697; Fax: 1-416-978-8548, E-mail: chan{at}arrhenius.med.toronto.edu.
| ABSTRACT |
|---|
|
|
|---|
| INTRODUCTION |
|---|
|
|
|---|
Rationale for using SEMs to study evolution
SEMs have few adjustable parameters. This is particularly valuable for the formulation and evaluation of general concepts, because the simplicity of SEMs provides for a clear logical link between a set of assumptions and their consequences in the context of an explicit-chain model (Chan et al., 2002
). Deductive reasoning using SEMs is transparent. It is not obscured as is sometimes the case in models that entail complex constructions and invoke approximations of unspecified accuracy. In the SEM approach, proposed scenarios for biopolymer behavior can be tested by performing relatively inexpensive simulations to explore how the assumed (input) SEM parameters lead to predictions (output) that may or may not be consistent with the desired (experimental) phenomena. In this way, the SEM methodology can often offer deep insights when it is applied to tackle questions that cannot yet be addressed by experiments or atomistic modeling. (For reviews see Chan and Dill, 1993
; Bryngelson et al., 1995
; Dill et al., 1995
; Karplus and
ali, 1995
; Shakhnovich, 1996
; Thirumalai and Woodson, 1996
; Dill and Chan, 1997
; Pande et al., 1997
; and Chan et al., 2002
, 2004
.)
For certain SEMs, an exhaustive mapping between all possible sequences and their ground-state conformations is feasible, as was first demonstrated in a short-chain two-dimensional model (Chan and Dill, 1991
). This computational tractability allows for a physics-based, explicit-chain embodiment of key evolutionary concepts from theoretical biology (Lipman and Wilbur, 1991
). Prime examples include the idea of neutral evolution, i.e., biopolymers wandering in a space of equally viable mutants, wherein a neutral network of sequences encoding for the same structure is interconnected by single-point mutations; and inspiring imageries of evolutionary processes as walks on a multidimensional fitness landscape (Maynard-Smith, 1970
; Kimura, 1983
; Wright, 1932
). Extensive SEM studies of evolutionary populations under various selection and adaption constraints have become possible since powerful and inexpensive computers started to be available
15 years ago. (See, e.g., Irbäck and Sandelin, 2000
; Cui et al., 2002
; Xia and Levitt, 2002
; and Sandelin, 2004
, for recent applications to crossovers and the evolution of protein structure and stability; see Blackburne and Hirst, 2001
; Williams et al., 2001
; and Bloom et al., 2004
, for SEM treatments of evolution of function; see Chan and Bornberg-Bauer, 2002
, and Xia and Levitt, 2004a
, for reviews.) As an example of these advances, a key concept that has emerged from SEM studies is that of the superfunnel, the main subject of this investigation. The superfunnel paradigm stipulates that sequence-space topology of neutral nets tend to adopt funnel-like organizations, and that mutational stability (plasticity) of a sequence is strongly correlated with its native thermodynamic stability. Among other insights it affords, this theoretical framework serves to rationalize the often concomitant thermodynamic and mutational robustness of natural wild-type proteins (Bornberg-Bauer and Chan, 1999
).
| THEORETICAL PERSPECTIVES AND MOTIVATIONS |
|---|
|
|
|---|
The "hydrophobic polar" (HP) model (Lau and Dill, 1989
; Chan and Dill, 1990
, 1991
; Dill et al., 1995
) is a widely used two-letter alphabet (i.e., with two residue or monomer types, H and P). The model was designed to capture the essential features of hydrophobic interactions, which is a major stabilizing force in protein folding (Kauzmann, 1959
; Dill, 1990
). Another popular approach employs more heterogeneous interaction schemes with a 20-letter alphabet (Abkevich et al., 1996
; Buchler and Goldstein, 1999
). In approaches that allow for the variation of individual contact interactions that are not based upon residue types, the effective number of residue typesas a parametrization of interaction heterogeneitycan be much higher than 20 (Chan and Dill, 1996
; Buchler and Goldstein, 1999
).
By virtue of their simplification, the scope of SEMs is limited. Recent in-depth analyses indicate that many common SEMs are insufficient for the finer thermodynamic and kinetic details of protein folding, especially the high degree of thermodynamic and kinetic cooperativity exhibited by many real, small, single-domain proteins. Therefore, as far as properties of individual proteins are concerned, more complex modeling constructs are preferable (Chan et al., 2004
). Nonetheless, for evolutionary applications that require an extended coverage of both the sequence and conformational spaces, SEMs remain a uniquely useful tool: From a practical standpoint, the required extended coverage of sequence and conformational spaces is currently not achievable in more complex models. More importantly, at a physical level, insofar as the consistency principle (G
, 1983
) or principle of minimal frustration (Bryngelson and Wolynes, 1987
; Bryngelson et al., 1995
) is applicable to natural proteins, and a given SEM's potential function is motivated by a major part of the intrachain interactions in real proteins (e.g., by attempting to capture the hydrophobic interactions as in the HP model), the SEM sequence-to-structure mapping is physically viable, for the following reason: although the SEM potential function may have to be augmented to achieve a better mimicry of protein energetics (Chan, 2000
; Salvi and De Los Rios, 2003
), for a model sequence that embodies the minimal-frustration principle, by and large the additional terms are expected to consistently favor the same native structure as that encoded by the more rudimentary SEM code (Chan et al., 2002
, 2004
; Chan and Bornberg-Bauer, 2002
; Cui et al., 2002
; Sandelin, 2004
). This perspective is supported by recent insightful analyses of database structures of real proteins. These studies have demonstrated that the general trends of both the sequential (along-the-chain; Irbäck and Sandelin, 2000
) and spatial (core-packing; Sandelin, 2004
) distributions of hydrophobic residues in real protein structures are very similar to that predicted by the two-dimensional (2D) HP model. Echoing the latter observation, the less-than-perfect correlation between sequence hydrophobicity and surface exposure patterns in database proteins has been found to resemble that of a three-dimensional (3D) off-lattice hydrophobic-polar model of protein folding as well (Mölbert et al., 2004
).
Restricting to maximally compact conformations: potential problems
The most commonly used lattices for chain representations in SEMs are the 2D square lattice and 3D simple cubic lattices. For the HP model, exact enumerations that account for all possible self-avoiding walks have been performed extensively on two-dimensional square lattices to determine the ground-state conformations of all possible sequences (Chan and Dill, 1991
; Bornberg-Bauer, 1997b
; Bornberg-Bauer and Chan, 1999
; Cui et al., 2002
; Irbäck and Troein, 2002
). In other studies, however, only selected sequences from an enormous sequence space are considered (e.g., those along an evolutionary trajectory). Sometimes, for the sake of computational tractability, the native conformation of a given sequence is defined as the lowest-energy conformation of a highly restricted set of maximally compact structures (conformations) rather than determined exhaustively from the set of all possible conformations. These include restricting to 2D 4 x 4 and 5 x 5 conformations (Buchler and Goldstein, 2000
; Govindarajan and Goldstein, 1997
) and 3D 3 x 3 x 3 (Li et al., 1996
) and more recently 3 x 3 x 4 conformations (Cejtin et al., 2002
).
As far as polymer physics is concerned, restricting conformational possibility to maximally compact structures (or maximally compact states, MCSs) represents a drastic step with serious consequences (Chan and Dill, 1996
). Artifacts are likely in MCS approaches: Both rigorous lattice computations (Yue et al., 1995
; Micheletti et al., 1998
; Backofen et al., 1999
; Ejtehadi et al., 1999
; Irbäck and Troein, 2002
) and analyses of real protein structures (Goodsell and Olson, 1993
) indicate that true ground-state conformations of model proteins with physically plausible intrachain interactions and real protein native structures are not necessarily maximally compact. Under the MCS restriction, the behavior of a model heteropolymer would no longer be the product of the physical assumption embodied in the model energy function and conformational freedom alone, but rather the result of an altered energy function. Indeed, it has been shown that enforcing MCSs often changes the ground-state conformation(s) of a given sequence; the statistics of the sequence-structure mapping are significantly affected by the MCS restriction as well (Chan and Dill, 1996
). Intuitively, it wouldn't be surprising that on average a larger number of sequences would map onto a given structure (i.e., the structure would have a larger convergence; Chan and Dill, 1991
) if the structural space is smaller because of the MCS restriction. For the case of 25mer 2D HP sequences (chain length n = 25), exact enumeration data (Irbäck and Troein, 2002
) shows that 99.99% of the sequences determined by the MCS approach to have a unique ground-state conformation in fact do not, as these sequences actually have more than one lowest-energy conformation when the full conformational space is considered (Chan and Bornberg-Bauer, 2002
).
The superfunnel idea: model dependence?
Several general features of the protein sequence-structure mapping have been rationalized by multiple studies using a wide range of SEMs (Chan et al., 2002
). A robust propertywhich applies to RNA as well (Tacker et al., 1996
)is that some structures (i.e., ground-state conformations) are much more highly represented than others in the sequence space. In other words, many more sequences encode for the over-represented structures (with large convergence sets) than other structures (with smaller convergence sets) (Schuster et al., 1994
; Li et al., 1996
; Bornberg-Bauer, 1997b
; Govindarajan and Goldstein, 1996
; Buchler and Goldstein, 2000
). Another robust feature of protein SEMs is the topological organization of sequences encoding for the same structure in neutral nets. They tend to form extensive networks connected by small mutational steps, on which an evolutionary trajectory may traverse without changing the structure being encoded (Bornberg-Bauer, 1997b
; Govindarajan and Goldstein, 1997
; Bornberg-Bauer and Chan, 1999
; Trinquier and Sanejouand, 1999
).
By comparison, more detailed properties of neutral net organization have thus far been investigated using only a rather limited set of SEMs. A central feature is the superfunnel paradigm: Certain neutral nets have been shown to organize in a funnel-like manner centered around a prototype sequence (Bornberg-Bauer and Chan, 1999
). This sequence has the largest number of neutral mutations, the highest thermodynamic stability for the native conformation, and often represents the consensus sequence of the protein family. For the 2D HP model, native stability tends to decrease as one moves away in sequence space from the prototype sequence, thus the sequence-space variation of native stability with respect to the Hamming distance from the prototype sequence resembles that of a funnel (Bornberg-Bauer and Chan, 1999
), reminiscent of conformational-space funnels for protein folding (Leopold et al., 1992
; Wolynes et al., 1995
; Dill and Chan, 1997
).
Sequences folding not uniquely but with relatively low degeneracies are enriched in the evolutionary vicinity of these superfunnels. Some of these sequences connect two or more neutral nets, simultaneously encode for more than one structure, and thus can serve as evolutionary switches (Trinquier and Sanejouand, 1999
; Bornberg-Bauer, 2002
). Moreover, uniquely folding 2D HP sequences (and prototype sequences in particular) have been shown to exhibit a significant degree of modular architecture, sometimes with clearly identifiable "autonomous folding units" acting as building blocks for larger structures (Cui et al., 2002
; Chan and Bornberg-Bauer, 2002
). Consequently, and consistent with recent experiments (Voigt et al., 2002
; Otey et al., 2004
), recombinations in conjunction with single-point substitutions are found to be more efficient in exploring novel 2D HP structures than single-point mutations alone (Cui et al., 2002
). A subsequent insightful study shows that recombinations can also lead to significantly higher steady-state populations of prototype sequences (Xia and Levitt, 2002
) than population dynamics based solely on single-point substitutions (Cui et al., 2002
). A more recent investigation of two variants of the 2D HP model confirmed the existence of superfunnels for native stability and found superfunnels for folding rates as well (Xia and Levitt, 2004b
), lending credence to the earlier stipulation that funnel-like organization of sequence space should generally apply for "any measure of fitness provided that its variation with respect to mutations is essentially smooth" (Bornberg-Bauer and Chan, 1999
). To ascertain the generality and robustness of superfunnel organizations, here we extend our investigation to a wider range of SEMs with different model interaction schemes.
Evaluating and comparing SEMs of evolution
To our knowledge, the most extensive studies to date to evaluate parameter dependencies in SEMs of evolution have been carried out by Buchler and Goldstein (1999
, 2000
). They considered a collection of 2-, 4-, 20-, and
-letter models, restricted conformational enumeration to 2D MCSs that can fit within a 5 x 5 square, and concluded that structures that are highly designable for two-letter alphabets are not necessarily highly designable with larger alphabets.
The focus of this study is different, and is complementary to that of Buchler and Goldstein. In view of potential problems in reaching a proper physical interpretation of MCS models (see above), we employ full conformation enumerations as well as MCS enumerations. To allow for an exhaustive accounting of sequence space, here we consider only two-letter alphabets; but we compare highly diverse two-letter model interaction schemes with different modes of residue-residue interactions and different degrees of repulsive interactions. In this work, we seek to answer three questions:
| MODELS AND METHODS |
|---|
|
|
|---|
|
|
(
< 0) is assigned to a pair of nonsequential H monomers if they form a spatial nearest-neighbor contact (termed an HH contact). As discussed above, although the HP model is insufficient for calorimetric cooperativity and its energy landscape is rather rugged (Chan and Dill, 1994
In contrast, in the AB model, like monomers attract and unlike monomers repel (Chan and Dill, 1996
). Although this investigation focuses on 2D AB and related models, it is worth noting that variations of the AB model have been studied extensively in 3D applications as well (Shakhnovich and Gutin, 1993
; Socci and Onuchic, 1994
). The repulsive interactions in the AB model enable more "designing out"; hence the number of sequences having a unique ground-state conformation (g = 1 sequences) is much higher in the AB model than in the HP model (Table 1). However, the AB model is less proteinlike because the A and B monomers tend to segregate in the native structure (see bottom row of Fig. 1), and they do not appear to correspond to any physicochemical classification of amino acid residues. As such, the AB interaction potential is instructive as an example of heteropolymer models that can achieve proteinlike ground-state uniqueness via a manifestly nonproteinlike interaction scheme.
For each of the above models we apply two variations: 1), using a shifted energy matrix with stronger repulsive interactions; and 2), restricting conformational variation to MCSs. Building on the HP and AB models, their respective shifted models incorporate stronger repulsive interactions (Fig. 1), which tend to enhance interaction specificity. It is noteworthy that both MCS restriction (see above) and shifting represent significant changes in the physics of intrachain interactions. As has been critically discussed for a class of 3D 20-letter lattice models (Abkevich et al., 1996
), a shifted interaction potential may bear little resemblance to the original unshifted interaction scheme (Chan and Dill, 1996
; Chan, 1999
; Chan et al., 2002
).
| RESULTS AND DISCUSSION |
|---|
|
|
|---|
The upper-middle conformation in Fig. 1 provides an example of a structure that is not encodable in the HP model but is encodable in the shifted HP' model. Indeed, in the HP' model, this top-ranking structure is not only encodable, but is maximally encodable, with 51 sequences sharing it as their common unique ground-state conformation. This structure is not very compact. It has two monomers at one chain end sticking out. Obviously, such a structure would not be encodable in the HP model because in that case a dangling chain end can bend, either resulting in an increase in the number of favorable contacts (if the chain end is an H that can form contacts with other Hs on the surface of the rest of the protein) or leading to a degenerate ground state. In the HP' model in our study, however, sequences can be chosen such that any bending of this two-monomer chain end would result in a repulsive interaction and therefore is disfavored. This example illustrates graphically the utility of repulsive interactions in designing out alternate conformations that would otherwise compete with the target ground state.
A similarly large increase (more than fivefold) in the number of uniquely folding (g = 1) sequences results from modifying the HP model to the HPmax model; but not from modifying the AB model to the ABmax model. The increase is so much more prominent for the HP family because most often fewer HH contacts are achievable in MCSs than in more open conformations. Therefore, when open conformations are eliminated in the HPmax model, a much higher fraction of the 1673 MCSs becomes encodable (from 331/1673 = 19.8% to 1224/1673 = 73.2%, cf. Table 1). On the other hand, in the (unrestricted) AB model, most of the 1673 MCSs (1493/1673 = 89.2%) are already encodable. So imposing the MCS restriction only leads to a marginal increase in encodability to 1577/1673 = 94.3% in the ABmax model. In the AB model, the native conformations of a large majority (75.9%) of the 34,700 uniquely folding sequences are MCSs to begin with. Consequently, only marginal increases in the number of g = 1 sequences are effected by shifting (six sequences, 0.017%) and MCS restriction (2526 sequences, 7.3%).
In short, for the more proteinlike HP family, shifting the intrachain interaction energies greatly enhances the designing out capability, whereas enforcing MCSs artificially designs in many more structures. Both effects lead to a very large increase in the number of uniquely folding sequences. In contrast, the corresponding effects arethough not nonexistentquite insignificant for the AB family of models.
Neutral sets
We next turn to the statistics of neutral sets and neutral nets (Schuster et al., 1994
; Renner and Bornberg-Bauer, 1997
; Bornberg-Bauer and Chan, 1999
). A neutral set of a given structure is the set of all g = 1 sequences that have it as their ground-state conformation. Previously, it has also been referred to as a convergence set (Chan and Dill, 1991
). Thus, an encodable structure is one with a nonempty neutral set. Basic encodability statistics of the six models in Table 1 was explored (Chan and Dill, 1996
), and aspects of neutral set properties of the HP and AB models were investigated (Bornberg-Bauer and Chan, 1999
). But no systematic study has been conducted to compare the sizes of their neutral sets with that of the shifted and MCS-restricted versions of these models.
Table 1 indicates that for the HP family, the number of encodable structures (i.e., the number of neutral sets) undergoes a large increase (
4.5-fold) when the HP model is modified to the shifted HP' model, which allows more designing out. Since most of the encodable structures in the HP model are not MCSs, changing the HP model to the HPmax model results in a small decrease in the number of neutral sets, notwithstanding the large increase of neutral sets for MCSs. On the other hand, shifting the AB model to the AB' model only leads to a small increase in neutral sets (363 more neutral sets, representing a mere 363/4127 = 8.8% increase). Because of the repulsive interactions it contains, the AB model encodes many more structures than the HP model, and much of this enhanced encodability comes from more open structures. Thus, it is not surprising that imposing MCS restriction on the AB model results in a large decrease (from 4127 to 1577) in the number of neutral sets.
As noted above, despite the AB and AB' models' ability to encode relatively open conformations, they have much stronger preferences for MCSs than the more proteinlike HP and HP' models. This difference is most strikingly illustrated by the sizes of their neutral sets for MCSs versus those for non-MCSs. For the HP and HP' models, the average MCS neutral set sizes are, respectively, 1142/331 = 3.5 and 971/310 = 3.1. These are slightly smaller than the average neutral set size of 4.6 for non-MCSs in both the HP and HP' models, indicating that MCSs are not particularly favored in the HP and HP' interaction scheme. This situation is drastically different from that in the AB and AB' models: Average MCS neutral set sizes are 17.6 and 16.9 for the AB and AB' models, respectively, whereas average non-MCS neutral set size is only 3.2 for these models. In other words, for these models, the average MCS neutral set is more than five times larger than the average non-MCS set, implying that the AB and AB' interaction scheme is strongly favorable to MCSs. Hence enforcing MCS on the AB model is in some respects redundant in that it produces little change to the sequence-structure statistics (see Table 1 and discussion above).
Despite these important differences, there is one clear trend of neutral set size distribution that is robust across all six different models. Generally, a few large neutral sets dominate over many small neutral sets in a Zipf-like version (detailed data not shown), as was observed previously for several different models of biopolymers (Schuster et al., 1994
; Li et al., 1996
; Bornberg-Bauer, 1997b
). Here we find that distributions of neutral net size also follow a similar pattern (see below).
Neutral nets
A neutral net is a subset of a neutral set for which all sequences are interconnected with one another via a series of single point mutations. A neutral set can be fragmented into several neutral nets if not all of the sequences in the set are interconnected. These interconnections are depicted in Fig. 2 for neutral nets from four different models. Corresponding drawings for the HP and AB models are available elsewhere (Bornberg-Bauer and Chan, 1999
).
|
|
B symmetry in these models: Given a structure is encoded by an AB sequence, a sequence obtained by interchanging the A and B monomers in the given sequence will also encode for the same structure. However, for the n = 18 AB and AB' models presented here, any two sets of neutral sequences connecting to two individual sequences related by A
B interchange are not interconnected to each other (the "longest neutral path" entries in Table 1 for AB and AB' are less than n/2 = 9), thus all of their neutral sets involve a basic A
B fragmentation. Nonetheless, even after this factor of 2 is taken into account, on average the neutral sets in the AB and AB' models (
4 neutral nets per neutral set) are still significantly more fragmented than that in the HP and HP' models (
1.11.2 neutral nets per neutral set).
Although MCS restriction dramatically increases the average neutral set size for both the HP and AB models, it significantly increases only the average neutral net size of the HP model but not that of the AB model. On average, the HPmax neutral sets (1.9 nets per set, net/set size ratio = 0.52) are more fragmented than the HP and HP' models, but are less fragmented than the AB family of models (7.9 nets/set for ABmax, corresponding average net/set size ratio = 0.13). MCS restriction induces a large increase in the average net size for the HP model (from 3.7 to 14.0), but leads to only a slight increase for the AB model (from 2.0 to 3.0). MCS restriction allows for the emergence of much larger neutral nets in both the HPmax and ABmax models. But the largest neutral net in the HPmax model is almost four times as large as that in the ABmax model. The largest HPmax neutral net comprises 267 sequences, compared to 48 for the largest HP neutral net. The longest continuous path of neutral mutations in the largest HPmax neutral net is 12, almost twice as long as that for the largest HP neutral net. As conformational space is reduced in the MCS models, sequences that previously encode for different structures or are degenerate are now grouped together to form larger neutral nets. In other words, many sequences that fold to a particular structure in the HPmax scheme would not do so if the conformational space was not restricted. This finding is also consistent with a recent investigation of two n = 24 2D HP-like models (Xia and Levitt, 2004b
). Taken together, as for other aspects of the sequence-structure mapping discussed above, MCS restriction appears to have a much more profound impact on the more proteinlike HP model than on the AB model.
Conformity to the superfunnel paradigm and ruggedness of evolutionary landscapes
The prototype sequence of a neutral net has the maximum number of neutral neighbors and thus is mutationally most stable. In other words, the prototype sequence is connected by single-point substitutions to the largest number of other sequences in the neutral net. When there is more than one such sequence in a neutral net, the prototype sequence is taken to be the one that also has the highest native thermodynamic stability (Bornberg-Bauer, 1997a
; Bornberg-Bauer and Chan, 1999
).
Fig. 2 shows that sequences in a neutral net are topologically organized around the prototype sequence. This applies to the four models shown in the figure as well as HP and AB model neutral nets depicted previously (Bornberg-Bauer, 1997b
; Bornberg-Bauer and Chan, 1999
; Chan and Bornberg-Bauer, 2002
). To ascertain the thermodynamic stability of model native (ground-state) structures, densities of states of all g = 1 sequences of the shifted and MCS models are determined here by exhaustive conformational enumeration, as has been performed for the HP and AB models (Bornberg-Bauer and Chan, 1999
). Following the procedure laid out in this reference, the partition function of every g = 1 sequence in all six models is constructed. The strengths of intrachain interactions are controlled by an overall parameter
(
< 0). Using the relative pairwise contact energy eij between monomer types i and j in Fig. 1 for the different models, the energy of an i, j contact is assigned to be
eij with Boltzmann weight exp(
eij/kBT), where kBT is Boltzmann constant times absolute temperature. Then, native stability of every sequence is quantitated by the
/kBT value at the given sequence's thermodynamic folding-denaturation transition midpoint, i.e., when the fractional Boltzmann population of the unique ground-state conformation is 1/2, as we have formulated before. Sequences with thermodynamically more stable native structures have smaller midpoint (
/kBT) values.
A neutral net is said to conform to the superfunnel paradigm if its prototype sequence (of maximal mutational stability, a sequence-space property) is also the sequence with the maximum native thermodynamic stability (a conformational-space property) among the sequences in the neutral net (Bornberg-Bauer and Chan, 1999
). Table 1 assesses the degree to which neutral nets of different models conform to this paradigm (bottom line of entries). A majority of neutral nets in all six models follow the superfunnel paradigm, but the percentages of superfunnel-conforming neutral nets are significantly higher (
90%) for the HP and HP' models than for the other four models (
65%). In this particular regard, it is noteworthy that the MCS-restricted HPmax model resembles the AB family of models rather than displaying kinship with the HP and HP' models.
Fig. 3 provides examples of both superfunnel-conforming (a, c, and d) and nonsuperfunnel neutral nets (b). In this figure, the prototype sequence coincides with the sequence with maximum native stability for the HP, AB', and ABmax neutral nets, but the prototype and maximum-stability sequences are different for the HPmax neutral net shown (cf. Fig. 2). In the graphs in Fig. 3, the single-point substitutions are represented by lines joining pairs of sequences with successive Hamming distances from the prototype sequence. In general, the smoothness of a superfunnel may be characterized by the slopes of these lines. A positive slope implies that the given mutation increases (or decreases) native stability when the sequence moves closer toward (or farther away from) the prototype sequence. On the other hand, a negative slope means that the given mutation would lead to a decrease in native stability when the sequence is moved closer toward the prototype sequence, and vice versa. The HP' superfunnel in Fig. 3 has no negative slopes; all of its 115 mutational connections have positive slopes (a feature very similar to that of the largest HP model; Bornberg-Bauer and Chan, 1999
). We therefore regard this superfunnel as "smooth," because when the model HP' protein evolves toward the prototype sequence, its native stability increases monotonically. In contrast, the less proteinlike AB' and ABmax superfunnels in Fig. 3 are more "rugged"as is the case for the largest n = 18 AB superfunnel (Bornberg-Bauer and Chan, 1999
)in that they have many negative slopes, some of which are quite steep. This feature means that a mutation that brings an AB-type sequence closer to the prototype sequence can sometimes lead to a significant decrease in native stability. In this respect, it is clear from Fig. 3 that the largest HPmax neutral net (a nonsuperfunnel) also has a high degree of sequence-space ruggedness, further indicating that the evolutionary properties of the MCS-restricted HPmax model are quite dissimilar to that of the more proteinlike HP or HP' model.
Two peculiar features of the smooth HP' superfunnel in Fig. 3 are readily related to its particular native structure and the HP' interaction scheme (Fig. 1). First, mutations at some of the sites in the more open parts of this superfunnel's native conformation, namely the two-monomer dangling end and the cavity-encircling loop, lead only to minute decreases in native stability. A case in point would be a P
H mutation of the second-last monomer at the dangling end. This type of mutation results in almost nonexistent stability gaps (
s; Bornberg-Bauer and Chan, 1999
) between the prototype sequence and some of the low-lying (high native stability) nonprototype sequences in this superfunnel. Second, all three mutations on the prototype sequence that lead to a dramatic decrease in native stability (changing the midpoint (
/kBT) from 1.66 to >5) involve an H
P mutation of the third monomer from the nondangling end of the native conformation. It is clear that this is an important "anchoring" site of the structure; and it is quite remarkable that changing its interaction from being attractive to repulsive can even be tolerated.
Fig. 4 shows the distribution of neutral net size (insets) and the variation of native stability of the prototype sequence as a function of neutral net size (main plots). As for neutral sets (Schuster et al., 1994
; Li et al., 1996
; Bornberg-Bauer, 1997b
), all six models have many small neutral nets but only a few large neutral nets. On average, native stability of the prototype sequence increases (lower midpoint (
/kBT)) with increasing size of the neutral net, although the stability of prototype sequences of some smaller neutral nets can exceed that of larger nets. This observation suggests that structures that have larger neutral nets (which tend also to have larger neutral sets or greater designabilities; Li et al., 1996
, 1998
; Koehl and Levitt, 2002
; Wingreen et al., 2004
; see also Govindarajan and Goldstein, 1996
; Buchler and Goldstein, 2000
) are more capable of being encoded by sequences with higher native thermodynamic stabilities.
|
|
|
| CONCLUDING REMARKS |
|---|
|
|
|---|
A central advantage of SEMs is that they provide explicit-chain models of complete sequence-structure mapping that are built upon polymer physics, albeit only in a rudimentary manner. It is this physical aspect that sets SEMs apart from other forms of model sequence-structure mapping, whose physical plausibility is often uncertain since basic polymer properties such as chain connectivity and excluded volume are not taken into account in some of these approaches. It follows that the physicality of the SEM constructs one uses for studying protein evolution is of overarching importance. One should strive to capture as much essential physics of proteins as possible and reduce arbitrariness in the model sequence-structure mapping, and achieve these goals within the confine of limited computational resources. From this vantage point, any restriction on sequence and conformational variation should be critically examined.
Here we found that several key qualitative features of the evolutionary sequence-structure mapping are fairly robust across the models we have investigated. These include a strong bias in sequence-space structure distribution (some structures have much bigger neutral sets than others), possibility of extensive neutral nets, and to some extent the conformity of neutral nets to the superfunnel paradigm. These similarities suggest that these properties are rather general for explicit-chain models of genotype-phenotype mapping.
However, it is equally important to realize that there are quantitative as well as more subtle differences among the models studied here. Interestingly, the more proteinlike HP and HP' models appear to have smoother superfunnels, much less fragmented neutral sets (a neutral set may be viewed as a protein family encoding for the same structure), and a significantly higher fraction of their neutral nets conforming to the superfunnel paradigm than the other models. We found that restricting conformational variation to MCSs has a much more drastic effect on the more proteinlike HP model than on the AB interaction scheme. It is clear that such a restriction imposes a significant change in the fundamental physics of the hydrophobicity-like interactions of the HP model. In several respects, such as neutral set fragmentation, neutral net ruggedness, and conformity to the superfunnel paradigm, the MCS-restricted HPmax model deviates substantially from the original HP model, sometimes as much as that of the differences between the AB family of models and the HP model. As a result, the structural correlation between the HP and HPmax models is not high, since many prominent structures in the HP model are not MCSs and thus are precluded in the HPmax scheme. It also appears that because of the MCS-restricted models' severe constraints on the folded shapes, modular protein evolution cannot be readily addressed by these constructs. In contrast, modularity and autonomous folding units arise naturally in the unrestricted HP model (Cui et al., 2002
; Irbäck and Troein, 2002
; Chan and Bornberg-Bauer, 2002
). All in all, we conclude that these facts should always be taken into serious account, and caution should be exercised in the physical interpretation of results from MCS-restricted evolutionary models.
An instructive future extension of this analysis would be to employ full conformational enumeration to evaluate sequence-structure mapping models with larger alphabets, including rigorously addressing the degree to which the physics of their intrachain interactions is proteinlike (Chan, 1999
; Chan et al., 2002
). In view of the more complex calculations that this would entail, recent developments in constraint-based computational techniques (Backofen et al., 1999
) should be of use in this endeavor.
| ACKNOWLEDGEMENTS |
|---|
|
|
|---|
Submitted on July 26, 2004; accepted for publication October 13, 2004.
| REFERENCES |
|---|
|
|
|---|
Ancel, L. W., and W. Fontana. 2000. Plasticity, evolvability, and modularity in RNA. J. Exp. Zool. 288:242283.[CrossRef][Medline]
Backofen, R., S. Will, and E. Bornberg-Bauer. 1999. Application of constraint programming techniques for structure prediction of lattice proteins with extended alphabets. Bioinformatics. 15:234242.
Bastolla, U., M. Porto, H. E. Roman, and M. Vendruscolo. 2003a. Connectivity of neutral networks, overdispersion, and structural conservation in protein evolution. J. Mol. Evol. 56:243254.[CrossRef][Medline]
Bastolla, U., M. Porto, H. E. Roman, and M. Vendruscolo. 2003b. Statistical properties of neutral evolution. J. Mol. Evol. 57:S103S119.[CrossRef][Medline]
Blackburne, B. P., and J. Hirst. 2001. Evolution of functional model proteins. J. Chem. Phys. 115:19351942.[CrossRef]
Bloom, J. D., C. O. Wilke, F. H. Arnold, and C. Adami. 2004. Stability and the evolvability of function in a model protein. Biophys. J. 86:27582764.
Bornberg-Bauer, E. 1997a. Chain growth algorithms for HP type lattice proteins. In RECOMB Proceedings. M. Waterman, editor. ACM press, New York. 4755.
Bornberg-Bauer, E. 1997b. How are model protein structures distributed in sequence space? Biophys. J. 73:23932403.
Bornberg-Bauer, E. 2002. Randomness, structural uniqueness, modularity and neutral evolution in sequence space of model proteins. Z. Phys. Chem. 216:139154.
Bornberg-Bauer, E., and H. S. Chan. 1999. Modeling evolutionary landscapes: mutational stability, topology and superfunnels in sequence space. Proc. Natl. Acad. Sci. USA. 96:1068910694.
Bryngelson, J. D., J. N. Onuchic, N. D. Socci, and P. G. Wolynes. 1995. Funnels, pathways and the energy landscape of protein folding: a synthesis. Proteins. 21:167195.[CrossRef][Medline]
Bryngelson, J. D., and P. G. Wolynes. 1987. Spin glasses and the statistical mechanics of protein folding. Proc. Natl. Acad. Sci. USA. 84:75247528.
Buchler, N. E. G., and R. A. Goldstein. 1999. Effect of alphabet size and foldability requirements on protein structure designability. Proteins. 34:113124.[CrossRef][Medline]
Buchler, N. E. G., and R. A. Goldstein. 2000. Surveying determinants of protein structure designability across different models and amino-acid alphabets: a consensus. J. Chem. Phys. 112:25332547.[CrossRef]
Cejtin, H., J. Edler, A. Gottlieb, R. Helling, H. Li, J. Philbin, N. Wingreen, and C. Tang. 2002. Fast tree search for enumeration of a lattice model of protein folding. J. Chem. Phys. 116:352359.[CrossRef]
Chan, H. S. 1999. Folding alphabets. Nat. Struct. Biol. 6:994996.[CrossRef][Medline]
Chan, H. S. 2000. Modeling protein density of states: Additive hydrophobic effects are insufficient for calorimetric two-state cooperativity. Proteins. 40:543571.[CrossRef][Medline]
Chan, H. S., and E. Bornberg-Bauer. 2002. Perspectives on protein evolution from simple exact models. Appl. Bioinformatics. 1:121144.[Medline]
Chan, H. S., and K. A. Dill. 1990. Origins of structure in globular proteins. Proc. Natl. Acad. Sci. USA. 97:63886392.
Chan, H. S., and K. A. Dill. 1991. Sequence space soup of proteins and copolymers. J. Chem. Phys. 95:37753787.[CrossRef]
Chan, H. S., and K. A. Dill. 1993. The protein folding problem. Physics Today. 46(2):2432.
Chan, H. S., and K. A. Dill. 1994. Transition states and folding dynamics of proteins and heteropolymers. J. Chem. Phys. 100:92389257.[CrossRef]
Chan, H. S., and K. A. Dill. 1996. Comparing folding codes for proteins and polymers. Proteins. 24:335344.[CrossRef][Medline]
Chan, H. S., H. Kaya, and S. Shimizu. 2002. Computational methods for protein folding: scaling a hierarchy of complexities. In Current Topics in Computational Molecular Biology. T. Jiang, Y. Xu, and M. Zhang, editors. MIT Press, Cambridge, MA. 403447.
Chan, H. S., S. Shimizu, and H. Kaya. 2004. Cooperativity principles in protein folding. Methods Enzymol. 38