
* Bioinformatics Unit, Centro de Biología Molecular "Severo Ochoa", CSIC-UAM, Cantoblanco, Madrid, Spain; and
Department of Physiology and Biophysics, Mount Sinai School of Medicine, New York, New York
Correspondence: Address reprint requests to Angel R. Ortiz, Tel.: 34-91-497-2376; Fax: 34-91-497-4799; E-mail: aro{at}cbm.uam.es.
| ABSTRACT |
|---|
20 slowest vibrational modes accessible to a particular topology. We conclude that, to a significant extent, the structural response of a protein topology to sequence changes takes place by means of collective deformations along combinations of a small number of low-frequency modes. The findings have implications in structure prediction by homology modeling.
| INTRODUCTION |
|---|
0.4 Å. By contrast, the average root mean-square deviation in the structural core among remote homologues (those below 40% sequence identity) is 2.0 Å (vide infra). These differences are relevant if the modeled structures are to be applied subsequently to problems such as drug design, where current docking force fields are known to be sensitive to small structural shifts in the binding sites (Ferrara et al., 2004).
Here, we will apply principal components analysis (PCA; Johnson and Wichern, 1998) to the analysis of multiple structural alignments of a representative set of protein families. The goal is to determine the main evolutionary directions of structural change among the homologous proteins of a given superfamily. Upon characterizing this evolutionary space, we will compare it to the subspace spanned by the vibrational normal modes imposed by the protein topology (Atilgan et al., 2001). In normal mode analysis (NMA; Ma, 2004), the potential energy surface is assumed to be quadratic in the vicinity of a well-defined energy minimum, considered here to be the observed experimental conformation.
This assumption of harmonicity allows the motions of the protein to be decomposed into a set of independent harmonic vibrational modes, the normal modes, by solving an eigenvalue problem. To consider motions dictated only by the protein topology, regardless of the peculiarities of the protein sequence, we will employ a simplified form of NMA based on elastic network models (Bahar et al., 1997; Hinsen, 1998; Tirion, 1996).
The normal modes computed by means of elastic network models can be regarded as a set of molecular deformational modes imposed by the protein topology and can then be directly compared with the components detected by PCA, which describe the evolutionary directions of deformation. Previous work has already established a connection between normal modes and protein function. Considerable functional insight has been gained by applying NMA to tubulin (Keskin et al., 2002), adenylate kinase (Temiz et al., 2004), DNA-dependent polymerases (Delarue and Sanejouand, 2002), hemoglobin (Xu et al., 2003), and the mechanosensitive channel from Escherichia coli (Valadie et al., 2003), to name only a few. Gerstein and co-workers have generalized these findings by showing that one-half of 3800 known protein motions can be described well by perturbing the considered protein along the direction of at most two low-frequency modes (Krebs et al., 2002). However, it is unclear whether amino acid sequences are selected during evolution so that proteins follow paths of structural adaptation along low-frequency modes. Here we will show that the comparison of PCA and NMA spaces can shed light on the mechanisms underlying the evolution of protein structures and can provide relevant hints to improve protein modeling as well as protein design algorithms.
| METHODS |
|---|
25% on average. The number of families in each superfamily ranges from 1 to 8.
Multiple structural alignments
The structural set corresponding to each one of the 35 families was subjected to multiple structural alignment using MAMMOTH-mult (Lupyan et al., unpublished), a multiple alignment version of the structure alignment program MAMMOTH (Ortiz et al., 2002). From the alignment, the evolutionary core of the protein family is selected. This is defined as the set of gapless positions for which the Cα atoms of all members are within 4 Å of the family average. In this way, a matrix $X_{n \times p}$ is obtained containing the Cartesian coordinates of the Cα core positions in the family, with n being the number of structures and p being three times the number of core positions (each position is defined by its x, y, z Cartesian coordinates).
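The core selection and the construction of the coordinate matrix can be sketched as follows. This is a minimal illustration, not the MAMMOTH-mult implementation: it assumes the structures are already superimposed and that only gapless alignment columns are supplied, and the function name `core_matrix` and the input layout are hypothetical. It also applies the 4 Å criterion in a single pass, which may differ from the procedure actually used.

```python
import numpy as np


def core_matrix(coords, cutoff=4.0):
    """Select core positions and build the n x p coordinate matrix X.

    coords: array of shape (n_structures, n_positions, 3) holding the
    superimposed C-alpha coordinates of the gapless alignment columns
    (hypothetical input layout, assumed for this sketch).
    """
    coords = np.asarray(coords, dtype=float)
    n = coords.shape[0]
    average = coords.mean(axis=0)                    # family-average position per column
    dist = np.linalg.norm(coords - average, axis=2)  # distance of each member to the average
    # A column belongs to the core only if every member lies within the cutoff.
    core_idx = np.flatnonzero((dist <= cutoff).all(axis=0))
    # Row s of X is (x1, y1, z1, x2, y2, z2, ...) for structure s, so p = 3 * n_core.
    X = coords[:, core_idx, :].reshape(n, -1)
    return X, core_idx


# Synthetic family: 5 structures, 30 aligned positions, small structural noise.
rng = np.random.default_rng(0)
template = rng.normal(scale=10.0, size=(30, 3))
family = template + rng.normal(scale=0.5, size=(5, 30, 3))
family[0, 7] += 20.0  # position 7 strays far from the average in one member
X, core_idx = core_matrix(family)
print(X.shape, 7 in core_idx)
```

In the synthetic example, position 7 is displaced in one member and is therefore excluded from the core, whereas the remaining columns are retained.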
Evolutionary deformations: PCA
PCA (Johnson and Wichern, 1998) was used to extract the set of main modes of motion in the alignment that best describes the deformations experienced by the core. Starting from $X_{n \times p}$, the covariance matrix $C_{p \times p}$ is computed, with elements $c_{ij} = \langle (x_i - \langle x_i \rangle)(x_j - \langle x_j \rangle) \rangle$, where averages $\langle \cdot \rangle$ are over the n structures. Then, C is subjected to spectral decomposition as

$$C = V \Lambda V^{T}$$

where V is an orthogonal matrix containing the set of eigenvectors and $\Lambda$ is a diagonal matrix containing the set of eigenvalues. The eigenvector matrix V will then be used in the comparisons with the anisotropic network model (ANM; vide infra).
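As an illustration of this step, the following sketch computes the covariance matrix of a coordinate matrix, diagonalizes it, and counts how many components are needed to reach a given fraction of the variance. The function name `pca_modes` and the random test matrix are assumptions made for the example; the 70% threshold is the one quoted later in the text.

```python
import numpy as np


def pca_modes(X, explained=0.70):
    """Spectral decomposition of the covariance matrix of X (n structures x p coordinates).

    Returns the eigenvalues (descending), the eigenvector matrix V (columns are
    the principal components), and the number of components d needed to reach
    the requested fraction of the total variance.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)              # subtract the average structure
    C = Xc.T @ Xc / X.shape[0]           # c_ij = <(x_i - <x_i>)(x_j - <x_j>)>
    evals, V = np.linalg.eigh(C)         # eigh returns ascending eigenvalues
    order = np.argsort(evals)[::-1]      # reorder so the largest variance comes first
    evals = np.clip(evals[order], 0.0, None)  # remove tiny negative round-off values
    V = V[:, order]
    cumulative = np.cumsum(evals) / evals.sum()
    d = int(np.searchsorted(cumulative, explained) + 1)
    return evals, V, d


# Random stand-in for a family of 6 structures with 4 core positions (p = 12).
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(6, 12))
evals, V, d = pca_modes(X_demo)
print(d, evals[:d])
```

Because only n structures are available, at most n - 1 eigenvalues are nonzero, so the evolutionary space is low-dimensional by construction whenever n is much smaller than p.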
Vibrational modes: the ANM
For the simulation of the vibrational modes we used the ANM (Atilgan et al., 2001), a special type of NMA. It is a coarse-grained model that assumes the protein in the folded state to be equivalent to a three-dimensional elastic network. The junctions of the network, considered here to be the Cα atoms, undergo Gaussian-distributed fluctuations under the potentials of their near neighbors, modeled by linear springs. A generic force constant is adopted for the interaction potential between all pairs of residues that are sufficiently close. The potential energy of the protein (V) as a function of the displacement vector ($\Delta R$) from the native conformation (in Cartesian coordinates) is thus:

$$V = \frac{1}{2} \, \Delta R^{T} H \, \Delta R$$

where H is the Hessian matrix containing the second derivatives of the energy function, which is assumed to be harmonic. H is computed from the atomic coordinates of the Cα atoms in the native structure. Factorization of H as

$$H = U \Omega U^{T}$$

yields 3N-6 intrinsic normal modes (N being the number of residues), contained in the eigenvector matrix U, with frequencies contained in the diagonal matrix $\Omega$. The U matrix will be compared with the PCA directions, contained in matrix V, using the core positions selected from the multiple structural alignment.
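A minimal sketch of an elastic-network Hessian and its diagonalization is given below. The function name `anm_modes`, the 13 Å cutoff, and the unit force constant are assumptions chosen for illustration, since the text does not state the values used; the construction itself follows the standard ANM form, in which each pair of Cα atoms within the cutoff is joined by a spring.

```python
import numpy as np


def anm_modes(ca, cutoff=13.0, gamma=1.0):
    """Normal modes of an elastic network built on C-alpha coordinates.

    ca: array (N, 3). cutoff (in Angstrom) and gamma are illustrative
    assumptions, not values taken from the text.
    Returns the 3N-6 nonzero eigenvalues (ascending) and their eigenvectors.
    """
    ca = np.asarray(ca, dtype=float)
    n = len(ca)
    H = np.zeros((3 * n, 3 * n))
    for i in range(n):
        for j in range(i + 1, n):
            d = ca[j] - ca[i]
            r2 = d @ d
            if r2 > cutoff ** 2:
                continue                               # no spring between distant residues
            block = -gamma * np.outer(d, d) / r2       # off-diagonal 3x3 super-element
            H[3 * i:3 * i + 3, 3 * j:3 * j + 3] = block
            H[3 * j:3 * j + 3, 3 * i:3 * i + 3] = block
            H[3 * i:3 * i + 3, 3 * i:3 * i + 3] -= block  # diagonal blocks balance each row
            H[3 * j:3 * j + 3, 3 * j:3 * j + 3] -= block
    evals, U = np.linalg.eigh(H)
    # The first six modes are rigid-body translations and rotations (zero
    # eigenvalue), assuming the network is connected and rigid.
    return evals[6:], U[:, 6:]


# Synthetic compact "structure": 20 points in a 15 Angstrom box.
rng = np.random.default_rng(2)
ca_demo = rng.uniform(0.0, 15.0, size=(20, 3))
modes_evals, modes_U = anm_modes(ca_demo)
print(modes_U.shape, modes_evals[:3])
```

To compare with the PCA directions, the rows of U corresponding to the core positions would be extracted; whether and how the truncated vectors are renormalized is not specified in the text.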
Relating both spaces: the root mean-square inner product calculation
We compared the vibrational modes obtained by ANM with the structural fluctuations detected by PCA. To simplify the comparisons, the normal mode space is restricted to its 50 lowest frequency modes. Similarly, the evolutionary space is restricted to the number of components required to explain 70% of the variance, five components on average (see below). The overlap between both spaces is calculated from the root mean-square inner product (RMSIP) (Amadei et al., 1999) of the PCA eigenvectors with the vibrational ones:

$$\mathrm{RMSIP} = \left( \frac{1}{D} \sum_{i=1}^{D} \sum_{j=1}^{k} \left( \eta_i \cdot \nu_j \right)^2 \right)^{1/2} \qquad (1)$$

Here, $\eta_i$ and $\nu_j$ are, respectively, the sets of eigenvectors of the evolutionary and ANM spaces, with dimensionality equal to three times the number of core residues defined by MAMMOTH-mult (Table 1). D is the dimensionality of the evolutionary space (five dimensions were used on average), and k is the dimensionality of the ANM space (the slowest 50 modes were employed). The statistical significance of the observed RMSIP value was tested by simulating an empirical distribution of RMSIP data under the null hypothesis of no relationship between both spaces (Fig. 1). For each family, the empirical distribution of RMSIP values was obtained by projecting the evolutionary space onto k-dimensional orthogonal spaces, obtained from random orthogonal Q matrices following the Stewart algorithm (Stewart, 1980). Ten thousand orthogonal matrices were generated to build this distribution, which allows computing the Z-score of the observed RMSIP value from the mean and standard deviation of the random distribution, as follows:

$$Z = \frac{\mathrm{RMSIP}_{\mathrm{obs}} - \langle \mathrm{RMSIP}_{\mathrm{random}} \rangle}{\sigma_{\mathrm{random}}} \qquad (2)$$
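The overlap measure and its significance test can be sketched as follows. Note one substitution: instead of the Stewart algorithm cited in the text, this sketch draws random orthonormal frames by QR factorization of a Gaussian matrix with a sign correction, a common alternative that also samples orientations uniformly. The function names and the small test dimensions are assumptions for the example.

```python
import numpy as np


def rmsip(A, B):
    """Root mean-square inner product between two sets of orthonormal vectors.

    A: (p, D) evolutionary (PCA) eigenvectors as columns.
    B: (p, k) vibrational (ANM) eigenvectors as columns.
    """
    overlaps = A.T @ B                       # all D x k inner products
    return np.sqrt((overlaps ** 2).sum() / A.shape[1])


def random_orthonormal(p, k, rng):
    """k random orthonormal vectors in p dimensions (QR of a Gaussian matrix)."""
    Q, R = np.linalg.qr(rng.normal(size=(p, k)))
    return Q * np.sign(np.diag(R))           # sign fix makes the distribution uniform


def rmsip_zscore(A, B, n_random=10000, seed=0):
    """Z-score of the observed RMSIP against random k-dimensional subspaces."""
    rng = np.random.default_rng(seed)
    p, k = B.shape
    observed = rmsip(A, B)
    null = np.array([rmsip(A, random_orthonormal(p, k, rng))
                     for _ in range(n_random)])
    return observed, (observed - null.mean()) / null.std()


# Toy case: a 3-dimensional space fully contained in a 10-dimensional one.
p, D, k = 30, 3, 10
A = np.eye(p)[:, :D]
B = np.eye(p)[:, :k]
observed, z = rmsip_zscore(A, B, n_random=500)
print(observed, z)
```

An RMSIP of 1 means the evolutionary space lies entirely within the vibrational one, whereas for random subspaces the expected squared RMSIP is k/p, which is why the null distribution must be simulated for each family.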
![]() | (3) |
In the case of NMA, the mean-square fluctuation for each residue in the vibrational space can be obtained from a sum over the inner products of the residue entries of the 3N-6 vectors of the eigenvector matrix, scaled by the corresponding eigenvalue, as follows (Atilgan et al., 2001):

$$\langle \Delta R_i^2 \rangle = c \sum_{k=1}^{3N-6} \frac{[u_k]_i \cdot [u_k]_i}{\omega_k} \qquad (4)$$

where $[u_k]_i$ is the three-component entry of the kth eigenvector for residue i, $\omega_k$ is the corresponding eigenvalue, and c is the prefactor. We assigned a value of 1.8 to the prefactor. The fluctuations obtained by both methods were then compared. First, we computed, for each family, the Spearman correlation coefficient (Rs; Langley, 1970) between the lists of fluctuations per residue calculated with both approaches. The sampling distribution of Rs under the null hypothesis of no correlation can be closely approximated by a normal distribution having E(Rs) = 0 and

$$\mathrm{var}(R_s) = \frac{1}{n - 1}$$

where n is the number of residues. Hence, we computed the Z-score of Rs as

$$Z = R_s \sqrt{n - 1}$$
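The fluctuation comparison can be illustrated with the short sketch below. It assumes the standard large-sample result that, under the null hypothesis, Rs has variance 1/(n - 1), so that the Z-score is Rs multiplied by the square root of n - 1. The function names are hypothetical, and the evolutionary fluctuations are taken as a given input list.

```python
import numpy as np


def anm_fluctuations(evals, U, prefactor=1.8):
    """Mean-square fluctuation per residue from normal modes.

    evals: (m,) nonzero eigenvalues; U: (3N, m) eigenvectors as columns.
    prefactor: the value 1.8 quoted in the text.
    """
    n = U.shape[0] // 3
    # Squared norm of each residue's (x, y, z) entry in every mode: shape (N, m).
    per_mode = (U.reshape(n, 3, -1) ** 2).sum(axis=1)
    return prefactor * (per_mode / evals).sum(axis=1)


def ranks(values):
    """Ranks starting at 1, with ties receiving their average rank."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    r = np.empty(len(values))
    r[order] = np.arange(1, len(values) + 1)
    for v in np.unique(values):
        tie = values == v
        r[tie] = r[tie].mean()
    return r


def spearman_z(a, b):
    """Spearman coefficient Rs and its Z-score, Z = Rs * sqrt(n - 1)."""
    ra, rb = ranks(a), ranks(b)
    rs = np.corrcoef(ra, rb)[0, 1]     # Pearson correlation of the ranks
    return rs, rs * np.sqrt(len(ra) - 1)


# Two monotonically related lists give Rs = 1 and Z = sqrt(5 - 1) = 2.
rs, z = spearman_z([1, 2, 3, 4, 5], [1, 4, 9, 16, 25])
print(rs, z)
```

Because the Spearman coefficient depends only on ranks, the numerical value of the prefactor does not affect the correlation between the two fluctuation profiles.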
| RESULTS |
|---|
A summary of the PCA results can be found in Fig. 2 and Table 1. The structural deformations span a space of low dimensionality; 70% of the total variance in the core fluctuations can be explained with an average of 4.5 ± 1.2 components. Thus, the behavior of all superfamilies in PCA is rather similar, independent of the structural class, size, or number of structures. Although structural sampling is key to the definition of the PCA subspace, and we cannot be confident that a complete coverage of the structural space available to a given superfamily is achieved, the similarity of the results in all cases suggests that our conclusion is robust.
20 modes and then tends to plateau. Small and α/β-proteins show significantly smaller overlaps, whereas α- and α + β-proteins show the largest ones. Small proteins have a larger number of disulfide bridges, which are not considered in the ANM, and this could explain the lower overlap observed. In summary, there is a statistically significant overlap between the deformations observed in the core of homologous proteins and the 20 lowest-frequency modes imposed by the protein topology. Thus, the protein core in evolutionarily related proteins responds structurally to sequence changes by deformations along combinations of normal modes imposed by the protein topology.
| DISCUSSION |
|---|
Not surprisingly, we find that the regions experiencing the highest evolutionary fluctuations in the protein core tend to correspond to topologically unconstrained regions. More interesting is the finding that the adaptive movements responsible for these fluctuations are highly cooperative, taking place in a space of low dimensionality, of only 4–5 dimensions, and similar in all superfamilies. Because side chain degrees of freedom in the protein core are basically dictated by the backbone conformation (Levitt et al., 1997), this finding suggests that, as far as the core region is concerned, the conformational space to sample in model refinement is fairly small. The use of PCA directions thus appears to be a promising technique to model the structural plasticity among homologous proteins, affording a very efficient sampling of the conformational space accessible to the protein core, and preliminary results indicate that PCA sampling is indeed very efficient (Qian et al., 2004). The physical origin of this low dimensionality in the evolutionary space seems to rest on the fact that the motions allowing enough deformability in the structure to accommodate different homologous sequences are those with a sufficiently shallow energy increase when a distortion is imposed. We found these to be on the order of the 20 lowest-frequency modes. That is, the fact that the evolutionary subspace overlaps significantly with the subspace spanned by the 20 lowest-frequency modes imposed by the protein topology suggests that the evolutionary pathways of structural adaptation make use, to some extent, of combinations of a small number of low-frequency modes imposed by the topology. A corollary is that the protein topology could be an important factor determining the evolutionary history of proteins at the structural level. It remains to be seen whether the ANM normal modes, or similar approximations, are accurate enough to be used as surrogates of the PCA eigenvectors in protein modeling problems in those cases where the structural sampling of the family does not allow the derivation of reliable PCA directions. Nevertheless, our results lend support to recent proposals about the use of normal modes for solving difficult molecular replacement problems (Suhre and Sanejouand, 2004).
| SUPPLEMENTARY MATERIAL |
|---|
| ACKNOWLEDGEMENTS |
|---|
Submitted on September 14, 2004; accepted for publication November 2, 2004.
| REFERENCES |
|---|
Atilgan, A. R., S. R. Durell, R. L. Jernigan, M. C. Demirel, O. Keskin, and I. Bahar. 2001. Anisotropy of fluctuation dynamics of proteins with an elastic network model. Biophys. J. 80:505–515.
Bahar, I., A. R. Atilgan, and B. Erman. 1997. Direct evaluation of thermal fluctuations in proteins using a single-parameter harmonic potential. Fold. Des. 2:173–181.
Baker, D., and A. Sali. 2001. Protein structure prediction and structural genomics. Science. 294:93–96.
Berendsen, H. J., and S. Hayward. 2000. Collective protein dynamics in relation to function. Curr. Opin. Struct. Biol. 10:165–169.
Brenner, S. E., P. Koehl, and M. Levitt. 2000. The ASTRAL compendium for protein structure and sequence analysis. Nucleic Acids Res. 28:254–256.
de Groot, B. L., S. Hayward, D. M. van Aalten, A. Amadei, and H. J. Berendsen. 1998. Domain motions in bacteriophage T4 lysozyme: a comparison between molecular dynamics and crystallographic data. Proteins. 31:116–127.
Delarue, M., and Y. H. Sanejouand. 2002. Simplified normal mode analysis of conformational transitions in DNA-dependent polymerases: the elastic network model. J. Mol. Biol. 320:1011–1024.
Ferrara, P., H. Gohlke, D. J. Price, G. Klebe, and C. L. Brooks 3rd. 2004. Assessing scoring functions for protein-ligand interactions. J. Med. Chem. 47:3032–3047.
Fiser, A., M. Feig, C. L. Brooks 3rd, and A. Sali. 2002. Evolution and physics in comparative protein structure modeling. Acc. Chem. Res. 35:413–421.
Hinsen, K. 1998. Analysis of domain motions by approximate normal mode calculations. Proteins. 33:417–429.
Johnson, R., and D. Wichern. 1998. Applied Multivariate Statistical Analysis. Prentice Hall, Upper Saddle River, NJ.
Karplus, M., and J. A. McCammon. 2002. Molecular dynamics simulations of biomolecules. Nat. Struct. Biol. 9:646–652.
Kelley, L. A., R. M. MacCallum, and M. J. Sternberg. 2000. Enhanced genome annotation using structural profiles in the program 3D-PSSM. J. Mol. Biol. 299:499–520.
Keskin, O., S. R. Durell, I. Bahar, R. L. Jernigan, and D. G. Covell. 2002. Relating molecular flexibility to function: a case study of tubulin. Biophys. J. 83:663–680.
Keskin, O., R. L. Jernigan, and I. Bahar. 2000. Proteins with similar architecture exhibit similar large-scale dynamic behavior. Biophys. J. 78:2093–2106.
Kitao, A., and N. Go. 1999. Investigating protein dynamics in collective coordinate space. Curr. Opin. Struct. Biol. 9:164–169.
Koh, I. Y., V. A. Eyrich, M. A. Marti-Renom, D. Przybylski, M. S. Madhusudhan, N. Eswar, O. Grana, F. Pazos, A. Valencia, A. Sali, and B. Rost. 2003. EVA: evaluation of protein structure prediction servers. Nucleic Acids Res. 31:3311–3315.
Krebs, W. G., V. Alexandrov, C. A. Wilson, N. Echols, H. Yu, and M. Gerstein. 2002. Normal mode analysis of macromolecular motions in a database framework: developing mode concentration as a useful classifying statistic. Proteins. 48:682–695.
Langley, R. 1970. Practical Statistics Simply Explained. Dover, New York.
Levitt, M., M. Gerstein, E. Huang, S. Subbiah, and J. Tsai. 1997. Protein folding: the endgame. Annu. Rev. Biochem. 66:549–579.
Ma, J. 2004. New advances in normal mode analysis of supermolecular complexes and applications to structural refinement. Curr. Protein Pept. Sci. 5:119–123.
Marti-Renom, M. A., M. S. Madhusudhan, and A. Sali. 2004. Alignment of protein sequences by their profiles. Protein Sci. 13:1071–1087.
Marti-Renom, M. A., A. C. Stuart, A. Fiser, R. Sanchez, F. Melo, and A. Sali. 2000. Comparative protein structure modeling of genes and genomes. Annu. Rev. Biophys. Biomol. Struct. 29:291–325.
Murzin, A. G., S. E. Brenner, T. Hubbard, and C. Chothia. 1995. SCOP: a structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol. 247:536–540.
Ortiz, A. R., C. E. Strauss, and O. Olmea. 2002. MAMMOTH (matching molecular models obtained from theory): an automated method for model comparison. Protein Sci. 11:2606–2621.
O'Toole, N., M. Grabowski, Z. Otwinowski, W. Minor, and M. Cygler. 2004. The structural genomics experimental pipeline: insights from global target lists. Proteins. 56:201–210.
Qian, B., A. R. Ortiz, and D. Baker. 2004. Improvement of comparative model accuracy by free-energy optimization along principal components of natural structural variation. Proc. Natl. Acad. Sci. USA. 101:15346–15351.
Sanchez, R., and A. Sali. 1997. Advances in comparative protein-structure modelling. Curr. Opin. Struct. Biol. 7:206–214.
Shi, J., T. L. Blundell, and K. Mizuguchi. 2001. FUGUE: sequence-structure homology recognition using environment-specific substitution tables and structure-dependent gap penalties. J. Mol. Biol. 310:243–257.
Steinmetz, A. C., J. P. Renaud, and D. Moras. 2001. Binding of ligands and activation of transcription by nuclear receptors. Annu. Rev. Biophys. Biomol. Struct. 30:329–359.
Stewart, G. W. 1980. The efficient generation of random orthogonal matrices with an application to condition estimation. SIAM J. Numer. Anal. 17:403–409.
Suhre, K., and Y. H. Sanejouand. 2004. On the potential of normal-mode analysis for solving difficult molecular-replacement problems. Acta Crystallogr. D Biol. Crystallogr. 60:796–799.
Temiz, N. A., E. Meirovitch, and I. Bahar. 2004. Escherichia coli adenylate kinase dynamics: comparison of elastic network model modes with mode-coupling (15)N-NMR relaxation data. Proteins. 57:468–480.
Tirion, M. M. 1996. Large amplitude elastic motions in proteins from a single-parameter, atomic analysis. Phys. Rev. Lett. 77:1905–1908.
Tramontano, A., and V. Morea. 2003. Assessment of homology-based predictions in CASP5. Proteins. 53(Suppl. 6):352–368.
Valadie, H., J. J. Lacapère, Y. H. Sanejouand, and C. Etchebest. 2003. Dynamical properties of the MscL of Escherichia coli: a normal mode analysis. J. Mol. Biol. 332:657–674.
van Aalten, D. M., D. A. Conn, B. L. de Groot, H. J. Berendsen, J. B. Findlay, and A. Amadei. 1997. Protein dynamics derived from clusters of crystal structures. Biophys. J. 73:2891–2896.
Xu, C., D. Tobi, and I. Bahar. 2003. Allosteric changes in protein structure computed by a simple mechanical model: hemoglobin T↔R2 transition. J. Mol. Biol. 333:153–168.