| HOME | HELP | FEEDBACK | SUBSCRIPTIONS | ARCHIVE | SEARCH | TABLE OF CONTENTS |
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||

* Department of Neuroscience, and
Department of Pharmacology, Howard Hughes Medical Institute, Columbia University, New York, New York
Correspondence: Address reprint requests to Lei Zhou, Department of Neuroscience, Columbia University, 1051 Riverside Drive, New York, NY 10032. Tel.: 212-543-5584; Fax: 212-795-7997; E-mail: lz150{at}columbia.edu.
| ABSTRACT |
|---|
|
|
|---|
| INTRODUCTION |
|---|
|
|
|---|
Standard MD simulations sample the conformational space of a protein using the definitions for atomic interactions from various force fields and usually include explicitly treated water to reproduce solvent effects (3
,4
). Correlated protein motions can then be extracted from the MD simulations through diagonalizing the covariance matrix obtained from a section of the MD trajectory. This is also referred to as essential dynamics (5
), principal component analysis (PCA) (6
), or quasiharmonic analysis (7
,8
), due to the complex and anharmonic nature of protein dynamics. However, the size of the system, especially with explicitly treated water molecules, has provided a great computational challenge, generally limiting the timescale of MD simulations for large macromolecules to the nanosecond range, significantly shorter than the biologically relevant timescale of conformational changes that may require milliseconds or longer. Therefore, inefficient sampling is still a significant obstacle to extracting meaningful correlated motions from MD simulations (9
,10
).
Classical all-atom normal mode analysis (AANM) offers the ability to overcome some of the computational cost of MD simulations. AANM makes the simplifying assumption that protein motions can be described by harmonic motions around a local minimum on the protein energy surface. Starting with an initial protein structure, standard AANM requires an extensive minimization of the system's potential energy followed by the calculation of the Hessian matrix, whose 3N x 3N (N = the number of atoms) elements represent the second derivative of the potential energy function along the Cartesian coordinates. Diagonalization of the mass-weighted Hessian matrix can then be used to generate the eigenvectors and eigenvalues of the matrix, which provide, respectively, information about the directions of the various correlated motions within the protein and their amplitudes at a given frequency (11
–14
).
However, the application of AANM to biological macromolecules has been limited by the requirements of physical memory to store the all-atom Hessian matrix and the significant CPU time to diagonalize the very large matrix. Therefore, in practice, AANM is normally applied to protein systems containing at most a few hundred residues, in most cases without explicitly treated water molecules. However, since solvent has important and complex interactions with the solute molecule, the explicit treatment of solvents is thought to be essential to faithfully reproduce protein dynamics. For MD simulations, it has been a standard practice to simulate a protein molecule in a box filled with explicitly treated water molecules and use periodic boundary conditions. However, performing AANM on such a system to extract protein motions is usually beyond the capabilities of currently available computational hardware and software.
To date, there have been only a few published AANM studies involving explicitly treated water molecules within a distance of several Angstroms of the protein surface (15
–17
). These studies showed that incorporating surface water is helpful to reproduce experimental observations, including the B-factors determined by x-ray crystallography. However, given the scarcity of these studies, novel techniques, with the ability to efficiently incorporate solvent effects and provide a complete survey of the vibrational spectrum, are still needed to improve the efficiency of AANM for large systems.
Technically, even though the storage of a Hessian matrix has become less of an obstacle due to the introduction of sparse matrix techniques, the diagonalization of the all-atom matrix is still a challenge and new algorithms are being continuously added to various linear algebra packages. These include the method of diagonalization in a mixed basis (18
,19
) implemented in CHARMM (20
) and the iterative Lanczos/Arnoldi factorization method (21
) implemented in GROMACS (22
), two widely used simulation packages. Nevertheless, these iterative numerical methods are still very time consuming and can only yield a small fraction of the total eigenvectors, usually those corresponding to the lowest vibrational frequencies.
Fortunately, the low-frequency vibrational modes are closely related to large-amplitude correlated protein motions with minimum energy costs, which usually reflect the conformational changes relevant to protein function (1
,23
). Indeed, the collective motions represented by these eigenvectors are in good agreement with independent experimental measurements (24
–26
). However, pinpointing the most functionally relevant individual mode is not a trivial task. In addition, it has been suggested that a combination of modes is required for a reasonable mapping of the correlated motions (1
,13
,27
). Moreover, recent studies showed that the modes of higher frequencies are also important, because energy input from external perturbations can shift the distribution of different modes to higher frequencies (13
,28
). Thus, a complete survey of the eigenvector space and corresponding eigenvalues is important for various theoretical applications, such as the calculation of thermodynamic configuration entropy and heat capacity.
As an alternative to classical AANM, coarse-grained approaches have been pursued to reduce the size of the system and improve computational efficiency (14
,29
–31
). The block normal mode method (BNM) is an effective coarse-grained NMA approach that treats proteins as a system of rigid blocks (32
–34
). However, BNM still relies on a complex all-atom representation and starts from the same all-atom Hessian matrix as AANM. An important breakthrough came with the introduction of the elastic network model (ENM), which simplifies the complex atomic interactions to potential energy functions with only a single parameter (1 kcal/mol/Å2) for C-
atoms, thus bypassing the time-consuming energy minimization steps (35
,36
). ENM (or isotropic Gaussian network model) reflects the intrinsic protein dynamics embedded in the overall molecular topology and effectively reproduces certain aspects of the atomic fluctuations determined by NMR and x-ray crystallography (37
–39
). The corresponding model used in NMA is referred to as the anisotropic network model (ANM) (40
). Despite the dramatic simplifications, ENM is widely applied to large macromolecules and assemblies beyond the reach of traditional methods (41
–45
).
However, there is a trade-off between accuracy and speed in these coarse-grained methods. Much effort has gone into comparing results from these approximate methods with the results of classical AANM, the parent method, or the results of MD simulations, as a reference (32
,33
,42
,46
). Based on the degree of similarity between the low-frequency eigenvectors of AANM and the corresponding eigenvectors of coarse-grained methods, BNM has been found to produce more accurate results than ENM (32
,33
,45
). This is not surprising because BNM starts from an extensively energy-minimized system described by an all-atom force field and then projects the all-atom Hessian matrix to the space of predefined blocks. In the limit, this method allocates only one residue in each block, providing the highest possible resolution in the implementation of the BNM method (however, at the greatest computational cost). Such an approach is implemented in the most recent version of CHARMM. Nonetheless, even BNM results show significant deviations from the AANM approach. Moreover, no coarse-grained method is able to incorporate the contributions from explicitly treated water molecules.
Here we implemented a novel coarse-grained normal mode method (CGNM) based on a partition scheme of the all-atom Hessian matrix to extract the correlated motions in the subspace of C-
atoms. We carried out our initial analysis on the 120-amino-acid cyclic nucleotide-binding domain (CNBD) and adjacent upstream 90-amino-acid cytoplasmic C-linker region from the HCN2 hyperpolarization-activated cyclic nucleotide-regulated cation channel (47
). High resolution x-ray crystallography has shown that in the presence of cyclic nucleotides, this isolated soluble protein domain forms a fourfold symmetric tetrameric assembly with one cyclic nucleotide bound in the CNBD of each of the four subunits (48
).
In this study, we report that CGNM provides a more accurate description of the motions of the HCN2 CNBD, as well as that of four other proteins, compared to two other coarse-grained methods, ENM and BNM, based on the degree of similarity of the results from the three coarse-grained approaches with the results of a full AANM analysis. It is important to note that we found that CGNM also allowed us to incorporate explicitly treated surface water molecules into protein motions projected in the subspace of the relevant atoms (C-
atoms in this study). Furthermore, a comparison of our CGNM results containing a layer of surface water with MD results on the same protein in a water-filled box demonstrates the importance of incorporating such surface structural water in studying protein dynamics.
| METHODS |
|---|
|
|
|---|
To carry out the normal mode calculations based on the all-atom force fields, we first performed an extensive energy minimization to ensure that the starting structure represents a local minimum on the energy surface. To achieve this, we applied SD and CG followed by the limited-memory Broyden-Fletcher-Goldfarb-Shanno method (22
) at double precision numerical accuracy to the representative snapshot structure from the MD simulations obtained above. During these energy minimization steps, the electrostatic energy was described by a switch function with the distance for normal treatment set at 15 Å and the cut-off distance set at 18 Å (53
).
The key step in the analysis was then to calculate the Hessian matrix for the entire system, containing the second derivatives of the potential energy functions (
). The matrix was then partitioned into four sections to extract the C-
components according to the equation
![]() | (1) |
![]() | (2) |
atoms here) are incorporated into a simplified Hessian matrix for the relevant-atom subspace, H'xx, using Eq. 2. The theoretical basis for deriving the C-
atom motions based on the atomic fluctuations from classical AANM was published by Berendsen and colleagues (54
atoms (55
subspace. Moreover, a recent study by Eom et al. (56
To sort the Hessian into relevant and nonrelevant parts, we first converted the sparsely-stored mass-weighted Hessian matrix into a double precision ASCII file. We then generated an index file in which the indices for all C-
atoms (relevant atoms) were arranged at the beginning followed by non-C-
atoms. Each entry in the sparse Hessian matrix was read into the program and allocated to a new position, using the index file as a key for sorting. The C-
component (xx part) was stored in a dense matrix format. The symmetrical xy and yx parts were stored in a coordinate format for a sparse matrix. Non-C-
components (yy) were stored in a row-major format for a sparse matrix. We used a direct solving routine from the PARDISO package (57
) and standard LAPACK and BLAS routines for matrix calculations.
Based on the eigenvectors and the corresponding eigenvalues, the following equation was used to calculate the mean-square fluctuation (MSF) (Å2):
![]() | (3) |
is vibrational frequency.
The following equation was used to calculate the configurational entropy based on the eigenvalues from ENM, CGNM, or PCA (58
,59
):
![]() | (4) |
is the vibrational frequency.
Normal mode analysis based on elastic network model (ENM)
C-
atom coordinates from the energy minimized structures were directly used in the NMA based on the potential energy function defined by the elastic network model (ENM or anisotropic network model (ANM)) (40
). For ENM, we used the default settings of the force constant (1 kcal/mol/Å2) and cutoff distance (13 Å).
Block normal mode analysis
The same all-atom Hessian matrix was projected onto a subspace of rigid blocks, each of which contained a single residue for the protein or a single cAMP molecule for the bound ligand, to pursue the highest resolution possible with this method. The degrees of freedom equal six times the number of blocks. The Fortran code of DIAGRTB (v2.52) was used in this research with a modification of the size of the array, LRWORK, from 32,000,000 to 200,000,000, so that larger systems could be accommodated (32
,33
).
PCA based on MD simulations
We used g_covar from GROMACS to perform PCA on a section of the MD trajectory. Overall rotational and translational motions were removed by fitting the protein structure of each time frame to a reference structure (starting frame). For the MD simulations at low temperatures, we reduced the system temperature with a simulated annealing protocol and then collected the MD trajectories after a 200-ps equilibration at the corresponding temperature. For each PCA, we used a 2-ns-long MD trajectory containing 4000 frames. The eigenvalue outputs from the PCA analysis represent the vibrational amplitude and were converted into the square of angular velocity by the equation
![]() | (5) |
![]() | (6) |
is the harmonic mean-square fluctuation as described by the NMA eigenvectors, and Mij is the dot product between the ith PCA eigenvector and jth NMA eigenvector (60
Alignment of correlated coordinate systems
The reference structures from two eigenvector systems were first aligned to the mass center of the molecule. A rotation matrix was then calculated based on two aligned reference structures. The second set of eigenvectors was rotated by the equation
![]() | (7) |
![]() | (8) |
![]() | (9) |
![]() | (10) |
Unit conversion among different simulation packages
AANM, CGNM, and BNM use the same mass-weighted Hessian matrix; therefore, the corresponding orthogonal eigenvector output should still be mass-weighted. However, the default output of eigenvectors is not mass-weighted in the GROMACS program and not strictly orthogonal. We modified the source code of GROMACS to generate mass-weighted orthogonal eigenvectors for AANM analysis. We converted the GROMACS eigenvalues (
based on the mass-weighted Hessian matrix, in units of kJ/mol/nm2/amu) into the square of angular velocity (in units of s–2) by multiplying the Gromacs eigenvalues by the conversion factor of 10–24, based on the relation
![]() | (11) |
The following equation was used for calculating the MSF (Å2):
![]() | (12) |
where
is the eigenvalue of the Gromacs unit, k is the atom index, and i is the eigenvector index.
Since we compared the results of different methods in the subspace of C-
atoms, mass-weighting will not affect the eigenvector results of ENM. However, a mass factor is needed for the calculation of vibrational frequencies and atomic fluctuation amplitudes. To our knowledge, there is no standard method for converting the units to compare the ENM results directly with other calculations (e.g., AANM, BNM, etc.) without scaling. Here, we tentatively added a mass factor corresponding to the mass of C-
atom so that the angular velocity in units of s–1 and MSF in units of Å2 can be generated using a force constant of 1 kcal/mol/Å2. The eigenvalues were converted into the square of the angular velocity by multiplying by the factor
![]() | (13) |
The eigenvalue output from PCA analysis (default GROMACS in units of nm2; no mass weighting; C-
only) was converted into the square of angular velocity by the equation
![]() | (14) |
![]() | (15) |
| RESULTS |
|---|
|
|
|---|
atoms (see Methods, Eq. 2). To assess the accuracy of our method, we first compared the results of AANM, the standard for these comparisons, with those of CGNM, as well as with results from two other coarse-grained approaches, ENM and BNM (32
As NMA is based on a harmonic approximation of the protein energy surface near an ideally global minimum, it first requires an extensive minimization of the potential energy of the starting protein structure. Here we used a representative structure of the HCN2 C-terminus protein obtained from a 20-ns-long MD trajectory based on the original crystal structure (48
). This procedure allows the protein structure to be efficiently equilibrated in the same force field used by subsequent NMA (GROMOS96) (50
), as reasonably long MD simulations should optimize the loop conformations and allow for small-scale rearrangement of secondary structures (62
,63
).
We first removed all water molecules from the representative MD snapshot structure. After an extensive minimization of the system, the final structure containing only the protein and cAMP atoms was used to generate the all-atom Hessian matrix, which was then iteratively diagonalized to produce the AANM result, providing the reference for comparison with the coarse-grained methods. Due to the limitation of computational resources, only a small fraction of the total eigenvectors and corresponding eigenvalues were calculated (2000, or 8% of 26,232). All technical details are given in Methods and Table 1. Briefly, ENM starts from the C-
atom coordinates and generates a complete set of orthogonal eigenvectors. BNM and CGNM methods started from the same all-atom Hessian matrix used by AANM. Whereas BNM simplifies the calculation through projecting the all-atom Hessian matrix into predefined rigid blocks, CGNM relies on a matrix-partitioning scheme to integrate the energetic contributions from non-C-
atoms into the motions of C-
atoms. Since the eigenvector outputs of BNM are in the all-atom space, they were projected to the C-
atom subspace for comparison purposes. This was followed by a normalization step that makes each eigenvector unitary (
) but not strictly orthogonal (Vi · Vj = 0, i
j). The eigenvector outputs of CGNM are naturally orthogonal in the C-
subspace and thus were directly used in the overlap analysis.
|
10 or
15 AANM eigenvectors for ENM or BNM, respectively (Fig. 2 A).
|
|
Based on the results shown above, it is clear that the CGNM method generates a more accurate set of eigenvectors than does BNM or ENM. Next, we checked the accuracy of different coarse-grained methods through calculating the atomic fluctuations based on the eigenvalues and eigenvectors of the Hessian matrices, still using the results from AANM as a reference. MSF or root mean-square fluctuation (RMSF) was used to provide a direct measure of the atomic vibrational amplitude. Both MSF and RMSF values can be used to compare computational results with experimental measures of motion, such as B-factors (see Methods, Eq.15).
To gain insight into atomic fluctuations we first plotted the eigenvalues from individual coarse-grained methods against the corresponding values from AANM (Fig. 2 C). Over a large range of eigenvalues, there is a nearly linear relationship between the results of AANM and those of CGNM or BNM, suggesting a close relationship. In contrast, such ENM results did not show a close agreement with those of AANM. Next, we used the eigenvalues and the eigenvectors of the vibrational modes to calculate the atomic fluctuations for each atom (Eq. 3) (Fig. 2 D). As predicted from the eigenvalues, the ENM fluctuation results (blue circles) deviate significantly from the other results. Both the CGNM (red circles) and BNM results (green circles) are in good agreement with the results from AANM on a residue-by-residue basis. Based on the correlation coefficient R values, the CGNM results (0.996) show a slightly better agreement with AANM compared to BNM (0.969). In contrast, both CGNM and BNM correlations are significantly better than that of ENM (0.838). To exclude possible errors introduced by the extraction of C-
components and the normalization step associated with AANM and BNM, we performed independent calculations using the original all-atom eigenvectors, which yielded identical results (Supplementary Material, Fig. S1).
NMA results incorporating a layer of explicit water on the protein surface
Next, we expanded CGNM to incorporate the effect of explicitly treated surface water molecules on protein dynamics, an area that to date has not been addressed by other coarse-grained normal mode methods and is computationally expensive for classical AANM. Previous experimental studies showed that the thickness of the surface structural water layer ranged from 3 Å for lysozyme, determined by x-ray and neutron scattering (64
,65
), to 5 Å for lactose, determined by terahertz spectroscopy (66
). Here, we treated the case of a 4-Å-thick layer of explicit water on the protein surface, a compromise that enables the calculations to stay within the limits of currently available computational resources (Fig. 3, A and B). The all-atom Hessian matrix used in the CGNM calculations incorporated the interactions among all protein atoms, cAMP ligands, and explicitly treated water molecules. The corresponding hydration level is 0.56 (water mass/protein mass) and the system is of significant size (8636 protein atoms, 108 cAMP atoms, and 8817 water atoms). As a result, we could solve for <0.5% (250) of a total of over 52,683 AANM eigenvectors and eigenvalues using local computational resources. The BNM method could not be tested under these conditions because it does not currently include a method for allocating surface water molecules to specific blocks.
|
80%) compared to values obtained with the other methods in the absence of water (
40% (Fig. 3 C)). Improvement was also observed in the larger COF values with CGNM (
93%) versus those with the other methods (
75% using the first 100 eigenvectors (Fig. 3 D)).
Using AANM with surface water results as a standard, we next checked the accuracy of the RMSF values for each C-
atom using ENM, BNM, or CGNM (Fig. 4). CGNM with water (red curve) faithfully reproduced the pattern of the corresponding AANM results (black curve, Fig. 4 A). The shift in absolute amplitude is due to the different number of eigenvectors used (2406, or 99.7%, for CGNM; 244 modes, or 0.5%, for AANM). The striking similarity is reflected in the high R-factor of the CGNM data versus the AANM data (0.925 (Fig. 4 D, left)), which is much greater than that for ENM (0.780) and slightly greater than BNM (0.915, limited to calculations without water).
|
atoms with surface water (0.11 Å2) is significantly smaller than that of protein alone (0.16 Å2), providing further support for the ability of CGNM to incorporate surface water in protein dynamics. Are the CGNM results with a 4-Å layer of surface water molecules comparable to results based on MD simulations, in which the protein is fully embedded in a 102 x 102 x 81 Å3 box filled with both surface and bulk water (Fig. 3 A)? MD simulations of the protein at 300 K did a reasonably good job of reproducing the absolute amplitude and overall pattern of RMSF values from x-ray crystallographic B-factors (Fig. 4 B). However, the RMSF values from MD simulations at 300 K are approximately three times larger than the CGNM results (Fig. 4 A). Moreover, the R factor between MD results and CGNM results with surface water is only 0.70 (Fig. 4 E). This deviation is likely caused by the contribution of random, diffusive motions that are included in the MD simulations but are ignored by the harmonic treatment of motions in all NMA approaches.
Since diffusive motions are greater at higher temperatures, we examined the ability of NMA to more accurately correspond to MD simulation results at lower temperatures. We first collected MD trajectories at 120 K and 180 K, respectively. The RMSF based on the MD simulation at 120 K was much smaller than that at 300 K; however, the convergence with CGNM results was not improved (0.69) (Fig. 4 B). Interestingly, the agreement between CGNM and MD simulation results at 180 K was much improved, in terms of both the absolute value of fluctuations and overall pattern, with the R-factor increased to 0.85 with a slope factor of 0.82 (Fig. 4 C). These results indicate that at certain low temperatures (180 K), the atomic fluctuations can be largely accounted for by harmonic motions involving the protein plus surface water but do not necessarily involve the bulk solvent that is also present in the MD simulations.
Comparing MD simulations with CGNM, with and without water, at different temperatures
To further examine the effect of surface water on protein dynamics, the distribution of vibrational-mode frequencies was plotted for both CGNM and MD simulations at the three different temperatures. Previous studies have shown that explicit water has a complex influence on protein dynamics, including temperature-dependent frictional dampening and temperature-independent shifting of the vibrational modes to higher frequencies (68
–70
). However, it is not clear whether these effects are due to an interaction of the protein with surface structural water versus an interaction that also requires the presence of bulk solvent (71
). Here, using CGNM, we find that surface water alone is sufficient to shift fluctuations to higher frequencies (Fig. 5 A), an effect that is observed at all three temperatures, thus confirming its temperature-independent character. It is most likely that this effect represents a static interaction of a cagelike structure formed by the interaction of surface water with exposed residues on the protein surface (65
).
|
10 Å from protein surface plus periodic boundary condition). The distribution of vibrational modes from MD simulations at low temperatures (180 K and 120 K) was quite similar to those from CGNM with water (Fig. 5 B). However, MD simulations, but not CGNM, revealed a significant shift to lower frequencies upon raising the temperature to 300 K. This is in good agreement with previous studies suggesting that the shift to low frequencies is related to the anharmonic nature of protein dynamics (Fig. 5 C) and that the contributions from bulk solvent are more prominent at high temperatures (300 K) and for low-frequency modes (69
As a final test of the various NMA approaches, we compared the orthogonal sets of eigenvectors in the subspace of C-
atoms derived from the three coarse-grained methods (ENM, BNM, and CGNM) versus the eigenvectors based on PCA of MD simulation trajectories (Fig. 5 D). At 300 K, the COF curves show a poor overlap between the eigenvectors at frequencies <25 cm–1 from the MD simulations versus those obtained from all three NMA methods. At increasing frequencies, there is a gradual increase in the overlap of the NMA results with the PCA modes, consistent with the idea that frequencies >40–50 cm–1 correspond to more "harmonic" protein vibrations (60
).
Does the use of CGNM with a layer of water molecules on the protein surface make any significant difference in the overlap of eigenvectors with MD simulations? Careful examination of the COF curve suggests that indeed the results from CGNM with water are slightly but consistently better than the CGNM results without water at 300°K. An even greater improvement in overlap with the MD simulations was observed when using CGNM with water at lower temperatures (180 K (Fig. 5 E) and 120 K (Fig. 5 F)). The increased overlap between CGNM with surface water and the MD simulations at the two lower temperatures (but especially at 120 K) suggests not only that CGNM is able to incorporate, at least partially, the contributions from explicit surface water, but also that surface water makes a significant contribution to protein vibrational modes with frequencies >50 cm–1.
| DISCUSSION |
|---|
|
|
|---|
components from the all-atom Hessian matrix, thus providing a novel coarse-grained NMA approach, which we termed CGNM. This method generated more accurate results than did other coarse-grained NMA methods, including ENM and BNM, based on a comparison with results obtained using classical AANM. However, CGNM retained the benefits of a great reduction in computational cost with the two other coarse-grained approaches. The flexibility in partitioning the all-atom Hessian matrix into relevant versus nonrelevant groups makes it straightforward to scale the scope of analysis, for example, from C-
atoms only to inclusion of all backbone atoms, depending on the size of the system and the available computational resources. In this manner, we were able to model the contributions from explicitly treated surface water to protein motion, which is beyond the reach of other coarse-grained NMA methods. Thus, the CGNM method represents a novel coarse-grained NMA approach that can be used to obtain more accurate results for systems of significant size. Multiple lines of evidence indicate that CGNM produces more accurate results than ENM or BNM, using AANM results as a reference. Overlap plots and spanning coefficients clearly show that CGNM outperformed the other coarse-grained methods for the first 100 or so individual eigenvectors, which are of great functional significance because they represent the directions of protein conformational changes with highest amplitude, slowest frequency, and least energetic cost. COF, which has the advantage of representing the overlap of two groups of eigenvectors, confirmed that CGNM more closely reproduces classical AANM results than do ENM or BNM methods. We also confirmed that CGNM outperforms ENM or BNM on dozens of other proteins, with a range in size from 200 to 1300 amino acids (results of four other sample proteins are shown in Figs. S2 and S3).
A comparison of eigenvalues and related atomic fluctuations among different NMA methods also revealed differences among the three coarse-grained methods. ENM generated a surprisingly good match to the AANM results given the dramatic simplifications in its potential energy functions. However, the results from BNM and CGNM were in much better agreement with the AANM results compared to ENM, indicating that the detailed chemical information embedded in the all-atom Hessian matrix used for AANM, BNM, and CGNM makes an important contribution. These results are in good agreement with a recent comprehensive comparison among NMA approaches of different complexity (41
). Moreover, CGNM performed slightly better than BNM, as shown by the correlation coefficient (R) between their MSF values and the values obtained by AANM.
Why does CGNM yield more accurate results than BNM, even though both methods are derived from the same all-atom Hessian matrix? BNM is rooted in the rotation-translation block model, which projects the all-atom Hessian matrix into a subspace of rigid blocks. Even though BNM fully takes into account the coupled motions between different blocks, the method ignores the small high-frequency vibrations related to the intrinsic flexibility within each block (72
). Moreover, during analysis, the intermediate BNM results in the subspace of rigid blocks must be projected first back onto the space for all atoms and then onto the subspace of C-
atoms. However, the center of mass for each block is different from the position of the C-
atoms and varies among different amino acid residues. In contrast, CGNM is based on partitioning the all-atom Hessian matrix through a simple but theoretically rigorous scheme, which is then used to derive the motions for the C-
atoms (55
). The fact that CGNM implicitly incorporates energetic contributions from non-C-
atoms into C-
atoms may contribute to the greater accuracy of this method.
A key advantage of CGNM is its ability to incorporate the detailed chemical information imbedded in the protein structure, including explicitly treated structural water molecules on the protein surface. For most MD applications, it has been relatively standard to treat solvent molecules explicitly, which is required to reproduce the electrical and dynamic properties of solvents (70
,73
–76
). Indeed, experimental and theoretical studies have found that the surface water molecules within a radial distance of 3–5 Å from the protein surface have very different physical-chemical properties from those in bulk solvent and play important roles in modulating protein motions. For example, the experimental observation that the density of the first hydration shell is
5% higher than that of bulk water has been successfully reproduced by MD simulations (65
,77
,78
). Ideally, classical AANM should be performed on the same protein-water system used in MD simulations. However, the size of the system limits the AANM method so that bulk solvent and surface water must often be omitted for proteins of significant size. Here, we applied CGNM to systems containing a layer of explicitly treated water molecules and found that it not only reproduced results based on classical AANM with explicit surface water, but also helped delineate some features of complex solvent effects.
The choice of a surface water layer of 4 Å in this study represents a balance that places a modest demand on computational resources but is consistent with experimental observations on protein surface water thickness, ranging from 3 Å for lysozyme (64
,65
) to 5 Å for lactose (66
). We found that CGNM, which is based on a harmonic approximation to the energy surface, in the presence of surface water is able to reproduce MD results for atomic fluctuations of a fully solvated protein at 180 K. Interestingly, this temperature (180 K) is near the glass-transition point where diffusion starts to contribute significantly more to protein dynamics than does harmonic vibration (17
,71
,79
–81
). Moreover, spectroscopic experiments on bovine serum albumin showed that there is a significant dynamic change (glass transition) at around 170 K to 180 K, which might be due to formation of a rigid structure formed by water molecules covering the protein surface (79
,82
). These results corroborate this study, in which only the surface water is treated explicitly.
However, it is noticeable that even though CGNM results with water show an improved match with the atomic fluctuations from MD simulations compared to CGNM results on the dehydrated protein, the results from CGNM differ in important respects from those obtained using MD simulations or from experimentally determined crystallographic B-factors. This might be due to the complex nature of the protein energy surface and complex interactions between protein and solvent. The good agreement between the B-factors and the MD results at 300 K confirms the advantages of MD, a method that does not involve a harmonic approximation of the protein energy surface and explicitly treats all water molecules (Fig. 4 B).
The CGNM results are successful in reproducing previous observations that solvation increases protein vibrational frequencies and point to the role of surface water in this phenomenon (24
,69
,83
). Moreover, these effects are likely to reflect temperature-independent interactions in which surface water molecules serve to fill in protein surface irregularities and stabilize polar side chains, forming a cagelike structure around the protein surface (83
). In contrast, a comparison of CGNM with surface water to MD simulations including bulk water indicate that bulk water molecules behave more like free water, acting to decrease the vibrational frequency of protein dynamics in a temperature-dependent manner (70
). A recent experimental study of the influence of hydration on protein dynamics gives direct support to our results. Quasielastic neutron and light-scattering experiments show that adding an initial hydration layer (h
0.2) increases the fast vibrational modes. Interestingly, further increasing the hydration level (h > 0.2) significantly activates slower processes (78
). Therefore, these experimental observations are in good agreement with our simulation results showing the different contributions of solvent molecules to protein dynamics (70
,84
,85
).
The poor overlap in the eigenvectors from various NMA approaches (no water or surface water only) with the MD simulation results (surface water and bulk water), especially for the low-frequency modes (25 cm–1), is not surprising. A previous study using a jump-among-minima model, which divides protein motions into intra-substate motions and inter-substate jumps based on a multiple local minima model of the energy surface, generated a much better overlap with MD results than does NMA (86
). Moreover, a mixture of harmonic NMA plus diffusive Brownian dynamics has been proven to be effective in reproducing the results of MD simulations and experimental observations (55
,87
). These studies suggest that the harmonic approximation of the protein energy surface and the neglect of solvent limits the ability of NMA approaches, including AANM, BNM, and CGNM, to reproduce the directionality of intrinsic anharmonic protein dynamics in the native state (54
,84
). However, for modes beyond 25 cm–1, there is a gradual increase in the fidelity of CGNM and BNM, especially for CGNM with a layer of surface water. It is interesting that this frequency region is the same spectrum covered by terahertz absorption spectroscopy (1 THz = 33 cm–1), where experimental results showed that solvation tends to enhance protein dynamics (88
–90
). Therefore, CGNM provides a convenient tool for modeling the contributions of surface water into protein dynamics at these higher frequencies. In principle, CGNM could be expanded to incorporate the effect of bulk solvent molecules in conjunction with other methods, such as the Langevin Model (71
).
Thus far, the results presented for the HCN2 CNBD are for the cyclic-nucleotide bound state of the protein. However, we obtained similar results for the unliganded protein, using a representative snapshot from a 20-ns-long MD simulation with cAMP removed as the starting structure (Fig. S4–S6). One theoretical application of a complete set of eigenvectors and eigenvalues from NMA is the estimation of the configurational entropy (58
,59
). Taking advantage of the CGNM results for the unliganded protein versus the cAMP-bound protein in the subspace of C-
atoms, we estimated the entropy change of C-
atoms upon cAMP binding to be –127.8 J/mol without water or –174.3 J/mol with surface water (Table 2). Both values should be smaller than the estimate involving all atoms. However, the direction of the changes from the two independent calculations is consistent with previous MD results and the concept that ligand binding for hydrophilic or charged ligands (cAMP carries a negative charge) usually involves a reduction in the configurational entropy of the protein (91
–93
). Further improvements of the computational routine will focus on reducing the memory cost and use of more efficient routines for sparse matrix manipulation. With advances in computational algorithms, more memory-efficient and high-performance (sequential or parallel) routines could further improve this method and thus widen its application to more complex systems.
|
| SUPPLEMENTARY MATERIAL |
|---|
|
|
|---|
| ACKNOWLEDGEMENTS |
|---|
|
|
|---|
| FOOTNOTES |
|---|
Submitted on June 22, 2007; accepted for publication December 28, 2007.
| REFERENCES |
|---|
|
|
|---|
2. Kitao, A., and N. Go. 1999. Investigating protein dynamics in collective coordinate space. Curr. Opin. Struct. Biol. 9:164–169.[CrossRef][Medline]
3. Karplus, M., and J. A. McCammon. 2002. Molecular dynamics simulations of biomolecules. Nat. Struct. Biol. 9:646–652.[CrossRef][Medline]
4. McCammon, J. A., B. R. Gelin, and M. Karplus. 1977. Dynamics of folded proteins. Nature. 267:585–590.[CrossRef][Medline]
5. Amadei, A., A. B. Linssen, and H. J. Berendsen. 1993. Essential dynamics of proteins. Proteins. 17:412–425.[CrossRef][Medline]
6. Garcia, A. E. 1992. Large-amplitude nonlinear motions in proteins. Phys. Rev. Lett. 68:2696–2699.[CrossRef][Medline]
7. Ichiye, T., and M. Karplus. 1991. Collective motions in proteins: a covariance analysis of atomic fluctuations in molecular dynamics and normal mode simulations. Proteins. 11:205–217.[CrossRef][Medline]
8. Janezic, D., R. M. Venable, and B. R. Brooks. 1995. Harmonic analysis of large systems. III. Comparison with molecular dynamics. J. Comput. Chem. 16:1554–1566.[CrossRef]
9. Hess, B. 2000. Similarities between principal components of protein dynamics and random diffusion. Phys. Rev. E Stat. Phys. Plasmas Fluids Relat. Interdiscip. Topics. 62:8438–8448.[Medline]
10. Balsera, M. A., W. Wriggers, Y. Oono, and K. Schulten. 1996. Principal component analysis and long time protein dynamics. J. Phys. Chem. 100:2567–2572.[CrossRef]
11. Brooks, B., and M. Karplus. 1983. Harmonic dynamics of proteins: normal modes and fluctuations in bovine pancreatic trypsin inhibitor. Proc. Natl. Acad. Sci. USA. 80:6571–6575.
12. Brooks, B. R., D. Janezic, and M. Karplus. 1995. Harmonic analysis of large systems. I. Methodology. J. Comput. Chem. 16:1522–1542.[CrossRef]
13. Ma, J. 2005. Usefulness and limitations of normal mode analysis in modeling dynamics of biomolecular complexes. Structure. 13:373–380.[Medline]
14. Go, N., T. Noguti, and T. Nishikawa. 1983. Dynamics of a small globular protein in terms of low-frequency vibrational modes. Proc. Natl. Acad. Sci. USA. 80:3696–3700.
15. Cui, Q., G. Li, J. Ma, and M. Karplus. 2004. A normal mode analysis of structural plasticity in the biomolecular motor F(1)-ATPase. J. Mol. Biol. 340:345–372.[CrossRef][Medline]
16. Ma, J., and M. Karplus. 1997. Ligand-induced conformational changes in ras p21: a normal mode and energy minimization analysis. J. Mol. Biol. 274:114–131.[CrossRef][Medline]
17. Yu, X., J. Park, and D. M. Leitner. 2003. Thermodynamics of protein hydration computed by molecular dynamics and normal modes. J. Phys. Chem. B. 107:12820–12828.
18. Durand, P., G. Trinquier, and Y. H. Sanejouand. 1994. A new approach for determining low-frequency normal modes in macromolecules. Biopolymers. 34:759–771.[CrossRef]
19. Perahia, D., and L. Mouawad. 1995. Computation of low-frequency normal modes in macromolecules: improvements to the method of diagonalization in a mixed basis and application to hemoglobin. Comput. Chem. 19:241–246.[CrossRef][Medline]
20. Brooks, B. R., R. E. Bruccoleri, B. D. Olafson, D. J. States, S. Swaminathan, and M. Karplus. 1983. CHARMM: a program for macromolecular energy, minimization, and dynamics calculations. J. Comput. Chem. 4:187–217.[CrossRef]
21. Anderson, E., Z. Bai, C. Bischof, S. Blackford, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, and D. Sorensen. 1999. LAPACK Users' Guide, 3rd edition. Society for Industrial and Applied Mathematics, Philadelphia.
22. Lindahl, E., B. Hess, and D. van der Spoel. 2001. GROMACS 3.0: a package for molecular simulation and trajectory analysis. J. Mol. Model. 7:306–317 (online computer file).
23. McCammon, J. A., B. R. Gelin, M. Karplus, and P. G. Wolynes. 1976. The hinge-bending mode in lysozyme. Nature. 262:325–326.[CrossRef][Medline]
24. Balog, E., J. C. Smith, and D. Perahia. 2006. Conformational heterogeneity and low-frequency vibrational modes of proteins. Phys. Chem. Chem. Phys. 8:5543–5548.[CrossRef][Medline]
25. Karplus, M., Y. Q. Gao, J. Ma, A. van der Vaart, and W. Yang. 2005. Protein structural transitions and their functional role. Phil. Trans. R. Soc. Lond. 363:331–355; discussion, 355–356.
26. van der Spoel, D., B. L. de Groot, S. Hayward, H. J. Berendsen, and H. J. Vogel. 1996. Bending of the calmodulin central helix: a theoretical study. Protein Sci. 5:2044–2053.[Abstract]