help button home button Biophys. J.
HOME HELP FEEDBACK SUBSCRIPTIONS ARCHIVE SEARCH TABLE OF CONTENTS

This Article
Right arrow Abstract Freely available
Right arrow Full Text (PDF)
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Similar articles in this journal
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Download to citation manager
Right arrow reprints & permissions
Citing Articles
Right arrow Citing Articles via HighWire
Right arrow Citing Articles via Google Scholar
Google Scholar
Right arrow Articles by Kim, M. K.
Right arrow Articles by Chirikjian, G. S.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Kim, M. K.
Right arrow Articles by Chirikjian, G. S.

Biophys J, September 2002, p. 1620-1630, Vol. 83, No. 3

Efficient Generation of Feasible Pathways for Protein Conformational Transitions

Moon K. Kim,* Robert L. Jernigan,dagger and Gregory S. Chirikjian*

 *Department of Mechanical Engineering, The Johns Hopkins University, Baltimore, Maryland 21218; and  dagger Molecular Structure Section, Laboratory of Experimental and Computational Biology, CCR, NCI, NIH, Bethesda, Maryland 20892-5677


    ABSTRACT
TOP
ABSTRACT
INTRODUCTION
METHODS
SIMULATION RESULTS
CONCLUSIONS
REFERENCES

We develop a computationally efficient method to simulate the transition of a protein between two conformations. Our method is based on a coarse-grained elastic network model in which distances between spatially proximal amino acids are interpolated between the values specified by the two end conformations. The computational speed of this method depends strongly on the choice of cutoff distance used to define interactions as measured by the density of entries of the constant linking/contact matrix. To circumvent this problem we introduce the concept of using a cutoff based on a maximum number of nearest neighbors. This generates linking matrices that are both sparse and uniform, hence allowing for efficient computations that are independent of the arbitrariness of cutoff distance choices. Simulation results demonstrate that the method developed here reliably generates feasible intermediate conformations, because our method observes steric constraints and produces monotonic changes in virtual bond and torsion angles. Applications are readily made to large proteins, and we demonstrate our method on lactate dehydrogenase, citrate synthase, and lactoferrin. We also illustrate how this framework can be used to complement experimental techniques that partially observe protein motions.


    INTRODUCTION
TOP
ABSTRACT
INTRODUCTION
METHODS
SIMULATION RESULTS
CONCLUSIONS
REFERENCES

Proteins are well known to be intrinsically flexible structures. Many proteins have been determined to have multiple conformations (in some cases called "open" and "closed" forms; Berman et al., 2000). Conformational transitions between two forms are often important for understanding the relationship between structure and function. In other words, such motions are involved in many basic functions such as catalysis, regulation, transportation, and aggregation (Subbiah, 1996). Hence, comprehending conformational transitions can be useful for understanding biological mechanisms, especially for protein machines.

However, it is also an important topic in molecular graphics to visualize conformational transitions. Obviously, one of the best ways is through animations, such as digital movie files (e.g., AVI or MPEG). Animations usually are produced by inserting images of intermediate conformations between the two conformations. These hypothetical intermediate conformations are visualized in sequence for an animation.

There have been several previous efforts in this area. Vonrhein et al. (1995) produced movies of conformational transitions by linear interpolation between the atomic coordinates of the two end conformations in Cartesian space. One problem with that method is that the bond lengths and angles of the intermediate conformations can be unrealistic, and in several cases protein chains actually pass through one another. To overcome this problem, Gerstein and Krebs (1998) applied proper restraints and minimized the energy of each intermediate conformation to correct for molecular stereochemistry and enforce rules of molecular structure.

An alternative interpolation approach is to use internal coordinates such as bond lengths, bond angles, and torsion angles instead. Kleywegt and Jones (1996) implemented this approach to construct intermediate conformations with their LSQMAN program. Ideally, this approach produces realistic bond lengths and torsional angles, but this method also has some problems. If one constructs intermediate conformations by interpolating torsional angles between two end conformations while holding bond lengths and angles fixed, one will often obtain infeasible pathways for several reasons. First, it may not even be possible for the conformation from one end to reach the other end in Cartesian space because the two conformations do not have identical values of internal variables such as bond lengths and bond angles. Therefore, one must either refine the two end conformations until they have a consistent set of internal variables, except for the torsion angles, before interpolating over torsion angles, or instead interpolate all of the internal variables simultaneously to avoid this problem. A second limitation is that in the process of generating intermediate conformations some residues can come too close to each other in order to not break the smoothness of the simulated pathway and this can produce unfavorable states in the sense of high-energy interactions and steric clashes. Fig. 1 shows that a particular pair of alpha -carbons can come too close to each other during conformational transitions using internal coordinate interpolation, which would give rise to exceedingly high repulsive energy peaks because of van der Waals forces between nonbonded atoms. A third problem occurs for specific values of internal rotation angles. Most are not equi-energetic for all values. Consequently, some forms have higher energies, and the intermediate forms generated could have inordinately high values, even if other lower energy pathways do exist.



View larger version (15K):
[in this window]
[in a new window]
 
FIGURE 1   An example of torsional angle interpolation between two lac repressor headpiece structures (named 1LCC and 1LCD, Protein Data Bank). (a) During the conformational transition from 1LCC to 1LCD, the alpha -carbons of Val-15 and Thr-34 unrealistically come too close to each other, <= 1 Å. (b) This unrealistic relative distance between two atoms (solid-dot line) is compared to the result of the "distance interpolation method" developed in this paper (solid line). The conformation from one end does not reach the other end in Cartesian space because the two sets of crystallographic coordinates do not yield identical values of bond lengths and bond angles. One must either refine the two end conformations until they have a consistent set of internal variables, except for the torsion angles, before interpolating over torsion angles, or interpolate the full set of internal variables simultaneously. However, our method appears to conform well to steric constraints when reaching the other end without refinement of internal variables.

There has been substantial work on the development of algorithms to generate plausible "reaction pathways." Elber and Karplus (1987) proposed to make a trial path connecting reactant and product structures. This path minimizes the functional
T=<FR><NU>1</NU><DE>L</DE></FR> <LIM><OP>∫</OP><LL><B><UP>r</UP></B><SUB><UP>R</UP></SUB></LL><UL><UP><B>r</B><SUB>P</SUB></UP></UL></LIM> U(<B><UP>r</UP></B>)dl(<B><UP>r</UP></B>), (1)
where l(r) is the reaction path connecting reactant (rR) and product (rP) structures, U(r) is the potential energy, and L = int  dl(r) is the path length (Czerminski and Elber, 1990). This algorithm demonstrates a good reaction path for the alanine dipeptide and the tetrapeptide IAN. However, the path that minimizes the functional above may not be the path with the maximum rate of transition between reactants and products. Jónsson et al. (1998) developed a "nudged elastic band method," which is a modified path method to find minimum energy paths by constructing a set of intermediate conformations between reactant and product structures. A spring interaction between adjacent conformations is added to ensure the continuity of the path. Minimization of the force acting on the conformations yields the minimum energy path. Another path method is the MaxFlux method proposed by Huo et al. (1997). It computes the reaction pathway of maximum diffusive flux by minimizing a discretized form of the line integral in Eq. 1 with added restraints such as constant mean-square distance between adjacent intermediates, repulsive interactions between adjacent intermediates along the path, and linear and angular momentum conservation for the system. This method was used to find the optimal pathway of the coil-to-helix transition in a short polyalanine chain, but these path methods have not been applied to a large protein because it is a computationally demanding task to find the global minimum value of the objective function in a high-dimensional space out of all possible reaction paths.

Some methods have considered probabilistic models of pathways. Olender and Elber (1996) integrated classical Newtonian dynamical equations of motion to compute long-time molecular dynamics trajectories based on the stochastic path integral. The activated dynamical transition path method developed by Dellago et al. (1997) generates and samples an ensemble of transition paths, which evolve according to stochastic dynamics (either Metropolis Monte Carlo or Brownian dynamics) and conserve the Boltzmann distribution. Again, these stochastic methods have been tested for simple cases such as the alanine dipeptide and two-dimensional Lennard-Jones clusters, but they also are computationally too expensive to be applied to a large protein.

Molecular dynamics (MD) simulations, a powerful method for the study of details of molecular motion, and normal mode analysis (NMA) using all-atom empirical potentials, are often used to follow the dynamics of proteins (Brooks and Karplus, 1985; Xu et al., 1997; Xu and Sigler, 1998). However, the use of atomic approaches becomes computationally inefficient with the increased size of a system.

To reduce such a computational burden, many recent papers have demonstrated the utility of coarse-grained protein models by including, for example, only alpha -carbons as point masses representing residues and using a simplified potential for considering internal interactions between neighboring residues. Such models are suitable to describe the global motions of complex systems of small proteins or single proteins having more than several thousand residues (Atilgan et al., 2001; Bahar and Jernigan, 1998; Bahar et al., 1999; Jaaskelainen et al., 1998; Tama and Sanejouand, 2001; Tirion and Ben-Avraham, 1993, 1998).

In this paper we develop a new interpolation method for generating feasible pathways for conformational transitions using the simplest potential and coarse-grained protein models. The key idea is to interpolate uniformly the distances between spatially proximal residues in both conformations within the context of the elastic network model. The present approach can be referred to as a "distance interpolation method," which is completely different from position interpolation in Cartesian space. Because we interpolate relative distance between spatially close residues, unrealistic conformations and steric clashes become far less likely. This method offers a reasonable compromise between oversimplified linear interpolation in Cartesian or internal coordinates and computationally expensive methods such as MD simulations. We discuss this efficient modeling technique for large proteins. Our method computes sets of interpolated conformations within reasonable times in cases where full-atom computations such as MD or even NMA may be infeasible. Unlike dynamics-based methods in which the size of the timestep used is limited by the stiffest part of the structure, our method is purely geometric and so the number of required animation frames is dictated only by the difference in shape between the two conformations. An added advantage to this kind of model is that having virtual bonds attaching each alpha -carbon to all those that are within a sequential distance of three units induces stiffnesses in the virtual bond angles and torsion angles while retaining the ease of using Cartesian coordinates.


    METHODS
TOP
ABSTRACT
INTRODUCTION
METHODS
SIMULATION RESULTS
CONCLUSIONS
REFERENCES

Incremental construction of intermediate conformations

We derive here an incremental formulation to generate intermediate conformations. The key idea is to interpolate between two values for the distances between spatially proximal alpha -carbons, which are artificially connected with springs in the elastic network model (Atilgan et al., 2001; Bahar et al., 1997). Although the relationship between molecular conformations and the distances between atoms in conformations has been studied extensively (Crippen and Havel, 1988), our goal is to generate intermediate conformations by finding small changes in alpha -carbon positions that result from inducing correspondingly small changes in inter-residue distances.

Suppose that we have sets of alpha -carbon coordinates for the two end conformations of the same protein denoted by {xi} and {chi i}, respectively. One can build two elastic network models, one for each of these conformations. We introduce a cost function as follows
C(<B><UP>&dgr;</UP></B>)=<FR><NU>1</NU><DE>2</DE></FR> <LIM><OP>∑</OP><LL><UP>i=1</UP></LL><UL><UP>n−1</UP></UL></LIM> <LIM><OP>∑</OP><LL><UP>j=i+1</UP></LL><UL><UP>n</UP></UL></LIM> k<SUB><UP>i,j</UP></SUB>{∥<B><UP>x</UP></B><SUB><UP>i</UP></SUB>+<B><UP>&dgr;</UP></B><SUB><UP>i</UP></SUB>−<B><UP>x</UP></B><SUB><UP>j</UP></SUB>−<B><UP>&dgr;</UP></B><SUB><UP>j</UP></SUB>∥−l<SUB><UP>i,j</UP></SUB>}<SUP>2</SUP>. (2)
Here delta  = [delta <UP><SUB>1</SUB><SUP>T</SUP></UP>, ... , delta <UP><SUB>n</SUB><SUP>T</SUP></UP>]T is a 3n-dimensional vector of displacements, with n being the number of residues. An intermediate conformation is defined by the value of delta  that minimizes this cost when all other parameters are held constant. The linking ("contact") matrix k is an n × n matrix containing the information about which amino acid residue is either connected to, or in contact with, any other. It is formed as the "union" of the two linking matrices for {xi} and {chi i}, so that ki,j can have value 1 whenever residues i and j are judged to be in contact in either conformation and 0 otherwise. The value li,j is the desired distance between i and j, which can be chosen as
l<SUB><UP>i,j</UP></SUB>=(1−&agr;)∥<B><UP>x</UP></B><SUB><UP>i</UP></SUB>−<B><UP>x</UP></B><SUB><UP>j</UP></SUB><UP>∥+&agr;∥<B>&khgr;</B><SUB>i</SUB></UP>−<B><UP>&khgr;</UP></B><SUB><UP>j</UP></SUB>∥, (3)
where alpha  is the coefficient specifying how far a given state is along the transition from {xi} to {chi i}. For example, when alpha  = 0.5, the desired conformation is the one with inter-residue distances at the average values of those for conformations {xi} and {chi i}. Using the "union" linking matrices confines the intermediate conformations to the interval between the two end conformations.

The cost function in Eq. 2 can be related to the classical pairwise potential functions of biophysics in the following way. Suppose that a coarse-grained (C-alpha) potential function between any two residues i and j is Vi,j(∥xi - xj∥). If from the crystal data we know that there are two conformations, we seek to establish a series of "artificial" equilibrium states by perturbing this potential at artificial time t. This results in an artificial potential of the form
V(<B><UP>&dgr;</UP></B>, t)=<LIM><OP>∑</OP><LL><UP>i,j</UP></LL></LIM> V<SUB><UP>i,j</UP></SUB>(∥<B><UP>x</UP></B><SUB><UP>i</UP></SUB>+<B><UP>&dgr;</UP></B><SUB><UP>i</UP></SUB>−<B><UP>x</UP></B><SUB><UP>j</UP></SUB>−<B><UP>&dgr;</UP></B><SUB><UP>j</UP></SUB>∥−l<SUB><UP>i,j</UP></SUB>(t)). (4)
The goal at each value of time is then to let the system relax so that the values of the small displacements cause the new conformations to settle at new artificial equilibria. Because the i + 1st state is always calculated relative to the artificial equilibrium established for the ith state, a Taylor series expansion of each Vi,j(r) up to quadratic order will result in an expression of the form in Eq. 2. Here r is the inter-residue distance. The linear term in the Taylor series expansion
V<SUB><UP>i,j</UP></SUB>(r<SUB>0</SUB>+&egr;)≈V<SUB><UP>i,j</UP></SUB>(r<SUB>0</SUB>)+V<UP>′<SUB>i,j</SUB></UP>(r<SUB>0</SUB>)&egr;+½V<UP>″<SUB>i,j</SUB></UP>(r<SUB>0</SUB>)&egr;<SUP>2</SUP> (5)
(where r0 is the previous equilibrium value of inter-residue distance and epsilon  is the small change) vanishes when summed over all values of i and j because of the definition of an equilibrium state. Even if the potential function is singular, such as in a six-twelve potential, the Taylor series expansion in a small neighborhood of an equilibrium is valid because the function will always be analytic locally. Hence, starting from an arbitrary potential function, one will always arrive at Eq. 2 when small incremental deviations from equilibrium are made. The only influence the potential will have is on the values of ki,j, which are held constant over time and over all values of i and j for simplicity in the elastic network model, but could easily be allowed to vary within the framework given below to reflect any potential function.

Our goal is to find values of delta  that minimize Eq. 2, which itself can be linearized for small values of ∥delta i∥ and ∥delta j∥ with a Taylor series approximation. If we consider an individual term in Eq. 2
C<SUB><UP>i,j</UP></SUB>=½k<SUB><UP>i,j</UP></SUB>{∥<B><UP>x</UP></B><SUB><UP>i</UP></SUB>+<B><UP>&dgr;</UP></B><SUB><UP>i</UP></SUB>−<B><UP>x</UP></B><SUB><UP>j</UP></SUB>−<B><UP>&dgr;</UP></B><SUB><UP>j</UP></SUB>∥−l<SUB><UP>i,j</UP></SUB>}<SUP>2</SUP>, (6)
then this can be written as
C<SUB><UP>i,j</UP></SUB>≈½ k<SUB><UP>i,j</UP></SUB>(C<SUP>(<UP>1</UP>)</SUP><SUB><UP>i,j</UP></SUB>+C<SUP>(<UP>2</UP>)</SUP><SUB><UP>i,j</UP></SUB>+C<SUP>(<UP>3</UP>)</SUP><SUB><UP>i,j</UP></SUB>), (7)
where
C<SUP>(<UP>1</UP>)</SUP><SUB><UP>i,j</UP></SUB>=(<B><UP>&dgr;</UP></B><SUB><UP>i</UP></SUB>−<B><UP>&dgr;</UP></B><SUB><UP>j</UP></SUB>)<SUP><UP>T</UP></SUP><FENCE>𝔼<SUB>3</SUB>−l<SUB><UP>i,j</UP></SUB> <FR><NU>A(<B><UP>x</UP></B><SUB><UP>i</UP></SUB>−<B><UP>x</UP></B><SUB><UP>j</UP></SUB>)</NU><DE>∥<B><UP>x</UP></B><SUB><UP>i</UP></SUB>−<B><UP>x</UP></B><SUB><UP>j</UP></SUB>∥</DE></FR></FENCE>(<B><UP>&dgr;</UP></B><SUB><UP>i</UP></SUB>−<B><UP>&dgr;</UP></B><SUB><UP>j</UP></SUB>), (8)

C<SUP>(<UP>2</UP>)</SUP><SUB><UP>i,j</UP></SUB>=2<FENCE>1−<FR><NU>l<SUB><UP>i,j</UP></SUB></NU><DE>∥<B><UP>x</UP></B><SUB><UP>i</UP></SUB>−<B><UP>x</UP></B><SUB><UP>j</UP></SUB>∥</DE></FR></FENCE>(<B><UP>x</UP></B><SUB><UP>i</UP></SUB>−<B><UP>x</UP></B><SUB><UP>j</UP></SUB>)<SUP><UP>T</UP></SUP>(<B><UP>&dgr;</UP></B><SUB><UP>i</UP></SUB>−<B><UP>&dgr;</UP></B><SUB><UP>j</UP></SUB>), (9)
and
C<SUP>(<UP>3</UP>)</SUP><SUB><UP>i,j</UP></SUB>=(<B><UP>x</UP></B><SUB><UP>i</UP></SUB>−<B><UP>x</UP></B><SUB><UP>j</UP></SUB>)<SUP><UP>T</UP></SUP>(<B><UP>x</UP></B><SUB><UP>i</UP></SUB>−<B><UP>x</UP></B><SUB><UP>j</UP></SUB>)−2l<SUB><UP>i,j</UP></SUB>∥<B><UP>x</UP></B><SUB><UP>i</UP></SUB>−<B><UP>x</UP></B><SUB><UP>j</UP></SUB>∥+l<SUP><UP>2</UP></SUP><SUB><UP>i,j</UP></SUB>. (10)
In Eq. 8 we use E3 to denote the 3 × 3 identity matrix, and
A(<B><UP>x</UP></B>)=𝔼<SUB>3</SUB>−<FR><NU><B><UP>xx</UP></B><SUP>T</SUP></NU><DE>∥<B><UP>x</UP></B>∥<SUP>2</SUP></DE></FR>.
If we take G'i,j is in  R3×3 such that
G<UP>′<SUB>i,j</SUB></UP>=k<SUB><UP>i,j</UP></SUB><FENCE>𝔼<SUB>3</SUB>−l<SUB><UP>i,j</UP></SUB> <FR><NU>A(<B><UP>x</UP></B><SUB><UP>i</UP></SUB>−<B><UP>x</UP></B><SUB><UP>j</UP></SUB>)</NU><DE>∥<B><UP>x</UP></B><SUB><UP>i</UP></SUB>−<B><UP>x</UP></B><SUB><UP>j</UP></SUB>∥</DE></FR></FENCE>, (11)
then
<FR><NU>1</NU><DE>2</DE></FR> <LIM><OP>∑</OP><LL><UP>i=1</UP></LL><UL><UP>n−1</UP></UL></LIM> <LIM><OP>∑</OP><LL><UP>j=i+1</UP></LL><UL><UP>n</UP></UL></LIM> k<SUB><UP>i,j</UP></SUB>C<SUP>(<UP>1</UP>)</SUP><SUB><UP>i,j</UP></SUB>=<FR><NU>1</NU><DE>2</DE></FR> <B><UP>&dgr;</UP></B><SUP><UP>T</UP></SUP>&Ggr;<B><UP>&dgr;</UP></B> (12)
for some 3n × 3n matrix Gamma , which can be divided into 3 × 3 blocks Gamma i,j. Generally, if i not equal  j,
&Ggr;<SUB><UP>i,j</UP></SUB>=<UP>−</UP>G<UP>′<SUB>i,j</SUB></UP>. (13)
When i = j, the result is
&Ggr;<SUB><UP>i,i</UP></SUB>=<LIM><OP>∑</OP><LL><UP>k=1</UP></LL><UL><UP>i−1</UP></UL></LIM> G<UP>′<SUB>k,i</SUB></UP>+<LIM><OP>∑</OP><LL><UP>k=i+1</UP></LL><UL><UP>n</UP></UL></LIM> G<UP>′<SUB>i,k</SUB></UP>=<LIM><OP>∑</OP><LL><UP>k≠i</UP></LL></LIM> G<UP>′<SUB>k,i</SUB></UP>. (14)
Let vi,j is in  R1×3 be
<B><UP>v</UP></B><SUB><UP>i,j</UP></SUB>=2k<SUB><UP>i,j</UP></SUB><FENCE>1−<FR><NU>l<SUB><UP>i,j</UP></SUB></NU><DE>∥<B><UP>x</UP></B><SUB><UP>i</UP></SUB>−<B><UP>x</UP></B><SUB><UP>j</UP></SUB>∥</DE></FR></FENCE>(<B><UP>x</UP></B><SUB><UP>i</UP></SUB>−<B><UP>x</UP></B><SUB><UP>j</UP></SUB>)<SUP><UP>T</UP></SUP>. (15)
Then
<FR><NU>1</NU><DE>2</DE></FR> <LIM><OP>∑</OP><LL><UP>i=1</UP></LL><UL><UP>n−1</UP></UL></LIM> <LIM><OP>∑</OP><LL><UP>j=i+1</UP></LL><UL><UP>n</UP></UL></LIM> k<SUB><UP>i,j</UP></SUB>C<SUP>(<UP>2</UP>)</SUP><SUB><UP>i,j</UP></SUB>=<FR><NU>1</NU><DE>2</DE></FR><B><UP> &ggr;&dgr;</UP></B>, (16)
where
<B><UP>&ggr;</UP></B>=[<B><UP>&ggr;</UP></B><SUB>1</SUB>, <B><UP>&ggr;</UP></B><SUB>2</SUB>,…, <B><UP>&ggr;</UP></B><SUB><UP>n</UP></SUB>]∈ℝ<SUP><UP>1×3n</UP></SUP> (17)
and
<B><UP>&ggr;</UP></B><SUB><UP>i</UP></SUB>=<UP>−</UP><LIM><OP>∑</OP><LL><UP>k=1</UP></LL><UL><UP>i−1</UP></UL></LIM> <B><UP>v</UP></B><SUB><UP>k,i</UP></SUB>+<LIM><OP>∑</OP><LL><UP>k=i+1</UP></LL><UL><UP>n</UP></UL></LIM> <B><UP>v</UP></B><SUB><UP>i,k</UP></SUB>=<LIM><OP>∑</OP><LL><UP>k≠i</UP></LL></LIM><UP> <B>v</B><SUB>i,k</SUB></UP>. (18)
Let B be
B=<FR><NU>1</NU><DE>2</DE></FR> <LIM><OP>∑</OP><LL><UP>i=1</UP></LL><UL><UP>n−1</UP></UL></LIM> <LIM><OP>∑</OP><LL><UP>j=i+1</UP></LL><UL><UP>n</UP></UL></LIM> k<SUB><UP>i,j</UP></SUB>C<SUP>(<UP>3</UP>)</SUP><SUB><UP>i,j</UP></SUB>. (19)
When retaining terms to quadratic order, the result will have the form
C(<B><UP>&dgr;</UP></B>)=<LIM><OP>∑</OP><LL><UP>i=1</UP></LL><UL><UP>n−1</UP></UL></LIM> <LIM><OP>∑</OP><LL><UP>j=i+1</UP></LL><UL><UP>n</UP></UL></LIM> C<SUB><UP>i,j</UP></SUB>≈<FR><NU>1</NU><DE>2</DE></FR> <B><UP>&dgr;</UP></B><SUP><UP>T</UP></SUP>&Ggr;<B><UP>&dgr;</UP></B>+<FR><NU>1</NU><DE>2</DE></FR> <B><UP>&ggr;&dgr;</UP></B>+B, (20)
where B is a constant. The matrix Gamma  is a 3n × 3n matrix akin to a stiffness matrix that relates the relative cost of displacing any particular residue from its current position as compared to all other residues. We minimize C(delta ) with respect to delta , which results in the following constraint equation:
<FR><NU>∂C(<B><UP>&dgr;</UP></B>)</NU><DE>∂<B><UP>&dgr;</UP></B></DE></FR>=&Ggr;<B><UP>&dgr;</UP></B>+<FR><NU>1</NU><DE>2</DE></FR> <B><UP>&ggr;</UP></B><SUP><UP>T</UP></SUP>=0. (21)
We note that Gamma  is in  R3n×3n always has three zero eigenvalues corresponding to translation modes, because a translated version of delta  satisfying Eq. 21 can also minimize the cost function. That is, the solution of Eq. 21 is not unique. To solve this problem, one can either assume a particular point is fixed in space so that Gamma  can be reduced to a nonsingular (invertible) matrix, or add the constraint of linear momentum conservation such that
<LIM><OP>∑</OP><LL><UP>i=1</UP></LL><UL><UP>n</UP></UL></LIM> m<SUB><UP>i</UP></SUB><UP><B>&dgr;</B><SUB>i</SUB></UP>=0. (22)
In this case we take mi = 1. Viewing Gamma  as a stiffness matrix, gamma  is like a generalized force that is used to push the system along the simplest realizable path connecting the two conformations.

Computational complexity

We have previously observed that the dynamic behavior and computational complexity of elastic network models of proteins vary with distance cutoff values defining interactions. Namely, large cutoff values yield increased numbers of interacting pairs. Consequently, the system becomes stiff, the amplitudes of fluctuations diminish with larger cutoff values, and the matrices describing the system become less sparse. Also for relatively short cutoff values, it is possible to get more than six zero eigenvalues corresponding to rigid-body modes in normal mode analysis, and there can be extremely large amplitude fluctuations along particular directions for particular residues (Atilgan et al., 2001). Likewise, our interpolation method, which is basically derived from a matrix similar to a stiffness matrix, is sensitive to cutoff values and the geometry of a given protein structure. Extremely short cutoff values will connect the residues only with their local neighbors. This can cause unrealistic results that lead to discontinuous animations. Adoption of larger cutoff values eliminates such behavior. However, denser linking matrices can increase tremendously the computation time required for generating intermediate transitions in large protein models composed of thousands of residues. The denser the matrix is, the longer the computation time is. Later we will demonstrate this relationship quantitatively. Fig. 2 illustrates how the sparsity pattern of the union linking matrices of lactoferrin (1LFG and 1LFH) depends on the cutoff values. When the cutoff value is 10 Å in (a), some residues have poor connections, so that the resulting pathway cannot be realistic. A larger cutoff value of 15 Å substantially increases the density of the linking matrix.



View larger version (34K):
[in this window]
[in a new window]
 
FIGURE 2   The sparsity patterns of the linking matrices. We apply two different cutoff methods to a model protein lactoferrin (1LFG and 1LFH). (a) The cutoff distance is 10 Å and this matrix contains 2.91% nonzero values. (b) The contact number of 20 is used as a cutoff regardless of neighbor distance. The density of this matrix is similar to the one based on a cutoff distance of 10 Å having a density of 3.17%, but it is more uniform and produces smoother transition pathway. (c) New connections are shown. (d) Broken connections are displayed.

We address a new way to produce uniformly sparse linking matrices. The method reduces computational costs for the whole interpolation process and also guarantees realistic results. For this purpose, a linking matrix can be created by imposing a cutoff on the number of residue contacts rather than on the cutoff distances. We can connect one residue to its neighboring residues by increasing the cutoff distance until the limiting number of contacts is reached, regardless of the actual distance of the last connection. This enables the linking matrix to remain sparse and uniformly dense because all residues will have the same number of connections. One can see that this method creates a suitable linking matrix based on a contact number of 20 with a sparseness resembling one based on a cutoff distance of 10 Å in Fig. 2, but which no longer has any weakly connected parts. This is a smoothed, more uniform representation of protein structure.

Visualization

Animations of conformational transitions are more comprehensible than a series of static pictures, and are particularly useful for teaching (Booth, 2001). We incrementally generate 99 transient conformations between the two end conformations using the present distance interpolation method. In the implementation, we calculate delta  to minimize our cost function in Eq. 2 when alpha  = 0.01. Then we obtain the first intermediate conformation denoted by {x<UP><SUB>i</SUB><SUP>1</SUP></UP>}, which is 1% along the path between {xi} and {chi i}. That is,
<B><UP>x</UP></B><SUP><UP>1</UP></SUP><SUB><UP>i</UP></SUB>=<B><UP>x</UP></B><SUB><UP>i</UP></SUB>+<B><UP>&dgr;</UP></B><SUB><UP>i</UP></SUB> (23)
where x<UP><SUB>i</SUB><SUP>1</SUP></UP> is the position of the ith residue out of the set {x<UP><SUB>i</SUB><SUP>1</SUP></UP>}. Likewise for the next incremental conformation {x<UP><SUB>i</SUB><SUP>2</SUP></UP>},
<B><UP>x</UP></B><SUP><UP>2</UP></SUP><SUB><UP>i</UP></SUB>=<B><UP>x</UP></B><SUP><UP>1</UP></SUP><SUB><UP>i</UP></SUB>+<B><UP>&dgr;</UP></B><SUB><UP>i</UP></SUB> (24)
where delta  is the solution of Eq. 21 when alpha  = 0.02 in the next incremental step. The remaining conformations are then obtained in this iterative way. We store them in pdb format and generate 3D pictures with Rasmol. These static pictures are used sequentially to create movies.

Our interpolation method concerns itself not with the absolute spatial positions of atoms, but instead with distances between interacting pairs. For this reason, sometimes the solved conformations starting from one conformation do not converge to the actual spatial position and orientation of the other conformation, even though the shape is sequentially interpolated quite well. We resolve this problem simply by doing a rigid-body superposition at each timestep.


    SIMULATION RESULTS
TOP
ABSTRACT
INTRODUCTION
METHODS
SIMULATION RESULTS
CONCLUSIONS
REFERENCES

To test our interpolation method, we choose several protein conformation pairs: lac repressor headpiece (1LCC and 1LCD), lactate dehydrogenase (1LDM and 6LDH), citrate synthase (4CTS and 1CTS), and lactoferrin (1LFG and 1LFH). Movies of the conformational transitions in these systems can be found at http://custer.me.jhu.edu/proteins/movies.html. Table 1 shows that the size of the protein and the density of the linking matrix are major determinants of computational time.


                              
View this table:
[in this window]
[in a new window]
 
TABLE 1   Relationships between linking matrix density and computational efficiency for sample proteins

For small proteins it appears that a cutoff distance of 10 Å is sufficient to form a linking matrix for generating a feasible pathway, but the linking matrix of lactoferrin with the same cutoff distance creates unrealistic intermediate conformations between 1LFG and 1LFH. In such a case, poorly connected parts move discontinuously in Cartesian space during the transition. One can adopt larger cutoff distances to overcome this problem. However, the larger cutoff distances increase computation times sharply. Simulation with a cutoff distance of 15 Å for lactoferrin takes about five times as long as with a cutoff distance of 10 Å. Alternatively, our uniform and sparse linking matrix, generated with a contact number cutoff of 20, enables computing feasible pathways in relatively short times. This method not only generates feasible pathways more reliably but also runs faster.

Figs. 3 and 4 illustrate the conformational transition between the corresponding pair of "open" and "closed" crystallographic structures of lactate dehydrogenase and citrate synthase, respectively. Intermediate conformations obtained using the distance interpolation method developed in this paper give rise to feasible and continuous pathways.



View larger version (16K):
[in this window]
[in a new window]
 
FIGURE 3   Simulation of intermediate conformations between 1LDM and 6LDH of lactate dehydrogenase structures using the cutoff distance of 10 Å. A large conformational change is observed at the upper left, where a surface loop opens at the active site.



View larger version (18K):
[in this window]
[in a new window]
 
FIGURE 4   Simulation of intermediate conformations between 4CTS and 1CTS of citrate synthase structures using the cutoff distance of 10 Å. It consists of two domains with the active site between them. The small domain swings away from the large one to uncover the active site.

Our interpolation method reliably generates conformational transitions without steric obstructions for large protein pairs. Fig. 5 shows the simulation results for the conformational transition of lactoferrin, which consists of 691 residues. This simulation illustrates the movement from the closed (diferric) form (1LFG) to the open (apo) form (1LFH). Virtual bond angles and torsion angles are calculated for the intermediate conformations. Secondary structures of the protein move approximately as rigid bodies during the transition. Their virtual bond and torsion angles change little between the two end conformations. However, the dominant angle changes often occur near loops, which connect two secondary structures. In Fig. 6, dark parts of the structure represent the residues with torsion angles having the largest changes during the transition. Most of them, except Val-250, are located in loop regions. Val-250 and Thr-90 play an important role in the transition, acting as a hinge between the two subdomains at the bottom of the structure. Fig. 7 a shows the minimum distance between the closest pairs of alpha -carbons. Our interpolation method creates intermediate conformations without the severe steric clashes that occur with the simple method of interpolating over internal coordinates as shown in Fig. 1. RMS value measures the position error between corresponding alpha -carbons of the two conformations. Fig. 7 b displays RMS values of all intermediate conformations with respect to the initial conformation {xi}. They increase linearly and monotonically throughout the transition. Fig. 7 c shows the value of the cost function in Eq. 2. Fundamentally, the cost function is a geometric measure of how far the conformation is from the prespecified simplest path between conformations. By definition, the cost at the endpoints will be zero if the method has successfully generated a feasible pathway, and as the cost is a nonnegative quantity, it will always have a maximum located between the initial and final conformations. The density of linking matrices for all intermediate conformations is shown in Fig. 7 d when using a distance cutoff. The intermediate conformations are more flexible than the two end conformations in the context of an elastic protein model with a distance cutoff, whereas density is constant when using a number cutoff.



View larger version (68K):
[in this window]
[in a new window]
 
FIGURE 5   Simulation of intermediate conformations between lactoferrin forms 1LFG ("closed") and 1LFH ("open") using the contact number cutoff of 20. Here 99 intermediate conformations are obtained incrementally using our interpolation method, and 2 of these intermediate conformations (alpha  = 0.33, 0.66) are shown. This shows the movement of lactoferrin from the "closed" form to the "open" form.



View larger version (23K):
[in this window]
[in a new window]
 
FIGURE 6   Virtual torsion angle variation during transition of lactoferrin. Bold stripes represent the residues for which torsion angles significantly vary. Especially, Thr-90 and Val-250 residues act like hinges to open two subdomains at the bottom, as shown in Fig. 5. Large angle changes between the two end conformations appear primarily in loop structures.



View larger version (24K):
[in this window]
[in a new window]
 
FIGURE 7   Statistics of the conformational transition in lactoferrin. (a) The minimum distance between all possible pairs of alpha -carbons in intermediate conformations shows that our method observes steric constraints. (b) RMS measures the difference of position between corresponding alpha -carbons relative to the starting form. (c) The value of the cost function in Eq. 1 is shown. (d) Linking matrices of intermediate conformation are sparser than those of the two end conformations when using a distance cutoff.

We provide as a goal for the model a set of linearly interpolated distances between every pair of connected residues in Eq. 3. This is chosen because it is the simplest way, but this does not mean that the path that is actually generated by our method corresponds to linear interpolation of distances because of the cooperativity of the coupled set of springs. Conversely, if various nonlinear trajectories in inter-residue distances are specified, the cooperativity of the system can wash out deviations from the collective behavior. To illustrate this point, we examine an example in which the set of desired distances li,j are no longer linearly interpolated, but still have the same initial and end values as before. We generate sequences of coordinates using the new li,j functions (some of which are nonlinear functions), and apply our method. The shape deviations in the resulting conformations during the animation are compared to those obtained when using linear distance interpolation by RMS position error (Fig. 8). We apply a quadratic li,j to the two residue pairs having the largest distance changes during the transition with all the other li,j values driven linearly, as in Eq. 3. However, the two pairs that are specified to behave in a nonlinear way end up being forced by their surrounding to behave linearly, as shown in Fig. 9. The constraints from the surroundings do not allow them to trace the "desired" input paths. Furthermore, all of the RMS values we calculated in Fig. 8 are smaller than the resolution of the experimentally determined crystal structures themselves.



View larger version (17K):
[in this window]
[in a new window]
 
FIGURE 8   The comparison of simulation results using several different inputs in lactoferrin. (a) RMS values of the intermediate conformations generated by linear, quadratic, and cubic distance trajectory inputs with respect to the initial conformation {xi} are shown, respectively. (b) The shape deviations of the intermediate conformations calculated using nonlinear distance trajectories as inputs compared to those obtained when using linear interpolation is shown. The magnitudes of the variations are smaller than the resolution of the crystal structure of lactoferrin.



View larger version (14K):
[in this window]
[in a new window]
 
FIGURE 9   Simulation result in lactoferrin with mixed input such that most residue pairs are designated to follow the linear interpolation of distance, except two particular pairs that are driven by the quadratic input (solid-dot line). (a) The distance between Asn-13 and Gln-186 (solid line) appears to vary linearly, not following the input form. (b) Likewise, the distance between Asn-234 and Val-606 decreases linearly. This indicates that surrounding pairs force them to follow the global behavior with observation of sterics.

From the above discussion we conclude that for all practical purposes our method generates feasible pathways that are somewhat insensitive to the user-specified choice of li,j. That is, the global behavior of this coarse-grained model overrides particular user-specified parameters when those parameters deviate substantially from the collective behavior. Hence, as long as the overall trend is for inter-residue distance trajectories to follow a monotonic temporal path, our linear inputs are a reasonable choice. However, if one is not interested in the simplest feasible pathway, our method can still be useful. For instance, if one is interested in generating ensembles of paths rather than a single feasible path, our method can be run many times (serially or in parallel) to generate truly different pathways by varying the lij(t) relative to each other. Hence, the speed of our method has the potential to assist in the statistical mechanical analysis of protein conformational transitions, though that is not our emphasis here.

Incorporating partial conformational data

In this section we explain how our distance interpolation method can be used to incorporate incomplete conformational information obtained from experiments into computer simulations of protein motions. In instances when two crystallographic structures are not available, this application of our model will be of value. A number of experimental techniques are available for determining partial conformational data. One set of techniques is centered around the use of fluorescent energy transfer to capture a limited number of inter-residue distances in different conformations (Wu and Brand, 1994). Other techniques have been developed to mechanically manipulate large molecules directly and measure the history of the applied force (Bustamante et al., 2000). NMR can be used for determining time-resolved conformational data (Balbach et al., 1995; Dyson and Wright, 1996).

These experimental methods for determining conformational data generally do not provide as complete information as crystallography, but in some instances this is balanced by their ability to provide time-resolved information. Equation 2, which forms the basis for our method, applies in the context of interpolating between two crystal structures, and it also can serve as a tool for visualizing global protein motions that are consistent with time-resolved distance trajectories between two or more residues. In the context when a small subset of the li,j values can be determined experimentally as functions of time, these values can be directly substituted into Eq. 2. Then our formulation proceeds as before.

We now demonstrate how this incorporation of partial conformational data into our model can be done. Consider again lactoferrin, and assume that only the open crystallographic state is given. In this case the linking matrix for this state alone (and not the union of two linking matrices) is used for ki,j. Again, a value of one means two residues are in contact and a zero means that they are not. To test how well our method would work if a second incomplete set of inter-residue distances were determined experimentally, we sample a small subset of li,j values from the closed crystallized form of lactoferrin. We choose this subset of the li,j values to contain at least one set of six residue-residue pairs. We have divided the whole structure into three essentially rigid pieces and assumed seven experimentally determined li,j(t) values to be linear trajectories from the open to closed values in Fig. 10 a. Because most inter-residue distances are not specified for the second conformation, we use the simple update rule for those pairs that are not specified in the second conformation as follows
l<SUB><UP>i,j</UP></SUB>(t+&Dgr;t)=<FENCE><AR><R><C><UP>∥<B>x</B><SUB>i</SUB></UP>(t)−<B><UP>x</UP></B><SUB><UP>j</UP></SUB>(t)∥</C><C><UP>if</UP> ∥<B><UP>x</UP></B><SUB><UP>i</UP></SUB>(t)−<B><UP>x</UP></B><SUB><UP>j</UP></SUB>(t)∥>l<SUB><UP>min</UP></SUB></C></R><R><C><UP>l<SUB>min</SUB></UP></C><C><UP>if</UP> ∥<B><UP>x</UP></B><SUB><UP>i</UP></SUB>(t)−<B><UP>x</UP></B><SUB><UP>j</UP></SUB>(t)∥≤l<SUB><UP>min</UP></SUB>,</C></R></AR></FENCE> (25)
where we set lmin = 3.8 Å.



View larger version (22K):
[in this window]
[in a new window]
 
FIGURE 10   A schematic diagram of lactoferrin structure and the "windowed RMS" values of the simulation results from a single complete set of crystallographic data and limited amounts of time-resolved data. (a) The open lactoferrin structure is simplified as three rigid body pieces and it is assumed that seven new inter-residue distances are measured for a second conformation. The li,j(t) values that describe the new conformation are enforced by direct substitution into Eq. 2 with a large value of that particular ki,j of 100, and the same ki,j value is applied to the connections within each rigid body component. (b) The RMS difference between the end conformation generated in this way, {x<UP><SUB>i</SUB><SUP>100</SUP></UP>}, and the targeted conformation, {chi i}, is only 2.5 Å. The windowed RMS plots consecutively capture 70 residues per window. The solid line stands for the windowed RMS value of {xi} relative to {chi i}, while the solid-dot line indicates the windowed RMS value between {x<UP><SUB>i</SUB><SUP>100</SUP></UP>} and {chi i}. Our distance interpolation method with limited secondary conformational data captures the global behavior of lactoferrin's conformational transition well.

This allows any strain between intermediate and initial conformations to relax unless it results in steric clashes. The RMS difference between the end conformation generated in this way with the targeted crystallographic data is small, as shown in Fig. 10 b, indicating that the protein's cooperativity is captured well using our incremental distance interpolation method, and that much can be inferred about global protein motions from a single complete set of crystallographic data and limited amounts of partial data that describe a second conformation. Hence, our framework may be used as a visualization tool for experimentalists to superimpose partial conformational data onto crystal structures to examine the structural/kinematic implications of measured inter-residue distances.


    CONCLUSIONS
TOP
ABSTRACT
INTRODUCTION
METHODS
SIMULATION RESULTS
CONCLUSIONS
REFERENCES

We have developed a computationally efficient method for the realistic simulation of proteins exhibiting transitions between two crystallized conformations. Our method is also flexible and general enough to incorporate partially observed, time-dependent conformational data from experiments. It is based on a coarse-grained elastic network model. Using cutoffs in the number of nearest neighbors generates a linking matrix that is both sparse and uniformly dense, hence permitting efficient computations. Our distance interpolation method is the fastest method available for generating conformational transitions while still preserving steric constraints. This is because it involves only one inversion of a very sparse 3n × 3n matrix for each frame in the animation, where n is the number of amino acid residues. Unlike dynamics-based methods in which the size of the timestep used is limited by the stiffest part of the structure, our method is purely geometric, and so the number of required animation frames is dictated only by the difference in shape between the two conformations. Typically, we choose 99 intermediate frames, whereas dynamics-based approaches must use several orders of magnitude more timesteps to achieve conformational transitions. Our method also has the benefit that it is not sensitive to parameter choices, solvation, or quantum mechanical effects, which is not the case with the most physically detailed models. While physical reality is the ultimate goal of all computational modeling methods, we believe that having a method effective for proteins of several hundred residues with an accessible PC in a few hours may be preferable to more "realistic" techniques requiring months of high-performance computer time, and are not guaranteed to converge due to round-off errors, instability of numerical integration, or even a lack of full knowledge about the true nature of the chemical potentials involved in proteins.

Simulation results illustrate that the distance interpolation method presented here reliably generates sequences of feasible intermediate conformations of proteins without steric clashes. Animations produced using this method are posted at http://custer.me.jhu.edu/proteins/movies.html. The distance interpolation method represents an improvement over simplified linear position interpolations in terms of the realism of intermediate forms, and over all-atom computational methods such as MD and NMA, in terms of computational efficiency.

    FOOTNOTES

Address reprint requests to Gregory S. Chirikjian, Dept. of Mechanical Engineering, The Johns Hopkins University, Baltimore, MD 21218. E-mail: gregc{at}jhu.edu.

Submitted January 3, 2002, and accepted for publication May 7, 2002.

R. L. Jernigan's present address is Laurence H. Baker Center for Bioinformatics and Biological Statistics, Iowa State University, Ames, IA 50011-3020.


    REFERENCES
TOP
ABSTRACT
INTRODUCTION
METHODS
SIMULATION RESULTS
CONCLUSIONS
REFERENCES

Biophys J, September 2002, p. 1620-1630, Vol. 83, No. 3
© 2002 by the Biophysical Society   0006-3495/02/09/1620/11  $2.00



This article has been cited by other articles:


Home page
Nucleic Acids ResHome page
E. Lindahl, C. Azuara, P. Koehl, and M. Delarue
NOMAD-Ref: visualization, deformation and refinement of macromolecular structures based on all-atom normal mode analysis.
Nucleic Acids Res., July 1, 2006; 34(Web Server issue): W52 - W56.
[Abstract] [Full Text] [PDF]


Home page
Nucleic Acids ResHome page
Y. Jang, J. I. Jeong, and M. K. Kim
UMMS: constrained harmonic and anharmonic analyses of macromolecules based on elastic network models.
Nucleic Acids Res., July 1, 2006; 34(Web Server issue): W57 - W62.
[Abstract] [Full Text] [PDF]


Home page
Biophys. JHome page
W. Zheng and B. R. Brooks
Modeling Protein Conformational Changes by Iterative Fitting of Distance Constraints Using Reoriented Normal Modes
Biophys. J., June 15, 2006; 90(12): 4327 - 4336.
[Abstract] [Full Text] [PDF]