We develop a computationally efficient method to simulate
the transition of a protein between two conformations. Our method is
based on a coarse-grained elastic network model in which distances between spatially proximal amino acids are interpolated between the
values specified by the two end conformations. The computational speed
of this method depends strongly on the choice of cutoff distance used
to define interactions as measured by the density of entries of the
constant linking/contact matrix. To circumvent this problem we
introduce the concept of using a cutoff based on a maximum number of
nearest neighbors. This generates linking matrices that are both sparse
and uniform, hence allowing for efficient computations that are
independent of the arbitrariness of cutoff distance choices. Simulation
results demonstrate that the method developed here reliably generates
feasible intermediate conformations, because our method observes steric
constraints and produces monotonic changes in virtual bond and torsion
angles. Applications are readily made to large proteins, and we
demonstrate our method on lactate dehydrogenase, citrate synthase, and
lactoferrin. We also illustrate how this framework can be used to
complement experimental techniques that partially observe protein motions.
 |
INTRODUCTION |
Proteins are well known to be intrinsically
flexible structures. Many proteins have been determined to have
multiple conformations (in some cases called "open" and
"closed" forms; Berman et al., 2000
). Conformational transitions
between two forms are often important for understanding the
relationship between structure and function. In other words, such
motions are involved in many basic functions such as catalysis,
regulation, transportation, and aggregation (Subbiah, 1996
). Hence,
comprehending conformational transitions can be useful for
understanding biological mechanisms, especially for protein machines.
However, it is also an important topic in molecular graphics to
visualize conformational transitions. Obviously, one of the best ways
is through animations, such as digital movie files (e.g., AVI or MPEG).
Animations usually are produced by inserting images of intermediate
conformations between the two conformations. These hypothetical
intermediate conformations are visualized in sequence for an animation.
There have been several previous efforts in this area. Vonrhein et al.
(1995)
produced movies of conformational transitions by linear
interpolation between the atomic coordinates of the two end
conformations in Cartesian space. One problem with that method is that
the bond lengths and angles of the intermediate conformations can be
unrealistic, and in several cases protein chains actually pass through
one another. To overcome this problem, Gerstein and Krebs (1998)
applied proper restraints and minimized the energy of each intermediate
conformation to correct for molecular stereochemistry and enforce rules
of molecular structure.
An alternative interpolation approach is to use internal coordinates
such as bond lengths, bond angles, and torsion angles instead. Kleywegt
and Jones (1996)
implemented this approach to construct intermediate
conformations with their LSQMAN program. Ideally, this approach
produces realistic bond lengths and torsional angles, but this method
also has some problems. If one constructs intermediate conformations by
interpolating torsional angles between two end conformations while
holding bond lengths and angles fixed, one will often obtain infeasible
pathways for several reasons. First, it may not even be possible for
the conformation from one end to reach the other end in Cartesian space
because the two conformations do not have identical values of internal
variables such as bond lengths and bond angles. Therefore, one must
either refine the two end conformations until they have a consistent set of internal variables, except for the torsion angles, before interpolating over torsion angles, or instead interpolate all of the
internal variables simultaneously to avoid this problem. A second
limitation is that in the process of generating intermediate conformations some residues can come too close to each other in order
to not break the smoothness of the simulated pathway and this can
produce unfavorable states in the sense of high-energy interactions and
steric clashes. Fig. 1 shows that a
particular pair of
-carbons can come too close to each other during
conformational transitions using internal coordinate interpolation,
which would give rise to exceedingly high repulsive energy peaks
because of van der Waals forces between nonbonded atoms. A third
problem occurs for specific values of internal rotation angles. Most
are not equi-energetic for all values. Consequently, some forms have higher energies, and the intermediate forms generated could have inordinately high values, even if other lower energy pathways do exist.

View larger version (15K):
[in this window]
[in a new window]
|
FIGURE 1
An example of torsional angle interpolation between two
lac repressor headpiece structures (named 1LCC and 1LCD, Protein Data
Bank). (a) During the conformational transition from 1LCC to
1LCD, the -carbons of Val-15 and Thr-34 unrealistically come too
close to each other, 1 Å. (b) This unrealistic relative
distance between two atoms (solid-dot line) is compared to
the result of the "distance interpolation method" developed in this
paper (solid line). The conformation from one end does not
reach the other end in Cartesian space because the two sets of
crystallographic coordinates do not yield identical values of bond
lengths and bond angles. One must either refine the two end
conformations until they have a consistent set of internal variables,
except for the torsion angles, before interpolating over torsion
angles, or interpolate the full set of internal variables
simultaneously. However, our method appears to conform well to steric
constraints when reaching the other end without refinement of internal
variables.
|
|
There has been substantial work on the development of algorithms to
generate plausible "reaction pathways." Elber and Karplus (1987)
proposed to make a trial path connecting reactant and product structures. This path minimizes the functional
|
(1)
|
where l(r) is the reaction path connecting
reactant (rR) and product
(rP) structures, U(r) is the
potential energy, and L =
dl(r) is the path
length (Czerminski and Elber, 1990
). This algorithm demonstrates a good reaction path for the alanine dipeptide and the tetrapeptide IAN. However, the path that minimizes the functional above may not be the
path with the maximum rate of transition between reactants and
products. Jónsson et al. (1998)
developed a "nudged elastic band method," which is a modified path method to find minimum energy
paths by constructing a set of intermediate conformations between
reactant and product structures. A spring interaction between adjacent
conformations is added to ensure the continuity of the path.
Minimization of the force acting on the conformations yields the
minimum energy path. Another path method is the MaxFlux method proposed
by Huo et al. (1997)
. It computes the reaction pathway of
maximum diffusive flux by minimizing a discretized form of the line
integral in Eq. 1 with added restraints such as constant mean-square
distance between adjacent intermediates, repulsive interactions between
adjacent intermediates along the path, and linear and angular momentum
conservation for the system. This method was used to find the optimal
pathway of the coil-to-helix transition in a short polyalanine chain,
but these path methods have not been applied to a large protein because
it is a computationally demanding task to find the global minimum value
of the objective function in a high-dimensional space out of all
possible reaction paths.
Some methods have considered probabilistic models of pathways. Olender
and Elber (1996)
integrated classical Newtonian dynamical equations of
motion to compute long-time molecular dynamics trajectories based on
the stochastic path integral. The activated dynamical transition path
method developed by Dellago et al. (1997)
generates and samples
an ensemble of transition paths, which evolve according to stochastic
dynamics (either Metropolis Monte Carlo or Brownian dynamics) and
conserve the Boltzmann distribution. Again, these stochastic methods
have been tested for simple cases such as the alanine dipeptide and
two-dimensional Lennard-Jones clusters, but they also are
computationally too expensive to be applied to a large protein.
Molecular dynamics (MD) simulations, a powerful method for the study of
details of molecular motion, and normal mode analysis (NMA) using
all-atom empirical potentials, are often used to follow the dynamics of
proteins (Brooks and Karplus, 1985
; Xu et al., 1997
; Xu and Sigler,
1998
). However, the use of atomic approaches becomes computationally
inefficient with the increased size of a system.
To reduce such a computational burden, many recent papers have
demonstrated the utility of coarse-grained protein models by including,
for example, only
-carbons as point masses representing residues and
using a simplified potential for considering internal interactions
between neighboring residues. Such models are suitable to describe the
global motions of complex systems of small proteins or single proteins
having more than several thousand residues (Atilgan et al., 2001
; Bahar
and Jernigan, 1998
; Bahar et al., 1999
; Jaaskelainen et al., 1998
; Tama
and Sanejouand, 2001
; Tirion and Ben-Avraham, 1993
, 1998
).
In this paper we develop a new interpolation method for generating
feasible pathways for conformational transitions using the simplest
potential and coarse-grained protein models. The key idea is to
interpolate uniformly the distances between spatially proximal residues
in both conformations within the context of the elastic network model.
The present approach can be referred to as a "distance interpolation
method," which is completely different from position interpolation in
Cartesian space. Because we interpolate relative distance between
spatially close residues, unrealistic conformations and steric clashes
become far less likely. This method offers a reasonable compromise
between oversimplified linear interpolation in Cartesian or internal
coordinates and computationally expensive methods such as MD
simulations. We discuss this efficient modeling technique for large
proteins. Our method computes sets of interpolated conformations within
reasonable times in cases where full-atom computations such as MD or
even NMA may be infeasible. Unlike dynamics-based methods in which the
size of the timestep used is limited by the stiffest part of the
structure, our method is purely geometric and so the number of required
animation frames is dictated only by the difference in shape between
the two conformations. An added advantage to this kind of model is
that having virtual bonds attaching each
-carbon to all those
that are within a sequential distance of three units induces
stiffnesses in the virtual bond angles and torsion angles while
retaining the ease of using Cartesian coordinates.
 |
METHODS |
Incremental construction of intermediate conformations
We derive here an incremental formulation to generate
intermediate conformations. The key idea is to interpolate between two values for the distances between spatially proximal
-carbons, which
are artificially connected with springs in the elastic network model
(Atilgan et al., 2001
; Bahar et al., 1997
). Although the relationship between molecular conformations and the distances between
atoms in conformations has been studied extensively (Crippen and Havel,
1988
), our goal is to generate intermediate conformations by finding
small changes in
-carbon positions that result from inducing
correspondingly small changes in inter-residue distances.
Suppose that we have sets of
-carbon coordinates for the two end
conformations of the same protein denoted by
{xi} and {
i},
respectively. One can build two elastic network models, one for each of
these conformations. We introduce a cost function as follows
|
(2)
|
Here
= [
, ... ,

]T is a
3n-dimensional vector of displacements, with n
being the number of residues. An intermediate conformation is defined
by the value of
that minimizes this cost when all other parameters are held constant. The linking ("contact") matrix
k is an n × n matrix containing the
information about which amino acid residue is either connected to, or
in contact with, any other. It is formed as the "union" of the two
linking matrices for {xi} and
{
i}, so that ki,j
can have value 1 whenever residues i and j are
judged to be in contact in either conformation and 0 otherwise. The
value li,j is the desired distance between
i and j, which can be chosen as
|
(3)
|
where
is the coefficient specifying how far a given state is
along the transition from {xi} to
{
i}. For example, when
= 0.5, the desired conformation is the one with inter-residue distances at the
average values of those for conformations {xi} and {
i}.
Using the "union" linking matrices confines the intermediate
conformations to the interval between the two end conformations.
The cost function in Eq. 2 can be related to the classical pairwise
potential functions of biophysics in the following way. Suppose that a
coarse-grained (C-alpha) potential function between any two residues
i and j is
Vi,j(
xi
xj
). If from the crystal data we know that
there are two conformations, we seek to establish a series of
"artificial" equilibrium states by perturbing this potential at
artificial time t. This results in an artificial potential
of the form
|
(4)
|
The goal at each value of time is then to let the system relax
so that the values of the small displacements cause the new conformations to settle at new artificial equilibria. Because the
i + 1st state is always calculated relative to the
artificial equilibrium established for the ith state, a
Taylor series expansion of each
Vi,j(r) up to quadratic order will
result in an expression of the form in Eq. 2. Here r is the
inter-residue distance. The linear term in the Taylor series expansion
|
(5)
|
(where r0 is the previous equilibrium
value of inter-residue distance and
is the small change) vanishes
when summed over all values of i and j because of
the definition of an equilibrium state. Even if the potential function
is singular, such as in a six-twelve potential, the Taylor series
expansion in a small neighborhood of an equilibrium is valid because
the function will always be analytic locally. Hence, starting from an
arbitrary potential function, one will always arrive at Eq. 2 when
small incremental deviations from equilibrium are made. The only
influence the potential will have is on the values of
ki,j, which are held constant over time and over
all values of i and j for simplicity in the
elastic network model, but could easily be allowed to vary within the
framework given below to reflect any potential function.
Our goal is to find values of
that minimize Eq. 2, which
itself can be linearized for small values of

i
and 
j
with a Taylor series approximation. If we consider an individual term
in Eq. 2
|
(6)
|
then this can be written as
|
(7)
|
where
|
(8)
|
|
(9)
|
and
|
(10)
|
In Eq. 8 we use
3 to denote
the 3 × 3 identity matrix, and
If we take G'i,j
3×3 such that
|
(11)
|
then
|
(12)
|
for some 3n × 3n matrix
, which can be divided
into 3 × 3 blocks
i,j. Generally, if i
j,
|
(13)
|
When i = j, the result is
|
(14)
|
Let vi,j
1×3 be
|
(15)
|
Then
|
(16)
|
where
|
(17)
|
and
|
(18)
|
Let B be
|
(19)
|
When retaining terms to quadratic order, the result will have
the form
|
(20)
|
where B is a constant. The matrix
is a
3n × 3n matrix akin to a stiffness matrix that relates
the relative cost of displacing any particular residue from its current
position as compared to all other residues. We minimize
C(
) with respect to
, which results
in the following constraint equation:
|
(21)
|
We note that
3n×3n
always has three zero eigenvalues corresponding to translation modes,
because a translated version of
satisfying Eq. 21 can
also minimize the cost function. That is, the solution of Eq. 21 is not
unique. To solve this problem, one can either assume a particular point
is fixed in space so that
can be reduced to a nonsingular
(invertible) matrix, or add the constraint of linear momentum
conservation such that
|
(22)
|
In this case we take mi = 1. Viewing
as a stiffness matrix,
is like a generalized force that is used
to push the system along the simplest realizable path connecting the
two conformations.
Computational complexity
We have previously observed that the dynamic behavior and
computational complexity of elastic network models of proteins vary with distance cutoff values defining interactions. Namely, large cutoff
values yield increased numbers of interacting pairs. Consequently, the
system becomes stiff, the amplitudes of fluctuations diminish with
larger cutoff values, and the matrices describing the system become
less sparse. Also for relatively short cutoff values, it is possible to
get more than six zero eigenvalues corresponding to rigid-body modes in
normal mode analysis, and there can be extremely large amplitude
fluctuations along particular directions for particular residues
(Atilgan et al., 2001
). Likewise, our interpolation method, which is
basically derived from a matrix similar to a stiffness matrix, is
sensitive to cutoff values and the geometry of a given protein
structure. Extremely short cutoff values will connect the residues only
with their local neighbors. This can cause unrealistic results that
lead to discontinuous animations. Adoption of larger cutoff values
eliminates such behavior. However, denser linking matrices can increase
tremendously the computation time required for generating intermediate
transitions in large protein models composed of thousands of residues.
The denser the matrix is, the longer the computation time is. Later we
will demonstrate this relationship quantitatively. Fig.
2 illustrates how the sparsity pattern of
the union linking matrices of lactoferrin (1LFG and 1LFH) depends on
the cutoff values. When the cutoff value is 10 Å in (a),
some residues have poor connections, so that the resulting pathway
cannot be realistic. A larger cutoff value of 15 Å substantially
increases the density of the linking matrix.

View larger version (34K):
[in this window]
[in a new window]
|
FIGURE 2
The sparsity patterns of the linking matrices. We apply
two different cutoff methods to a model protein lactoferrin (1LFG and
1LFH). (a) The cutoff distance is 10 Å and this matrix
contains 2.91% nonzero values. (b) The contact number of 20 is used as a cutoff regardless of neighbor distance. The density of
this matrix is similar to the one based on a cutoff distance of 10 Å having a density of 3.17%, but it is more uniform and produces
smoother transition pathway. (c) New connections are shown.
(d) Broken connections are displayed.
|
|
We address a new way to produce uniformly sparse linking matrices. The
method reduces computational costs for the whole interpolation process
and also guarantees realistic results. For this purpose, a linking
matrix can be created by imposing a cutoff on the number of residue
contacts rather than on the cutoff distances. We can connect one
residue to its neighboring residues by increasing the cutoff distance
until the limiting number of contacts is reached, regardless of the
actual distance of the last connection. This enables the linking matrix
to remain sparse and uniformly dense because all residues will have the
same number of connections. One can see that this method creates a
suitable linking matrix based on a contact number of 20 with a
sparseness resembling one based on a cutoff distance of 10 Å in Fig.
2, but which no longer has any weakly connected parts. This is a
smoothed, more uniform representation of protein structure.
Visualization
Animations of conformational transitions are more comprehensible
than a series of static pictures, and are particularly useful for
teaching (Booth, 2001
). We incrementally generate 99 transient conformations between the two end conformations using the present distance interpolation method. In the implementation, we calculate
to minimize our cost function in Eq. 2 when
= 0.01. Then we obtain the first intermediate conformation denoted by {x
}, which is 1% along the path
between {xi} and
{
i}. That is,
|
(23)
|
where x
is the position of the
ith residue out of the set
{x
}. Likewise for the next incremental
conformation {x
},
|
(24)
|
where
is the solution of Eq. 21 when
= 0.02 in the next incremental step. The remaining conformations are then obtained in this iterative way. We store them in pdb format and generate 3D pictures with Rasmol. These static pictures are used sequentially to create movies.
Our interpolation method concerns itself not with the absolute spatial
positions of atoms, but instead with distances between interacting
pairs. For this reason, sometimes the solved conformations starting
from one conformation do not converge to the actual spatial position
and orientation of the other conformation, even though the shape is
sequentially interpolated quite well. We resolve this problem simply by
doing a rigid-body superposition at each timestep.
 |
SIMULATION RESULTS |
To test our interpolation method, we choose several protein
conformation pairs: lac repressor headpiece (1LCC and 1LCD), lactate dehydrogenase (1LDM and 6LDH), citrate synthase (4CTS and 1CTS), and
lactoferrin (1LFG and 1LFH). Movies of the conformational transitions
in these systems can be found at
http://custer.me.jhu.edu/proteins/movies.html. Table
1 shows that the size of the protein and
the density of the linking matrix are major determinants of
computational time.
For small proteins it appears that a cutoff distance of 10 Å is
sufficient to form a linking matrix for generating a feasible pathway,
but the linking matrix of lactoferrin with the same cutoff distance
creates unrealistic intermediate conformations between 1LFG and 1LFH.
In such a case, poorly connected parts move discontinuously in
Cartesian space during the transition. One can adopt larger cutoff
distances to overcome this problem. However, the larger cutoff
distances increase computation times sharply. Simulation with a
cutoff distance of 15 Å for lactoferrin takes about five times as long
as with a cutoff distance of 10 Å. Alternatively, our uniform and
sparse linking matrix, generated with a contact number cutoff of 20, enables computing feasible pathways in relatively short times. This
method not only generates feasible pathways more reliably but also runs faster.
Figs. 3 and
4 illustrate the conformational
transition between the corresponding pair of "open" and
"closed" crystallographic structures of lactate dehydrogenase and
citrate synthase, respectively. Intermediate conformations obtained
using the distance interpolation method developed in this paper give
rise to feasible and continuous pathways.

View larger version (16K):
[in this window]
[in a new window]
|
FIGURE 3
Simulation of intermediate conformations between 1LDM
and 6LDH of lactate dehydrogenase structures using the cutoff distance
of 10 Å. A large conformational change is observed at the upper left,
where a surface loop opens at the active site.
|
|

View larger version (18K):
[in this window]
[in a new window]
|
FIGURE 4
Simulation of intermediate conformations between 4CTS
and 1CTS of citrate synthase structures using the cutoff distance of 10 Å. It consists of two domains with the active site between them. The
small domain swings away from the large one to uncover the active
site.
|
|
Our interpolation method reliably generates conformational transitions
without steric obstructions for large protein pairs. Fig.
5 shows the simulation results for the
conformational transition of lactoferrin, which consists of 691 residues. This simulation illustrates the movement from the closed
(diferric) form (1LFG) to the open (apo) form (1LFH). Virtual bond
angles and torsion angles are calculated for the intermediate
conformations. Secondary structures of the protein move approximately
as rigid bodies during the transition. Their virtual bond and torsion
angles change little between the two end conformations. However, the
dominant angle changes often occur near loops, which connect two
secondary structures. In Fig. 6, dark
parts of the structure represent the residues with torsion angles
having the largest changes during the transition. Most of them, except
Val-250, are located in loop regions. Val-250 and Thr-90 play an
important role in the transition, acting as a hinge between the two
subdomains at the bottom of the structure. Fig. 7
a shows the minimum distance
between the closest pairs of
-carbons. Our interpolation method
creates intermediate conformations without the severe steric clashes
that occur with the simple method of interpolating over internal
coordinates as shown in Fig. 1. RMS value measures the position error
between corresponding
-carbons of the two conformations. Fig. 7
b displays RMS values of all intermediate conformations with
respect to the initial conformation {xi}.
They increase linearly and monotonically throughout the transition.
Fig. 7 c shows the value of the cost function in Eq. 2.
Fundamentally, the cost function is a geometric measure of how far the
conformation is from the prespecified simplest path between
conformations. By definition, the cost at the endpoints will be zero if
the method has successfully generated a feasible pathway, and as the
cost is a nonnegative quantity, it will always have a maximum located
between the initial and final conformations. The density of linking
matrices for all intermediate conformations is shown in Fig. 7
d when using a distance cutoff. The intermediate conformations are more flexible than the two end conformations in the
context of an elastic protein model with a distance cutoff, whereas
density is constant when using a number cutoff.

View larger version (68K):
[in this window]
[in a new window]
|
FIGURE 5
Simulation of intermediate conformations between
lactoferrin forms 1LFG ("closed") and 1LFH ("open") using the
contact number cutoff of 20. Here 99 intermediate conformations are
obtained incrementally using our interpolation method, and 2 of these
intermediate conformations ( = 0.33, 0.66) are shown. This
shows the movement of lactoferrin from the "closed" form to the
"open" form.
|
|

View larger version (23K):
[in this window]
[in a new window]
|
FIGURE 6
Virtual torsion angle variation during transition of
lactoferrin. Bold stripes represent the residues for which torsion
angles significantly vary. Especially, Thr-90 and Val-250 residues act
like hinges to open two subdomains at the bottom, as shown in Fig. 5.
Large angle changes between the two end conformations appear primarily
in loop structures.
|
|

View larger version (24K):
[in this window]
[in a new window]
|
FIGURE 7
Statistics of the conformational transition in
lactoferrin. (a) The minimum distance between all possible
pairs of -carbons in intermediate conformations shows that our
method observes steric constraints. (b) RMS measures the
difference of position between corresponding -carbons relative to
the starting form. (c) The value of the cost function in Eq. 1 is shown. (d) Linking matrices of intermediate
conformation are sparser than those of the two end conformations when
using a distance cutoff.
|
|
We provide as a goal for the model a set of linearly interpolated
distances between every pair of connected residues in Eq. 3. This is
chosen because it is the simplest way, but this does not mean that the
path that is actually generated by our method corresponds to linear
interpolation of distances because of the cooperativity of the coupled
set of springs. Conversely, if various nonlinear trajectories in
inter-residue distances are specified, the cooperativity of the system
can wash out deviations from the collective behavior. To illustrate
this point, we examine an example in which the set of desired distances
li,j are no longer linearly interpolated, but
still have the same initial and end values as before. We generate
sequences of coordinates using the new li,j functions (some of which are nonlinear functions), and apply our method. The shape deviations in the resulting conformations during the
animation are compared to those obtained when using linear distance
interpolation by RMS position error (Fig.
8). We apply a quadratic
li,j to the two residue pairs having the largest
distance changes during the transition with all the other
li,j values driven linearly, as in Eq. 3.
However, the two pairs that are specified to behave in a nonlinear way
end up being forced by their surrounding to behave linearly, as shown
in Fig. 9. The constraints from the surroundings do not allow them to trace the "desired" input paths. Furthermore, all of the RMS values we calculated in Fig. 8 are smaller
than the resolution of the experimentally determined crystal structures
themselves.

View larger version (17K):
[in this window]
[in a new window]
|
FIGURE 8
The comparison of simulation results using several
different inputs in lactoferrin. (a) RMS values of the
intermediate conformations generated by linear, quadratic, and cubic
distance trajectory inputs with respect to the initial conformation
{xi} are shown, respectively. (b)
The shape deviations of the intermediate conformations calculated using
nonlinear distance trajectories as inputs compared to those obtained
when using linear interpolation is shown. The magnitudes of the
variations are smaller than the resolution of the crystal structure of
lactoferrin.
|
|

View larger version (14K):
[in this window]
[in a new window]
|
FIGURE 9
Simulation result in lactoferrin with mixed input such
that most residue pairs are designated to follow the linear
interpolation of distance, except two particular pairs that are driven
by the quadratic input (solid-dot line). (a) The
distance between Asn-13 and Gln-186 (solid line) appears to
vary linearly, not following the input form. (b) Likewise,
the distance between Asn-234 and Val-606 decreases linearly. This
indicates that surrounding pairs force them to follow the global
behavior with observation of sterics.
|
|
From the above discussion we conclude that for all practical purposes
our method generates feasible pathways that are somewhat insensitive to
the user-specified choice of li,j. That is, the global behavior of this coarse-grained model overrides particular user-specified parameters when those parameters deviate substantially from the collective behavior. Hence, as long as the overall trend is
for inter-residue distance trajectories to follow a monotonic temporal
path, our linear inputs are a reasonable choice. However, if one is not
interested in the simplest feasible pathway, our method can still be
useful. For instance, if one is interested in generating ensembles of
paths rather than a single feasible path, our method can be run many
times (serially or in parallel) to generate truly different pathways by
varying the lij(t) relative to each
other. Hence, the speed of our method has the potential to assist in
the statistical mechanical analysis of protein conformational transitions, though that is not our emphasis here.
Incorporating partial conformational data
In this section we explain how our distance interpolation method
can be used to incorporate incomplete conformational information obtained from experiments into computer simulations of protein motions.
In instances when two crystallographic structures are not available,
this application of our model will be of value. A number of
experimental techniques are available for determining partial
conformational data. One set of techniques is centered around the use
of fluorescent energy transfer to capture a limited number of
inter-residue distances in different conformations (Wu and Brand,
1994
). Other techniques have been developed to mechanically manipulate
large molecules directly and measure the history of the applied force
(Bustamante et al., 2000
). NMR can be used for determining
time-resolved conformational data (Balbach et al., 1995
; Dyson and
Wright, 1996
).
These experimental methods for determining conformational data
generally do not provide as complete information as crystallography, but in some instances this is balanced by their ability to provide time-resolved information. Equation 2, which forms the basis for our
method, applies in the context of interpolating between two crystal
structures, and it also can serve as a tool for visualizing global
protein motions that are consistent with time-resolved distance
trajectories between two or more residues. In the context when a small
subset of the li,j values can be determined
experimentally as functions of time, these values can be directly
substituted into Eq. 2. Then our formulation proceeds as before.
We now demonstrate how this incorporation of partial conformational
data into our model can be done. Consider again lactoferrin, and assume
that only the open crystallographic state is given. In this case the
linking matrix for this state alone (and not the union of two linking
matrices) is used for ki,j. Again, a value of
one means two residues are in contact and a zero means that they are
not. To test how well our method would work if a second incomplete set
of inter-residue distances were determined experimentally, we sample a
small subset of li,j values from the closed
crystallized form of lactoferrin. We choose this subset of the
li,j values to contain at least one set of six
residue-residue pairs. We have divided the whole structure into three
essentially rigid pieces and assumed seven experimentally determined
li,j(t) values to be linear
trajectories from the open to closed values in Fig. 10
a. Because most
inter-residue distances are not specified for the second conformation,
we use the simple update rule for those pairs that are not specified in
the second conformation as follows
|
(25)
|
where we set lmin = 3.8 Å.

View larger version (22K):
[in this window]
[in a new window]
|
FIGURE 10
A schematic diagram of lactoferrin structure and the
"windowed RMS" values of the simulation results from a single
complete set of crystallographic data and limited amounts of
time-resolved data. (a) The open lactoferrin structure is
simplified as three rigid body pieces and it is assumed that seven new
inter-residue distances are measured for a second conformation. The
li,j(t) values that describe the new
conformation are enforced by direct substitution into Eq. 2 with a
large value of that particular ki,j of 100, and
the same ki,j value is applied to the
connections within each rigid body component. (b) The RMS
difference between the end conformation generated in this way,
{x }, and the targeted conformation,
{ i}, is only 2.5 Å. The windowed RMS
plots consecutively capture 70 residues per window. The solid line
stands for the windowed RMS value of {xi}
relative to { i}, while the solid-dot line
indicates the windowed RMS value between
{x } and
{ i}. Our distance interpolation method
with limited secondary conformational data captures the global behavior
of lactoferrin's conformational transition well.
|
|
This allows any strain between intermediate and initial conformations
to relax unless it results in steric clashes. The RMS difference
between the end conformation generated in this way with the targeted
crystallographic data is small, as shown in Fig. 10 b,
indicating that the protein's cooperativity is captured well using our
incremental distance interpolation method, and that much can be
inferred about global protein motions from a single complete set of
crystallographic data and limited amounts of partial data that describe
a second conformation. Hence, our framework may be used as a
visualization tool for experimentalists to superimpose partial
conformational data onto crystal structures to examine the
structural/kinematic implications of measured inter-residue distances.
 |
CONCLUSIONS |
We have developed a computationally efficient method for the
realistic simulation of proteins exhibiting transitions between two
crystallized conformations. Our method is also flexible and general
enough to incorporate partially observed, time-dependent conformational
data from experiments. It is based on a coarse-grained elastic network
model. Using cutoffs in the number of nearest neighbors generates a
linking matrix that is both sparse and uniformly dense, hence
permitting efficient computations. Our distance interpolation method is
the fastest method available for generating conformational transitions
while still preserving steric constraints. This is because it involves
only one inversion of a very sparse 3n × 3n matrix for
each frame in the animation, where n is the number of amino
acid residues. Unlike dynamics-based methods in which the size of the
timestep used is limited by the stiffest part of the structure, our
method is purely geometric, and so the number of required animation
frames is dictated only by the difference in shape between the two
conformations. Typically, we choose 99 intermediate frames, whereas
dynamics-based approaches must use several orders of magnitude more
timesteps to achieve conformational transitions. Our method also has
the benefit that it is not sensitive to parameter choices, solvation,
or quantum mechanical effects, which is not the case with the most
physically detailed models. While physical reality is the ultimate goal
of all computational modeling methods, we believe that having a method
effective for proteins of several hundred residues with an accessible
PC in a few hours may be preferable to more "realistic" techniques
requiring months of high-performance computer time, and are not
guaranteed to converge due to round-off errors, instability of
numerical integration, or even a lack of full knowledge about the true
nature of the chemical potentials involved in proteins.
Simulation results illustrate that the distance interpolation method
presented here reliably generates sequences of feasible intermediate
conformations of proteins without steric clashes. Animations produced
using this method are posted at
http://custer.me.jhu.edu/proteins/movies.html. The distance
interpolation method represents an improvement over simplified linear
position interpolations in terms of the realism of intermediate forms,
and over all-atom computational methods such as MD and NMA, in terms of
computational efficiency.
Address reprint requests to Gregory S. Chirikjian, Dept. of Mechanical
Engineering, The Johns Hopkins University, Baltimore, MD 21218. E-mail:
gregc{at}jhu.edu.
R. L. Jernigan's present address is Laurence H. Baker Center for
Bioinformatics and Biological Statistics, Iowa State University, Ames,
IA 50011-3020.