Models in computational biology, such as those used in
binding, docking, and folding, are often empirical and have adjustable parameters. Because few of these models are yet fully predictive, the
problem may be nonoptimal choices of parameters. We describe an
algorithm called ENPOP (energy function parameter optimization) that
improves
and sometimes optimizes
the parameters for any given model
and for any given search strategy that identifies the stable state of
that model. ENPOP iteratively adjusts the parameters simultaneously to
move the model global minimum energy conformation for each of
m different molecules as close as possible to the true
native conformations, based on some appropriate measure of structural
error. A proof of principle is given for two very different test
problems. The first involves three different two-dimensional model
protein molecules having 12 to 37 monomers and four parameters in
common. The parameters converge to the values used to design the model
native structures. The second problem involves nine bumpy landscapes,
each having between 4 and 12 degrees of freedom. For the three
adjustable parameters, the globally optimal values are known in
advance. ENPOP converges quickly to the correct parameter set.
 |
INTRODUCTION |
There are many models in computational chemistry,
biology, and materials science, in which parameterized energy functions are used to predict three-dimensional structures of molecules. Folding,
threading, docking, protein-protein recognition, and loop refinement
methods are examples from computational biology. It is seldom clear
whether failures of such models are attributable to the form of the
model's mathematical functions, or to poor choices of the parameters
used in them. Such models are usually so computationally expensive that
it is impossible to be systematic about finding the "optimal"
parameters, i.e., those parameters that minimize some measure of total
structural error.
To find the optimal parameters, say for a folding algorithm, it would
be necessary to 1) compute the folded structures for many proteins, 2)
determine the errors, 3) change the parameters, and then iterate this
whole process for many different sets of parameters to find the best
ones. This has not been computationally feasible before. Instead, model
parameters are usually chosen to varying degrees by physical estimates,
guesswork, and arbitrary trial and error. Such efforts involve small
searches through large parameter spaces. As with other types of search
problems, parameter optimization can depend on the order in which the
parameters are chosen, and it can get caught in traps from which it
cannot escape.
There are methods for optimizing parameters in certain classes of
problems (Esposito and Floudas, 1998
; Maiorov and
Crippen, 1994
; Koretke et al., 1998
; Hao
and Scheraga, 1996
). Two examples are threading
(Goldstein et al., 1992
; Hendlich et al.,
1990
; Maiorov and Crippen, 1994
; Thomas
and Dill, 1996
; Mirny and Shakhnovich, 1996
;
Huber and Torda, 1998
; Koretke et al.,
1996
) and lattice models of folding (Hao and Scheraga,
1996
; Goldstein et al., 1992
; Shrivastava
et al., 1995
; Govindarajan and Goldstein, 1995
).
In both cases, the ability to find optimal parameters is a direct consequence of the facts that 1) the conformational space is discrete, and 2) the global optimum is guaranteed to be among the conformations searched. Whenever global optima can be found through finite searches, there are methods that can be used to learn parameters that can distinguish correct from incorrect structures. This is arguably the
principal advantage of threading versus folding algorithms: the former
involve discrete searches, so parameters can be improved systematically.
As far as we know, no algorithm yet exists that can find optimal
parameters for models having continuous degrees of freedom. Consider
protein folding. To improve the parameters in a folding model, you need
to recompute the lowest energy structure many times, once after each
iteration of small changes in parameters. This means many
minimizations. The main problem in optimizing parameters for continuum
models is that the typical minimization methods
Monte Carlo, simulated
annealing, or molecular dynamics
find only local minima and fall into
different energy wells each time, so there is no unique and
reproducible mapping from a given set of model parameters to a given
model native structure. This lack of reproducibility is the primary
reason that parameter optimizations are difficult in continuum models.
We describe here a computational method, called ENPOP (energy parameter
optimization), that searches for globally optimal parameters for
continuous models. It takes as input 1) a given model and its
associated adjustable parameters, 2) the (usually incorrect) best
structure for each of m different molecules that is
predicted by the starting parameters, 3) the m correct
(true) structures that the model should produce, and 4) some measure of
structural error between predicted structures and true, known, or
correct best structures. The technology that enables us to circumvent
the reproducibility limitation and to systematically optimize
parameters is the recently developed CGU (convex global underestimator)
method that finds global minima
or at least reproducible minima
of
energy landscapes, at least for short enough chains (Dill et
al., 1997a
,b
).
Our approach is general and guarantees improved, and sometimes even
globally optimal, parameters. It should be applicable to a wide range
of models in computational biology and chemistry. It allows any
differentiable measure of "structural error" to be minimized over
the continuous space of both potential function parameters and
molecular conformations, while placing no restrictions on the form of
the potential function other than that it be differentiable. Our method
simultaneously tracks the global minimum for each of the proteins in
the model as the parameters are changed, while providing a guaranteed
reduction in the structural error between model-native and true-native
states. For this we do not require any additional global minimization
(beyond that needed to get the initial global minima), so the method is
computationally efficient. Our strategy then is to learn a set of
parameters for one set of proteins with known structures and apply
those parameters to the prediction of other structures. Our hope is
that by finding better parameters for any given model, this method will
ultimately allow computational biologists to develop improved models
for folding, binding, docking, etc.
 |
ENERGY FUNCTION IMPROVEMENT BY PARAMETER OPTIMIZATION |
We consider some model for an energy landscape,
F(
,
). F(
,
) is the conformational energy or
free energy as a function of the n degrees of freedom
(coordinates or bond angles, etc.) that are given by the vector
n and as a function of the k
parameters of the model that are given by the parameter vector
k. The parameters might include Lennard-Jones
parameters, steric terms, hydrogen bond or hydrophobic interaction
strengths, coefficients of bond angle energies, etc. There is no
limitation on the functional forms of the terms. Although the method is
general, we will make the discussion more concrete by focusing on
protein folding. The predicted native state is given by the global
minimum vector
G, which has a free energy
F(
G). If the model energy function were perfect, it would predict the true native structure; that is, we would
have
G =
N, where
N
is the correct dihedral angle vector for the native state of this protein.
We will consider the prediction accuracy in terms of the error in the
degrees of freedom 
G
N
2 for each protein. Other measures, such
as RMS, could be used with minor modification. To make explicit the
dependence of F on the parameters, we let
be a vector of
k parameters of the energy function that are common for all
proteins. In the energy functions we test here, we typically have
k
15.
Suppose we wish to improve the native structure predictions for a set
of m proteins, with energy functions
F(j)(
(j),
) and native states
N(j), j = 1, ... ,
m. Note that each potential function F(j)
and set of nj independent variables
(j) will be different and will depend on the number and
sequence of beads in the jth protein. But the vector
is
independent of j and will be the same for all proteins. For
any fixed value of
, we can use the CGU algorithm to find the
corresponding global minimum
G(j)(
) for each
protein. Each such global minimum is characterized by the
conditions
|
(1)
|
|
(2)
|
|
(3)
|
where H(j) is the
(nj × nj) Hessian
matrix with respect to
of F(j)
(
(j),
). While unlikely, it is possible that for
some initial choice
I, some Hessian
H(j) may only be positive semidefinite at the
corresponding global minimum. If this occurs, a different value of
I should be chosen. The total conformational error is
defined by
|
(4)
|
where wj
0 are any arbitrary
weighting factors that the user wants to include, and
j(
) is the jth molecule conformational error, given by
|
(5)
|
The weights wj can be selected to give more
or less importance to certain proteins, for example, based on known
accuracies or reliabilities of their structures, but otherwise might
normally be set equal to one. The parameter adjustment method proposed here can now be stated as a k-dimensional minimization
problem, with simple bounds on the parameters
|
(6)
|
The bounds (
min)j and
(
max)j should be chosen to appropriately
restrict the range on the parameter
j to values that are meaningful to the problem.
The key computational expense in this method is computing the function
(
) and its gradient 

. So we use an efficient minimization method that requires relatively few function and gradient
calls, the Sequential Quadratic Programming (SQP) method (Gill
et al., 1986
).
We need to track the changing value of
G(j)(
) as
is adjusted. We impose the constraint that
G(j)(
) must remain a local minimum of
F(j) (
(j),
) as
is
modified. This is done by requiring that the system of
nj nonlinear equations in Eq. 2 remains
satisfied as
changes, starting with an initially chosen parameter
vector
I. Note that this does not guarantee that the
updated value of
G(j)(
) will also remain the
global minimum of
F(j)(
(j),
), although this is
easily checked at the conclusion of the method by another CGU global
optimization step.
G(j)(
) can be updated without recomputing the
global minimum, by using the implicit function theorem, provided that
the Hessians H(j) are positive definite. This
condition is satisfied at
G(j) by Eq. 3. The
implicit function theorem uses a linearized approximation to Eq. 2 at
its current values of
and
and therefore gives the required
small change 
to balance any small change 
. It is valid if
the neglected terms in 


2 and



2 are much smaller than the first-order term in



and 


. For a small change

G(j), the corresponding change 
required to
keep Eq. 2 satisfied is given approximately by
|
(7)
|
where J
(j) = J
(j)(
G(j),
) are
the nj × k Jacobians of

F(j)(
G(j),
).
Because the Hessians H(j) = H(j)(
G(j),
) are
nonsingular, we can write this as
|
(8)
|
Because 

j(
) =
[J
(j)]T[H(j)]
1

j(
),
it then follows that the desired gradient 

(
) of
the total conformation error as a function of parameters is given by
|
(9)
|
Corresponding to each
G(j), good approximations
to the Hessians H(j) are obtained directly from
the SQP implementation of the local minimization, a key phase of the
CGU method. Also, the Jacobians J
(j) are
relatively easy to compute because J
(j)
has only k columns.
 |
THE ENPOP ALGORITHM |
We call our method ENPOP (energy function parameter optimization).
Here is the general algorithm:
1. Guess an initial
I.
2. Run the CGU method successively on the functions
F(j), j = 1, ... ,
m with
=
I fixed. Denote the result
of each CGU run by
G(j).
3. Using each
G(j) from step 2, perform the
minimization over
described in Eq. 6. To perform this step, use
I as a starting point and use Eq. 9 for the gradient of
.
To compute the change 
G(j) corresponding to
altered parameters (at each step of the local minimization) 
, we
use the implicit function theorem, which states that (see Eq. 7)

G(j) =
[H(j)]
1J
(j)
.
This step will result in a new parameter set
new with
corresponding global minima
G(j)(
new), j = 1, ... , m.
The ENPOP algorithm is quite general and can be applied to any set of
molecular structures with a given model and potential function. All
that is required is a suitable representation for the Hessian
H(j) and Jacobian
J
(j) specific to the model.
 |
GUARANTEEING REDUCTION IN STRUCTURAL ERROR |
The method described above optimizes parameters while enforcing
the requirement that the initial global minimum remain at least a local
minimum of the model energy function. But what guarantees that the
original global minimum will not shift to become just a local minimum?
In this section, we describe criteria that ensure that the original
minimum remains global. The purpose of such a criterion is
computational efficiency: when the global minimum is no longer ensured,
the CGU can be run again to check whether the current minimum is
global. The goal is to run the CGU only the minimum possible number of
times, to save computational cost.
When attempting to improve the parameter values for the set of
potential functions
F(j)(
G(j)(
),
), for
j = 1, ... , m, it is useful to have
prior information about the amount of reduction in the error
(
)
that can be expected. We now show that a lower bound on the decrease in
(
) can be given, based on information available at the initial
value
=
I. That is, we show that we can always
obtain a parameter vector
1 such that
|
(10)
|
where 
is given by Eq. 14. Because 
is a guaranteed
lower bound, the actual decrease in
(
) obtained by ENPOP may
typically be much greater.
To determine 
, we need to know several quantities that are
available once the global minima
G(j) have been
computed for each protein by the CGU method. For this purpose, we
define the gradient in parameter space, g = 

(
I), as given by Eq. 9, and the
initial energy gaps
F(j) for
j = 1, ... , m and
=
I, by
|
(11)
|
In Eq. 11, FG(j)
F(j)(
G(j)(
I),
I)
represents the global minimum energy found by the CGU method, using the
parameter vector
I, and
FLM(j) is the corresponding next lowest
local minimum energy, both for the jth protein. If the
gradient
g
= 0, then
I is already a stationary point of
(
), and a new value of
I must
be selected. Similarly, if
F(j) = 0, for
any j, then any change in
may cause the coordinates of
the global minimum for F(j)(
(
),
) to
move discontinuously from its current conformation
G(j) to the conformation corresponding to the
alternate global minimum (because FLM(j) = FG(j)). In this case, the jth
protein should be replaced (or removed) in the test set. Therefore, we
can assume that Eq. 11 holds for all j = 1, ... ,
m, and that
g
> 0.
In addition, we will require the quantities
Fmin
minj
F(j) > 0, the Hessian
H
(
) of
(
), and the curvature of
in the direction of g as given by
|
(12)
|
The value of
is bounded by
min
max, where
min and
max
are the minimum and maximum eigenvalues of
H
(
I). Because
H
(
) may be indefinite at
I,
may be negative.
Finally, we need a uniform bound on the gradient of
F(
(
),
) with respect to
:
|
(13)
|
In terms of these quantities, it can be shown that
|
(14)
|
In the first two bounds in Eq. 14, the change in
is limited by
the possibility that the current global minimum is replaced by one of
the other local minima (it is assumed that a new global minimum will
arise only from existing local minima). In the third bound, the
decrease in
(
) is at least that obtained by a single steepest
descent step in the direction
g in
-space.
We now test ENPOP on two very different problems. The first test
involves three short protein molecules, in two-dimensional conformational space, using a simple energy function with four parameters. ENPOP reduces the total conformation error from its initial
value
(
I) = 1012.3 to its minimum
(
N) = 0.87. The second test problem is one for
which we know in advance the optimum parameter vector
N,
such that
G(
N) =
N.
We do not, of course, use this knowledge in ENPOP, but we can verify
that ENPOP computes the correct parameter vector
N. This
test problem consists of nine different bumpy energy landscapes, each
with a different number of degrees of freedom (ranging from 4 to 12),
but with a common set of three parameters. We find that ENPOP always
finds the optimum parameter vector
N, starting
with different initial vectors
I.
 |
TEST PROBLEM 1: 2D PROTEIN FOLDING MODEL |
We first consider a problem involving three short chain molecules,
in two-dimensional conformation space, using a simple energy function.
The energy function consists of four terms, a bond length penalty term,
a Lennard-Jones (LJ) attraction/repulsion term, a hydrophobic
attraction term, and a hydrogen bond term. The energy function contains
four adjustable parameters. The test molecules have from 12 to 37 beads
and differ from each other in the specification of which beads are
hydrophobic and which pairs are hydrogen bonded. The native state
conformations are created in advance by arbitrarily choosing bond
angles. The four parameters are adjusted by ENPOP so that the global
minimum of the energy function, for each molecule, gives a
corresponding conformation as close as possible to its known native conformation.
To simplify both the calculation and the discussion, the degrees of
freedom are the (x, y) coordinates of each bead,
(xi, yi), i = 1, ... , n, n = 12, 25, 37. For each molecule, bead 1 is fixed at the origin. We let X denote the vector with
2n elements
(xi, yi), i = 1, ... , n. The conformation of a molecule is then specified
by the vector X. Because the distance rij between beads i and j
is given by rij2 = (xi
xj)2 + (yi
yj)2, the vector X
completely specifies all pairwise distances between beads. We also let
4 and denote the parameter vector
specifying the four parameter values as
i,
i = 1, ... , 4.
The energy function is
|
(15)
|
where
i > 0 and
|
(16)
|
H denotes the set of hydrophobic beads, and
HB denotes the set of hydrogen bonded pairs. Note that the
minimum in the LJ term occurs at rmin = (0.1
2)1/6, where it has the value
(
10
1)/
2.
For each test molecule the sets of hydrophobic and hydrogen-bonded
beads are chosen to be different, while the parameter values are the
same for all molecules. There is one energy function for all proteins,
but folds are different because proteins have different monomer
sequences. We represent the energy function for the jth test
molecule by Fj(Xj,
),
j = 1, 2, 3. For each test molecule we have a known
native state conformation XNj, j = 1, 2, 3.
Molecule 1 consists of 12 beads, with four of them hydrophobic:
H = (1, 2, 11, 12); and there are three hydrogen-bonded
pairs: HB = (1, 4), (5, 8), (9, 11). Molecule 2 consists of 25 beads, with seven of them hydrophobic: H = (3, 6, 7, 15, 21, 24, 25); and four hydrogen-bonded pairs:
HB = (3, 6), (9, 12), (15, 18), (19, 22). Finally,
molecule 3 consists of 37 beads, with 12 of them hydrophobic:
H = (1, 5, 8, 9, 12, 17, 20, 21, 24, 29, 32, 33); and
seven hydrogen-bonded pairs: HB = (1, 4), (3, 6),
(7, 10), (17, 20), (19, 22), (23, 31), (34, 36).
We start with an arbitrary initial choice
I of the
parameter vector. Corresponding to
I are global minimum
energy conformations XIj, obtained by minimizing
Fj(Xj,
I),
j = 1, 2, 3. These initial and native conformations are
shown in Fig. 1, as the first and last of
the four conformations for each molecule. Let
Xj(
) denote the minimum energy conformation
for any
. We have Xj(
I) = XIj. We aim to find
N, so that the conformation error
(
) is minimized, where
|
(17)
|
To illustrate the process of parameter optimization, we explore
trajectories in parameter space, from the initial to final parameter
vectors. We follow the steepest descent path in parameter space to find
the native parameter vector
N, such that
(
N) is a minimum. At each descent step 
we
reduce
(
) by a small amount. The corresponding changes in the
function values Fj and corresponding
conformations Xj(
+ 
) are then
obtained by local minimizations of
Fj(Xj,
+ 
), starting with Xj(
). The SQP code
NPSOL (Gill et al., 1986
) is used for this local
minimization, and because the conformation change due to 
is
small, this computation is fast. A typical steepest descent calculation
in parameter space requires several hundred steps and takes 10-30 min
on a current workstation. Note that the more efficient SQP minimization
method could also have been used in parameter space, but it would have
given much less information about the nature of the contours on the
(
) surface.

View larger version (27K):
[in this window]
[in a new window]
|
FIGURE 1
Three two-dimensional model proteins. The three
structures on the right are the true natives, which are the global
minimum conformations for the native parameter vector N.
The three structures on the left are the global minima for an incorrect
initial parameter vector I. ENPOP finds the
correct parameters N through an iterative procedure
starting with I.
|
|
One steepest descent calculation is shown in Figs. 1 and
2. For this example, the conformation
changes for each of the three molecules as
changes from
I to
N are shown in Fig. 1. For each of
the three test molecules, the initial conformation (given by
Xj(
I), j = 1, 2, 3) is shown on the left of the figure. The conformations on the
right of the figure show Xj(
N), j = 1, 2, 3. The two intermediate conformations, for
each molecule, show the Xj(
) for two
intermediate values of
along the path in parameter space, as
(
) was minimized. The red beads are hydrophobic, and hydrogen
bonding is between selected pairs of beads, as stated above. The
initial values of the parameters were
I = (1.0, 0.5, 0.1, 50). The parameter values that minimized
(
) were
N = (1.7, 0.85, 0.73, 35). The initial value of the conformation error was
(
I) = 1012.3, and its
final value was
(
N) = 0.87. The difference
between the desired native conformations XNj and
the final computed conformations
Xj(
N) are so small that they
cannot be distinguished in Fig. 1.

View larger version (16K):
[in this window]
[in a new window]
|
FIGURE 2
Changes in the parameter values as ENPOP steps from
I to N, through a steepest descent path
in parameter space.
|
|
The conformations in Fig. 1 show that initially the hydrogen bond term
dominates relative to the hydrophobic term, but that in the native
state the hydrophobic term essentially determines the structure. This
is because
3 (hydrophobic) increases from 0.1 to 0.73, while
4 (hydrogen bond) decreases from 50 to 35. The
manner in which the four parameters changed along the steepest descent
path is shown in Fig. 2, which shows the trajectory in parameter space
in moving from
I to
N, along the steepest
descent path for
(
). Because we are following a steepest descent
path in parameter space, the value of
(
) decreases monotonically as
is changed from
I to
N. However,
as shown in Fig. 2, the values of the four parameters,
i, i = 1, 2, 3, 4, do not change monotonically as a function of the path length in parameter space. Thus, for example,
1 and
3 are initially
decreasing, even though they eventually increase along the steepest
descent path. This shows that while minimizing
(
) in parameter
space is a nontrivial problem, it can be successfully accomplished, as
illustrated by this example.
 |
MODEL PROBLEM 2: BUMPY LANDSCAPES WITH KNOWN OPTIMAL PARAMETERS |
For the second test problem, we sought a set of bumpy energy
landscapes for which we can know the optimal parameter set in advance.
The degrees of freedom are
n, and the
energy function is F(
,
), which depends in a smooth way on
and on the parameter vector
k.
In this case we chose the function
|
(18)
|
where
=
(
,
) = 1/2(
A
)T D(
A
), µ > 0 and w > 0 are constants, A is an
n × k matrix of rank k, and D is
an n × n positive diagonal matrix. For any fixed value
of the parameter
, F(
,
) has many
local minima but attains its unique global minimum at
=
G(
) = A
, with the
value
|
(19)
|
Thus, the solution
G(
) to
|
(20)
|
for any fixed
is known a priori. The dependence of
F(
,
) on
is illustrated in Fig.
3 for k = n = 2, µ = 20, w = 5, and
|
(21)
|
F(
,
) has a large number of local minima, but
the global minimum is at
|
(22)
|
This form of the potential function is characterized by a rugged
energy landscape with numerous kinetic traps, energy barriers, and
narrow pathways to the native state. Although this potential function
is artificial, it shares many of the characteristics of real protein
folding energy landscapes (Bryngelson and Wolynes, 1987
,
1989
; Chen
and Dill, 1998
; Dill and Chan, 1997
;
Leopold et al., 1992
).
To extend this test problem to the more general case of m
different molecules with native states
N(j),
j = 1, ... , m, the energy function
F(j) for the jth such molecule will
be given by Eq. 18 with D = D(j) and
A = A(j), j = 1, ... ,
m. If the A(j) are different, we can
usually only make
G(j) =
N(j) for
at most one landscape, where
G(j) is the global
minimum vector
of Eq. 20 for the jth landscape. In this
case, we set the weights wj given in Eq. 4 to
wj = 1 for j = 1, ... ,
m. Then the general problem of determining the vector
that
minimizes the average angular error
(
), over all m
landscapes, previously defined by Eqs. 4-6, simply reduces to
|
(23)
|
For the specific energy function in Eq. 18, we know that
|
(24)
|
Therefore, Eq. 23 is the simple least-squares problem for
given by
|
(25)
|
Its solution
* is given by the solution to the system of linear
equations
|
(26)
|
Furthermore, the gradient of F with respect to
is
given by
|
(27)
|
This gradient is zero at
= A
and at all
points where 1 + µw sin(w
) = 0. If
the Hessian of F is positive at such a point, it is a local
minimum of F.
Because of the special form of F(
,
), as given by Eq. 18, we know in advance the Hessian and Jacobian matrices that are
needed to compute the gradient 

(
). As a result,
we were able to use an efficient SQP local minimization directly in the
parameter space to minimize
(
), as given by Eq. 23.
Test problem 2 consists of nine different landscapes, each with a
different number of degrees of freedom (ranging from 4 to 12).
Therefore, the vector
(j), j = 1, ... ,
9, for each molecule has a different dimensionality, with
(j)
nj,
nj = j + 1, and
j = 1, ... , 9. The value of k = 3
is chosen so that there are three parameters to be determined. For each
molecule, an (nj × 3) matrix
A(j) was generated with linearly independent
columns. The native state, denoted by
N(j), for each
molecule was obtained by choosing a value
and then computing
(j) = A(j)
,
j = 1, ... , 9. Each
(j) was then
randomly perturbed to give a native state
N(j). This
was done so that no value of
exists for which
G(j)(
) =
N(j); that is,
(
) in Eq. 23 cannot be zero. From the solution to Eq. 26, we know
a priori the optimal value of
* and its corresponding minimum error
(
*) = 2.19. This is necessary for validating that the method
can find globally optimal parameters.
A randomly chosen
I
3 was used
as the starting value for the parameters. This gives the initial error
(
I) = 58.76. Corresponding to this initial
choice for
, each landscape has an initial global minimum
conformation
G(j)(
I), computed by the
CGU global minimization algorithm. This value is, of course, also given
directly by A(j)
I, but we did not
use this information for our tests. The ENPOP algorithm was then
applied to improve
so as to simultaneously bring all of the global
minima
G(j)(
) as close as possible to their
corresponding native conformations
N(j). ENPOP
required five iterations to reduce
from
(
I) = 58.76 to its minimum value
(
*) = 2.19. Each iteration
gives a reduced value of
and corresponding global minimum
conformations
G(j)(
). The corresponding values of
(
) obtained by the algorithm at each iteration were 58.76, 11.47, 9.36, 3.56, 2.78, and 2.19. The final value
(
*) = 2.19 is
the known minimum possible value of
(
) for this test problem.
 |
CONCLUSIONS |
We have described a method, called ENPOP, for optimizing the
parameters in models used in computational biology, such as in folding
and docking, where there is a unique structure at a global minimum of
energy. ENPOP iteratively refines the parameters by enforcing the
requirement that the energy minimum that represents the best predicted
structures move systematically closer to the true native structures. We
validate the method on two very different test problems. One is a
two-dimensional short-chain protein folding problem. The other involves
bumpy energy landscapes for which the optimal parameters are known in
advance, to check that the method can identify the unique globally
optimal parameters. While these test problems are relatively simple,
they show that the ENPOP method is computationally efficient and can
improve and optimize the parameters in models of the type that are
commonly used in computational biology.
The authors thank the San Diego Supercomputer Center for
computational resources and the National Institutes of Health (grant GM34993) and the National Science Foundation (grant DBI-9996165) for
support for this research.
Address reprint requests to Dr. Ken A. Dill, Department of
Pharmaceutical Chemistry, University of California at San Francisco,
Suite 415, Laurel Heights Campus, 3333 California St., San Francisco,
CA 94118. Tel.: 415-476-9964; Fax: 415-476-1508; E-mail:
dill{at}maxwell.ucsf.edu.