Biophysical Journal 87:1567-1577 (2004)
© 2004 The Biophysical Society
LMProt: An Efficient Algorithm for Monte Carlo Sampling of Protein Conformational Space
Roosevelt Alves da Silva *,
Léo Degrève * and
Antonio Caliri
* Departamento de Química, Faculdade de Filosofia Ciências e Letras de Ribeirão Preto, and
Departamento de Física e Química, Faculdade de Ciências Farmacêuticas de Ribeirão Preto, Universidade de São Paulo, 14040903 Ribeirão Preto, São Paulo, Brazil
Correspondence: Address reprint requests to Roosevelt Alves da Silva, E-mail: roos{at}obelix.ffclrp.usp.br.
 |
ABSTRACT
|
|---|
A new and efficient Monte Carlo algorithm for sampling protein configurations in the continuous space is presented; the efficiency of this algorithm, named Local Moves for Proteins (LMProt), was compared to other alternative algorithms. For this purpose, we used an intrachain interaction energy function that is proportional to the root mean square deviation (rmsd) with respect to
-carbons from native structures of real proteins. For phantom chains, the LMProt method is
104 and 20 times faster than the algorithms Thrashing (no local moves) and Sevenfold Way (local moves), respectively. Additionally, the LMProt was tested for real chains (excluded-volume all-atoms model); proteins 5NLL (138 residues) and 1BFF (129 residues) were used to determine the folding success
as a function of the number
of residues involved in the chain movements, and as a function of the maximum amplitude of atomic displacement
rmax. Our results indicate that multiple local moves associated with relative chain flexibility, controlled by appropriate adjustments for
and
rmax, are essential for configurational search efficiency.
 |
INTRODUCTION
|
|---|
When the investigation of protein-solvent systems requires an exhaustive search on the configuration space, the MC method is considered the main available simulation tool. Particularly, there are different MC techniques to simulate thermodynamic and kinetic aspects of many distinct systems. In general, these techniques may be divided into two main classes, namely, dynamic and nondynamic MC methods. The first class is characterized by a set of standard moves that resemble the specific system dynamics at the microscopic scale (Binder and Baumgartner, 1992
; Degrève and Caliri, 1995
). It is important to observe that the term "dynamic MC" is currently used in reference to the emulation of real molecular moves, it may (and should not) be confused with another method also named "dynamic Monte Carlo method", which simulates the real-time evolution of physical systems (Aièllo and da Silva, 2003
). On the other hand, the main concern of the nondynamic MC method is to adequately explore the phase space without worrying about the sequence of configurations in the time. Therefore, once the ergodic hypothesis is satisfied and the absence of bias is guaranteed, this method is exclusively used for dealing with thermodynamic amounts. However, for chain systems, the main general difficulty in using the nondynamic MC method is exactly at this point: to ensure a good statistically representative sampling (configurations without spurious bias) for ensemble averaging. Then, one may suggest that a practical test for a specific sampling technique is to use small systems for which it should be possible to exhaustively investigate the configurational space, and then compare the complete ensemble averages to the simulated averages. Indeed, this approach constitutes a basic criterion for the quality of an MC simulation (Zhou and Berne, 1997
; Berne and Straub, 1997
), but for problems like protein folding in which more than just equilibrium states are necessary, its use is still a challenge: the existence of a high number of metastable states (local minima) promotes energetic traps that may reduce the efficiency and the accuracy of nondynamic MC methods.
In this work, we present an MC algorithm that increases the efficiency of the configurational search through a simple method capable of producing multiple local moves in flexible peptide chains. The method is versatile, allowing the control of the magnitude of moves size (average displacement of atoms for movement) and of the number of involved residues in the local movement, improving the performance of the configurational search and is expansible for use during different stages of the same run.
 |
MONTE CARLO MOVES FOR PROTEINS
|
|---|
A traditional way to generate protein chain configurations is to promote independent small perturbations on the dihedral angles of the main chain (Lal, 1969
; Madras and Sokal, 1988
). This type of movement, called pivot or Thrashing algorithms, has been criticized for producing very low acceptance rates for the new configurations (Zhou and Berne, 1997
; Favrin et al., 2001
; Shimada et al., 2001
). In real proteins, collective moves that result from simultaneous movements of sequential groups on the chain are among the main factors responsible for the escape from energy and topological barriers. So, chain movements that include such simultaneous small energetic changes along several sequential chain groups may become crucial to improve the method efficiency. In general, algorithms that do not present such properties include complementary artifacts to be successful (Berg and Neuhaus, 1991
; Hansmann and Okamoto, 1993
). For example, the multicanonic algorithm (Moret et al., 2002
) and simulated tempering (Lyubartsev et al., 1992
; Marinari and Parisi, 1992
) can demand numbers of nonfeasible calculations depending on the system. Thus, the application of single pivot-type movements to generate protein configurations (or other macromolecules) has been considered inadequate for a complete and efficient MC simulation, because, as it has been pointed out (Cahill et al., 2002
), simulation algorithms must prevent the breaking of kinematics principles, such as the overlapping and transposition of chain groups. The observance of this principle becomes even more critical when protein chains are absorbed in solvent, where drastic configuration changes are frequently forbidden by additional restrictions imposed by the solvent molecules (Shimada et al., 2001
).
Therefore, to guarantee an appreciable acceptance rate, it is usual to change the configurations by considering only a few sequential chain elements at a time. This type of local chain deformation was first analyzed by Go and Sheraga (1970)
and later, a new algorithm, named "concerted rotation", was introduced by Dodd et al. (1993)
following the same reasoning. In this algorithm, the movements consist of combined rotations around seven adjacent skeletal bonds, leaving the rest of the chain unaffected. For the accomplishment of such moves, a temporary change of variables is necessary, and so it is required that the Jacobians of such transformations be calculated for both the original and the new attempted configurations, to ponder the transition probabilities adequately and to satisfy the detailed balance of the system. The combination of these local movements with other movements has enabled one to describe the equilibrium states of polymer systems involving chains of different sizes (Siepmann and Frenkel, 1992
; de Pablo et al., 1992
).
Similar to what occurs in the concerted rotation method used for polymers, a simple way to prevent large configurational changes in proteins is to allow that a set of n adjacent dihedrals angles {i} be modified only by small amounts {
i}, in such a way that the movement of the chain becomes practically local. Actually, this type of movement involving two, four, or six combined rotations has been successfully applied in a thermodynamic and kinetic study of crambin's folding (Shimada et al., 2001
). The effective step size S, defined as
 | (1) |
was obtained from an angular Gaussian distribution of width
= 2°, producing mostly local moves but still with a poor acceptance rate of only 10% for protein compact states. With the application of a conformation-dependent Gaussian distribution, Favrin et al. (2001)
have increased S by a factor of 3 when compared to the Gaussian distribution used by Shimada et al. (2001)
, without affecting the local move character of the algorithm.
Recently, another alternative move set has been presented and it can be considered as local moves, (Cahill et al., 2002
, 2003
) by allowing only very small rotations. This is done in such a way that the perturbed atoms are moved through a distance no larger than 0.05 Å. The efficiency of the proposed algorithm was tested by using a function that is proportional to the global root mean square deviation (rmsd), as interactional energy,
 | (2) |
where Di is the distance between the
-carbons of residue i of the reference structure and of residue i of the correspondent simulated structure; the sum of i considers all chain's residues. The use of such nonphysical potential is important for the comparative determination of the efficiency of a proposed algorithm. This is done by isolating its capability in generating configurations that satisfy the basic topological constraints of the chain (for instance, the chain-excluded volume), which in turn avoids energetic traps generated by specific potentials for protein-like chains. For phantom chains (that is, chains with no physical volume), this approach can drive the chain to the native structure very quickly; however, its efficiency is strongly affected if the chain-excluded volume is introduced. Indeed, very small configurational jumps, determined by small angular change amounts {
i}, are not effective to promote the escape from local energy minima or topological traps, leading to a very inefficient search on phase space. Therefore, in this work, we introduce an alternative MC algorithm, named Local Move for Proteins (LMProt), capable of generating valid local configurational changes in flexible chains by means of larger configurational perturbations than the ones encountered in other methods (Eq. 1). This factor significantly increases the sampling efficiency of the conformational space. In this method, the chain degrees of freedom are represented by dihedral angles, bond lengths, and angles between bonds. Each configuration is produced by means of two simple well-known processes, which have still not been applied in this context. The first one consists of random modifications to positions of a set of atoms (nitrogen, N; carbon-
, CA; and carbons, C) consisting of
consecutive residues. In the second step, the modified atomic coordinates are corrected by means of Lagrange multipliers to preserve the chain basic geometry.
In the next sections, the LMProt algorithm is fully described and its efficiency and accuracy are compared against results obtained by other recently proposed algorithms involving local and nonlocal moves (Cahill et al., 2002
, 2003
). Two types of analysis were accomplished. First, the native structure of the protein 16PK was used as a reference and initially the effect of excluded volume was not considered to enable direct comparison with results of other algorithms. Specifically, the same global interactional energy considered by Cahill et al. (2002)
was also used here. In the second analysis, hard sphere-type potential was added to consider the chain-excluded volume effect (model with all atoms). Two proteins were employed as references in this case, namely the 5NLL and 1BFF; all atomic degrees of freedom were considered, including those of side-chain atoms. The folding success
in finding the native structures in a given time window tw and its corresponding folding speed
' were estimated as functions of the number of sequential residues
involved in the local movements of the chain and of the maximum displacement
rmax allowed for each atom.
 |
THE LMPROT ALGORITHM
|
|---|
Local configurations method
The LMProt algorithm is specially designed to produce local chain configurations in a flexible chain. It is considered as being "local configuration change" when a new configuration is obtained through the modification of all atomic coordinates of
consecutive residues without modifying the positions of the remaining residues (of course, preserving all restrictions imposed by the chain geometry) or breaking kinematics principles, such as the overlapping and transposition of chain groups.
If
residues of the chain are randomly chosen to generate a new configuration, the LMProt method combines 3
+ 2 new dihedral angles, including the angles around peptide bonds, and all pertinent bond lengths and bond angles (see scheme in Fig. 1). Starting from a given configuration, a new one is obtained by the LMProt as follows:
- A number a
b of consecutive residues are randomly selected.
- The coordinates {ri} of all atoms (N, CA, and C) of the
residues are changed to
through specific random moves {
ri}, each one satisfying
rmax <
ri
rmax, where
rmax is the maximum displacement allowed for each atom.
- The coordinates
are then recursively modified to
until all pairwise geometric constraints {dij}, which depend only on the set of fixed interatomic distances, are recovered. Two particular atoms i and j are assumed as correctly constrained when the distance dij between them (fixed by the chain geometrical constraints) is preserved, independent of the chain configuration, and remains inside a particular interval. In other words,
 | (3) |
where
lij is the maximum deviation allowed for the fixed geometrical distance lij between atoms i and j, which is the flexible chain condition. The {lij} values used are those from the force field GROMOS96 (van Gunsteren et al., 1996
), which are shown in the second column of Table 1 for atoms belonging to the main chain. For
modified residues, the condition imposed by Eq. 3 must be obeyed for all constraints {dij} involving all atoms (N, CA, and C) of the
residues (see Fig. 1, for example). There is a total of 7
+ 4 of such d-constraints involved in preserving the geometry of the main chain: 6
+ 3 are related to bond lengths and bond angles, and the remainder
+ 1 are responsible for preserving the amide plans of the main chain. For
= 3, the constraints involved in a local move are shown in Fig. 1.

View larger version (13K):
[in this window]
[in a new window]
|
FIGURE 1 Scheme showing the constraints involved in a local movement for the LMProt algorithm. In this example, three residues have their N, CA, and C perturbed atoms (hatched lines). Bond lengths and bond angles are preserved by means of type A and B constraints, respectively. Type C constraints are defined to limit the changes in the amide planes. A local configuration produced by the LMProt is equivalent to the perturbation of dihedral angles, bond lengths, and bond angles, simultaneously.
|
|
To recover the geometrical constraints of the chain, the LMProt algorithm uses the method of the Lagrange multipliers applied iteratively after any set of movements
Let bij be a vector defined by
 | (4) |
where
and rj are the positions of atoms i and j, which are supposedly being geometrically constrained. In the iterative process, the corresponding Lagrange multiplier
ij appropriately corrects the magnitude of
to satisfy Eq. 3. Therefore, it is defined by
 | (5) |
and the new position of i can be redefined for all atom coordinates through
 | (6) |
to satisfy Eq. 3. For all the modified atom positions, Eq. 6 is applied until Eq. 3 is satisfied for all constraints involved for each moved atom; a similar technique is also used by the algorithm SHAKE (Ryckaert et al., 1977
). A simplified scheme is shown in Fig. 2 to see how the LMProt generates a particular configuration from the solution of this system of linear equations. The positions of hydrogen (H) and ß-carbon atoms in the main chain are directly determined from the positions of atoms that have already been determined. The side chains are also moved by a similar process, as shown in Fig. 2, but in this case the convergence is very fast.
The proline residue is a special case; for its description an additional constraint is necessary to keep the distance between the carbon atoms Ci1 of the residue i 1 and its
atom (see Table 1). This constraint is equivalent to keeping the dihedral angle between Ni and CAi of the corresponding proline residue constant.
 |
GENERAL ASPECTS OF THE ALGORITHM
|
|---|
Protein 1BFF (129 residues) was firstly used to perform preliminary check simulations (CS) of the LMProt algorithm. A total of 120 independent simulations was considered, each one being composed of
106configurations generated according to the scheme shown in Fig. 2. This was done to analyze the general dependency of the code on its parameters. Half of them, that is 60 simulations, were performed for a tolerance level on the geometrical constraints of 0.5%. In other words,
li,j = 0.5li,j/100, for six different values of
, namely, 3
8. Similarly, the remainder of the cases were destined for simulations with a 1% tolerance level. In Table 1, the GROMOS prefixed main-chain geometrical constraints {li,j} used in this work are listed, as well as the corresponding standard deviation (SD) estimated along the CS for a 1% tolerance level (
li,j). For a 0.5% tolerance (not shown here), SD is correspondingly smaller.
The flexibility of the main chain is obtained by means of small variations on the angles between the peptide planes CAiCiNi and CiNi+1CAi+1, as a consequence of small changes allowed on the prefixed values of all geometrical distances {li,j} of the main-chain atoms. Indeed, tolerances
li,j of 0.5% and 1% on each constraints li,j produce variations up to ± 11° and ± 15° between peptide planes, respectively.
The CS also showed that for the LMProt, the average timing cost
for each new generated configuration depends on a combination of
and
li,j, the code efficiency on appropriately adjusting the geometrical constraints {li,j}, as summarized in Fig. 3 and Table 2. Fig. 3 shows the distribution of the number I of necessary adjusting iterations required to satisfy all the geometrical constraints {li,j} for the corresponding tolerance {
li,j}. To construct each curve, data from 107 configurations (10 complete runs) were employed. For both levels of tolerance, as increase in
displaces the curve peak to the left (smaller I) and makes its amplitude increase sensibly. Therefore, increasing
tends to reduce the timing cost
which is numerically confirmed in Table 2. Note that the code efficiency (Table 2) is also improved as
increases, because for
li,j at 0.5% (1%), the fraction of failure
in adjusting the constraints {
li,j} (number of attempts I
Imax = 400) changes from
= 22.2% (5.1%) for
= 3, to
= 0.00% (0.00%) for
= 8.
Finally, the distribution of the average effective move size (
) along the simulations is shown. Such distribution is determined by the atoms' coordinates ri and
of all successful moves, that is,
 | (7) |
where the internal sum runs over all m atoms {i}, and the external one runs over all N complete simulations considered. Fig. 4 shows how
-distribution depends on
rmax, as a function of
in the interval 3
8. The curves collapse at about
=
rmax, where the reversal of their up/down order is observed (as it is necessary for a distribution function). The peak of the
-distribution tends to occurs at
max =
rmax; however, mainly for
= 3 and
= 4, the number of possible geometric solutions for local moves is small, and so
max cannot correspond to
rmax. As shown in Fig. 4, for
= 3 and
= 4,
max is almost always lower than
rmax; higher
rmax increases this difference. For
5,
max;
rmax, except when
rmax = 2.75 Å, and possibly for greater values, because
max has to be restricted by the main-chain constraints and
values.
 |
RESULTS
|
|---|
Comparative efficiency tests for phantom chains
Initially, the sampling efficiency of the LMProt algorithm is compared against two recently developed algorithms for generating phantom chain configurations; the native structure of protein 16PK (Bernstein et al., 1998
) was used for this purpose (Cahill et al., 2002
, 2003
). In all cases, the same interactional potential energy based on the rmsd (as discussed above) was employed, and a new configuration was always accepted if its corresponding rmsd was smaller or equal to the rmsd of the previous one. This type of MC simulation where structural information of the study subject is employed in some way, without using physical energy potential, has been called "reverse MC method" (McGreevy and Pusztai, 1988
).
For this test, 10 independent simulations were carried out with the LMProt and Thrashing algorithms, and their respective evolution compared. All simulations started from an independent random chain presenting large rmsd, always larger than 20 Å. Each configuration generated with protein 16PK by the LMProt results from perturbations (
rmax = 1.0 Å) and corrections on all N, CA, and C atoms of
= 6 consecutive residues. For both algorithms, each MC step generates 2n np try configurations, where n is the total number of residues of the chain and np is the total number of proline residues (Cahill et al., 2003
). Thus, for the Thrashing algorithm, all dihedral angles of the main chain are perturbed in each MC step, resulting in a total of 2n np configurations. Each new configuration is produced after a rotation around one of its dihedral angles by a small value 
. For this work, we have used 
= 0.0125 Å, which is the same value that Cahill et al. used in a recent work (Cahill et al., 2003
).
As illustrated in Fig. 5 for phantom chains, the packing evolution differs significantly for each algorithm. Actually, after 50 MC steps, the LMProt method is able to reduce the rmsd from
20 to 0.8 Å, whereas this value is still
10 Å when Thrashing is employed. For a given level of accuracy, say
rmsd
3 Å, this is about three orders of magnitude faster then the Thrashing algorithm. Indeed, for phantom chains, LMProt shows a configurational updating rate that is
103 times faster than the Thrashing algorithm: it takes 104 MC steps to drive the chain to a conformation exhibiting rmsd = (3.2 ± 0.3) Å (average over six runs) in the latter case, whereas the same accuracy is reached by the LMProt algorithm only in
10 MC steps. On the other hand, after 104 MC steps under the LMProt algorithm, the chain reaches a conformation presenting
rmsd
(0.17±0.01) Å, that is, the native structure is virtually found. The performance of the Thrashing algorithm, however, cannot be improved by simply changing 
. As it has been shown by Cahill et al. (2003)
, by increasing 
from 0.0125 to 0.5 Å,
rmsd
is increased from 2.4 to 3.7 Å after 105 MC steps.
We also compared the LMProt algorithm against another method, namely the "Sevenfold Way algorithm" (Cahill et al., 2003
), in which the movements of the chain are also performed by local moves; the same protein 16PK and phantom chain were used here. The same simulation protocol used in the previous simulations of the LMProt and Thrashing algorithms was defined to correspond to the simulations already executed by the Sevenfold Way of Cahill et al. Thus, the same number of configurations produced for MC step in the cases of the LMProt and Thrashing algorithms is also generated by the Sevenfold Way algorithm. As shown by the inset in Fig. 5 (small table), LMProt reaches
rmsd
of
0.5 Å after 500 MC steps, whereas the Sevenfold Way algorithm leads to the same level of accuracy only after
104 steps MC; that is, LMProt was 20 times faster. If more accuracy is required, the relative performance of LMProt is still comparatively better: for
rmsd
= 0.4 Å,
3 x 103 and 105 MC steps were, respectively, necessary for the LMProt and Sevenfold Way algorithms. We also tested LMProt for another structure presenting mostly tertiary contacts, protein 1AF6 (Wang et al., 1997
), with 421 residues, and the results for this case, which also involved phantom chains, were similar to those discussed above for protein 16PK.
Effect of the excluded volume on the performance of the LMProt and Thrashing algorithms
In this section, a pairwise interaction potential, namely the hard sphere potential, is introduced to analyze the effect of the chain excluded volume on the performance of the LMProt and Thrashing algorithms. Two proteins were considered, namely protein 5NLL (Ludwig et al., 1997
) and 1BFF (Kastrup et al., 1997
), with 138 and 129 residues, respectively, to determine the efficiency dependency of the LMProt algorithm on the
and
rmax parameters. The two very distinct tertiary structures were selected to verify how the search of configurations could be affected by the dominant structural topology; protein 5NLL is mainly formed by
-helixes and 1BFF by ß-sheets. Now, with the chain-excluded volume considered, a new configuration is accepted only if no superposition of atoms is verified, that is, for each pair of atoms, say atom i and j, the distance rij between them must satisfy the condition rij
(Ri + Rj), being Ri and Rj the radii of the atomic spheres i and j representing the respective atoms. If this condition is satisfied, the corresponding pairwise potential energy
ij = 0; otherwise
ij
. Thus, the probability
of transition between two consecutive configurations x and y will be governed by
 | (8) |
where rmsdx and rmsdy are the root mean standard deviation of configurations x and y, respectively, with respect to the coordinates of
-carbons of the reference structure. The atomic radius Ri was fixed as Ri = 1.0 Å for all chain atoms (hydrogen atoms were not included).
A total of 10 independent simulations were performed for each pair
and
rmax chosen, all beginning with chains presenting rmsd always higher than 20 Å. As described above, each new configuration is generated by perturbing the positions of all atoms of
consecutive residues, chosen randomly. The values of
and
rmax are allowed to vary from 5 to 8 and from 0.25 to 2.75 Å, respectively.
Ideal values for
and
max are considered here as those that maximize the folding success
in reaching the native structure, and the folding speed
'. A particular simulation is considered successful if the chain reaches conformations presenting rmsd < 1.0 Å in the time window tw = 2.5 x 104 MC steps. Thus, the parameter
measures the ratio between the number of successful simulations and the total number of performed simulations, whereas, in turn, parameter
' measures "how fast" the folding process is in guiding the chain to conformations near the reference structure (native), by counting those successful runs that reached the native structure in a time
equal to 30% of tw (that is,
= 0.75 x 104 MC steps). The results of this analysis for both proteins are summarized in Fig. 6. Note that to emphasize the combinations of the parameters
and
rmax that result in larger values of
and
', the plot reference was fixed at
=
' = 0.85.
In general, for larger
rmax and for specific values of
(mostly for
6), the folding success reaches values >90%; especially for protein 5NLL. However, there are specific combinations of
rmax and
that also result in absolute success: particularly for
rmax = 1.25 Å, and
varying from 6 to 8, the folding success is absolute for both proteins 5NLL and 1BFF.
The dependence of the folding speed
'on the parameters
and
rmax is stronger, but in general the folding speed is favored by using larger
rmax as suggested in the graphs from Fig. 6. For both proteins, there is a specific combination
and
rmax that determines an absolute success (100% of the runs) in the time window
that is, absolute
and
'. For protein 5NLL, this combination is
rmax = 2.25 Å and
varying from 6 to 7; for protein 1BFF,
rmax = 1.75 Å, and
also varies in this same interval. We consider the ideal adjustment for
the one that appears more times for both proteins with
=
' = 1.0 and ideal
rmax are those where this ideal
mostly occurs with
and
'
0.9, that is,
= 67 and
rmax = 2.25 Å. The average magnitude of the effective step size S (Eq. 1) obtained in the simulations with these values of
and
rmax is
2.86° ± 0.13°.
In the simulations executed with the Thrashing algorithm where the effect of the excluded volume was considered in the same way as in the LMProt algorithm, it was not possible to obtain rmsd values lower than 6.0 Å for both proteins after 105 MC steps, independent of the values of 
employed. The retention of the system in particular configurations was observed in all the cases.
Comparative CPU times between the LMProt and Thrashing algorithms
In this topic, we consider the calculation details of the LMProt and Thrashing algorithms to establish a general relation between the CPU time involved in each algorithm during one MC step, for a chain with n residues. Let us first consider the Thrashing method. A possible new configuration is generated whenever a specific dihedral angle, located at a particular residue, is perturbed. In this way, because there are two distinct dihedral angles per residue (
and
), with the exception of proline residues, a number of (2n np) configurations is obtained after each MC step, np being the number of prolines in the chain. Then, assuming that a real CPU time
T is necessary for each new configuration generated, an average time equivalent to 2n
T is spent in each MC step (considering np = 0). However, a valid new configuration must pass through the atomic overlap checking and for this, let us assume an average number
of atoms per residue. Therefore, the atomic overlap checking for two distinct residues involves
atom pairs. However, for one perturbed dihedral angle, say, at residue "i", exactly i(n i) residue pairs must be checked for overlapping. This implies that a total number of i(n i)
atom pairs are involved. Summing two times over all residues to complete one MC step, that is
one gets that about
atom pairs must be checked. Now, if a real CPU time
cpu is necessary for checking a single atom pair, in the case of the Thrashing algorithm a total average time
is spent for each MC step.
Clearly, this time can be significantly reduced by improved methods as the "residue neighborhood list technique", that is, a list containing nlist neighbors is dynamically updated for each residue. Therefore, for this case,
assuming that n is even, and so
Now, let us turn to the LMProt algorithm. In this case, a candidate for a new configuration is obtained after randomly moving
sequential residues. If a CPU time
is spent to move all
atoms of
residues, then an average time equivalent to
is spent on each MC step. Again, this try-configuration must pass through the atomic overlap checking, and in this case there are
atom pairs to check, each one spending a CPU time equal to
cpu. Therefore, for the LMProt algorithm, a total average time
is spent on each MC step. Otherwise, considering the method of analysis of neighbors, we have
and
Then, for case B in particular, whose performance for both algorithms is better if compared to case A, the ratio
between the CPU time of the two algorithms is given by
 | (9) |
where
T and
are specific for the Thrashing and LMProt algorithms, respectively, but
cpu is the same for both. Now, using specific values for
= 3, 4, ... 8 and the respective
from Table 2 (tolerance of 1/2 and 1%),
T = 24 µs,
cpu = 3.4 µs (Pentium IV 2.4 GHz; Linux-OS; Intel Fortran, Santa Clara, CA), and considering
(list of neighbor atoms), one can determine the ratio f for different n values (Eq. 9), as shown in Fig. 7. Note that for
the ratio f is >1 for both tolerances and f is >10 for
with
6. Note that for the same n and
, f increases when the tolerance
is
6. When only the overlapping check for protein 1BFF using the previously defined
ideal value is considered, for example, the performance of LMProt is about three times higher than that of the Thrashing algorithm.
 |
DISCUSSION
|
|---|
The results obtained by the LMProt algorithm for multiple and single local moves is compared against other sampling Monte Carlo techniques. As it has been pointed out in previous works (Cahill et al., 2002
), our results also emphasize the importance of the way the phase space is explored by simultaneous movements involving chain segments composed by multiple groups. Comparisons between LMProt and other algorithms show that the convergence time (which is proportional to the number of MC steps required to reach conformations the closest possible to the native structure) is significantly shorter for the LMProt method, as shown in Fig. 5. Indeed, it is up to three orders of magnitude faster than the Thrashing (nonlocal) algorithm and, depending on the accuracy required, it is at least 20 times faster than the Sevenfold Way (local) algorithm, as shown by the inset of Fig. 5.
The performance difference between the nonlocal and local algorithms, in the case of phantom chains, can be easily understood. For this, let us consider a specific chain configuration constituted of N atoms; a new configuration is then tried by simultaneously moving m of them. It is straightforward to see that as m increases the chance to obtain a new optimized configuration (lower energy, or smaller rmsd) becomes exponentially smaller. It is obvious that, for real chains, extra constraints make the situation even worse. Therefore, the importance of the "flexible chain idea" follows, which permits a tolerance on the geometric constraints of the chain, also affecting the reduction of ergodic problems. Indeed, the set of MC moves must be achieved in such a way that any phase space point, i, has a chance p > 0 to be accessed from any other state, j, thus guaranteeing the ergodic hypothesis. In a matrix notation, this hypothesis is expressed as
for all i and j, being k a value that can depend on i and j, and A the matrix transition (Manousiouthakis and Deem, 1999
). In a Markovian process, the number of required states to pass from i to j can, however, depend on the type of treatment applied. Obviously, if k is a very large number, the search can be considered nonergodic for practical purposes, and so one has to find a way to reduce k. At this point is the virtue of the "flexible-chain idea," which effectively permits one to reduce k.
A relevant factor that controls the efficiency of the LMProt algorithm is the adjustment of the movement parameters
and
rmax. As shown in Fig. 6, the folding success
and the folding speed
' are significantly affected by the number
of residues involved in the try configuration and by the amplitude of the atomic displacements
rmax. Structures like
-helix are much less sensitive to these parameters than ß-sheets, as shown by their corresponding convergence rates: protein 5NLL consists mostly of
-helixes and 1BFF of ß-sheets. Indeed, the LMProt method is generally faster for
-helix than for the ß-sheet type structure, which corroborates with experimental and theoretical results that characterize the folding times for such structural classes (Kauzmann, 1959
). Therefore, for proteins presenting larger diversity of folds or tertiary contacts,
and
max should be properly balanced to optimize the algorithm; the choice, for example, of a too-small amplitude
rmax (to favor
-helix structures; Cahill et. al., 2003
), may compromise the configurational search for other structural patterns.
Another important aspect of the LMProt algorithm is the possibility of effectively controlling the magnitude of the atomic displacements at different stages of the folding progress. For instance, after chain collapse, smaller structural changes can be crucial to increase the acceptance rate, contrasting with an open chain, where small displacements can produce a very slow search. However, at any specific instant along the simulation, the chain may present a nonuniform compactness (even a distinct class of secondary structure may present different compactness). So, instead of having a fixed displacement amplitude, one may think about another kind of amplitude distribution, as it was demonstrated to be useful in permitting the parameter
to fluctuate between two limits. These aspects of the algorithm performance become even more significant if different temperatures have to be considered. Of course, the temperature determines the energy levels occupation of the different degrees of freedom of the system, and so one may think that the
and
parameters should also depend on the temperature. Therefore, it is also easy to understand that the LMProt algorithm is able to describe the physical nature of the systems by mimicking the system dynamics at the molecular level.
An additional advantage of the LMProt is the reduction of the CPU time of the simulations. In MC simulations the main consumption of CPU time is due to checks of overlaps between atoms. LMProt has a CPU time that is about three times shorter than that of the Thrashing algorithm (Fig. 7), for proteins with
100 residues. If the energy calculation is also considered, this difference can be increased even more.
The generation of configurations by local moves requires changes of variables to recover the geometric constraints of the system after each set of proper moves. Therefore, the determinants of the set of variable transformations (Jacobians) must be calculated for both states involved: the original and the new one, to properly weigh up each new configuration. Such Jacobian depends only on the equations of constraints, and so if m atoms are moved, 3m constraints are required (Pant and Theodorou, 1995
). However, in generating new configurations, LMProt uses a smaller number of constraints, because the number of solutions is always >1 for a local move. Nevertheless, fictitious constraints can be created in a way that the set of equations of constraints results in a unique solution, the new configuration itself already determined by LMProt. In the example of Fig. 1, once a particular solution involving nine atoms is determined, 25 constraints are required. The calculation of the Jacobians, on the other hand, requires 27 constraints for each generated configuration. The choice of two fictitious constraints, in this example, can directly be made between atoms NjNk and CjCk of Fig. 1. In a general way, it was verified that these Jacobians are directly obtained by the LMProt method without significant extra computational cost. In the next work, applications of the LMProt algorithm for MC simulations in which the detailed balance is focused will be presented.
 |
CONCLUSION
|
|---|
In short, this work presents an efficient MC simulation algorithm to generate protein-like chain configurations. It uses the "flexible chain idea" and new configurations are generated by multiple residue moves to mimic a local molecular movement. The parameters that control the moves can be adjusted to optimize the efficiency of the method for MC distinct applications. It was shown that the number of moved residues and the atomic displacement amplitude of the moves, when adequately combined, can speed up the process for finding the native structure. The algorithm presented here is up to several orders of magnitude faster than other recently developed and published algorithms. Its application can be extended to several other problems involving configurational changes ranging from linear polymers to proteins. Some advantages of this algorithm are clearly linked to its ability in reproducing peculiar physical aspects of the macromolecular systems. The availability of a highly efficient algorithm able to generate configurations of proteins is a key point in the protein-folding problem approach by means of general MC technique-based energy criterion.
 |
SUPPLEMENTARY MATERIAL
|
|---|
An online supplement to this article can be found by visiting BJ Online at http://www.biophysj.org.
 |
ACKNOWLEDGEMENTS
|
|---|
We thank Marco Antônio Alves da Silva, from the Department of Physics and Chemistry of Faculdade de Ciências Farmacêuticas de Ribeirão Preto, Universidade de São Paulo, Brazil for the discussions during the development of this work.
We also thank Fundaç ão de Amparo à Pesquisa do Estado de São Paulo (FAPESP, Proc. 01/13970-0) and Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) for financial support.
Submitted on February 12, 2004;
accepted for publication June 2, 2004.
 |
REFERENCES
|
|---|
Aièllo, O. E., and M. A. A. da Silva. 2003. New approach to dynamical Monte Carlo methods: application to a epidemic model. Physica A. 327:525534.[CrossRef]
Berg, B. A., and T. Neuhaus. 1991. Multicanonical algorithms for 1st order phase-transitions. Phys. Lett. B. 267:249253.[CrossRef]
Berne, B. J., and J. E. Straub. 1997. Novel methods of sampling phase space in the simulation of biological systems. Curr. Opin. Struct. Biol. 7:181189.[CrossRef][Medline]
Bernstein, B. E., D. M. Williams, J. C. Bressi, P. Kuhn, M. H. Gelb, G. M. Blackburn, and W. G. Hol. 1998. A bisubstrate analog induces unexpected conformational changes in phosphoglycerate kinase from trypanosoma brucei. J. Mol. Biol. 279:11371148.[CrossRef][Medline]
Binder, K., and A. Baumgartner. 1992. Monte Carlo method in condensed matter physics. In Topics in Applied Physics, Vol. 71, 2nd Ed. K. Binder, editor. Springer-Verlag, New York, NY.
Cahill, M., S. Cahill, and K. Cahill. 2002. Proteins wriggle. Biophys. J. 82:26652670.[Abstract/Free Full Text]
Cahill, S., M. Cahill, and K. Cahill. 2003. On the kinematics of protein folding. J. Comput. Chem. 24:13641370.[CrossRef][Medline]
Degrève, L., and A. Caliri. 1995. Geometric constraints in polymer chains: analysis on the pearl-necklace model by Monte Carlo simulation. J. Mol. Struct. 335:123127.
de Pablo, J. J., M. Laso, and W. U. Suter. 1992. Simulation of polyethylene above and below the melting-point. J. Chem. Phys. 96:23952403.[CrossRef]
Dodd, L. R., T. D. Boone, and D. N. Theodorou. 1993. The concerted rotation algorithm for atomistic Monte Carlo simulation of polymer melts and glasses. Mol. Phys. 78:961996.[CrossRef]
Favrin, G., A. Irbäck, and F. Sjunnesson. 2001. Monte Carlo update for chain molecules: biased Gaussian steps in torsional space. J. Chem. Phys. 114:81548158.[CrossRef]
Go, N., and H. A. Scheraga. 1970. Calculation of conformation of pentapeptide cyclo (glycylglycylglycylprolylprolyl).2. statistical weights. Macromolecules. 3:178187.[CrossRef]
Hansmann, U. H. E., and Y. Okamoto. 1993. Prediction of peptide conformation by multicanonical algorithm: new approach to the multiple-minima problem. J. Comput. Chem. 14:13331338.[CrossRef]
Kastrup, J. S., E. S. Eriksson, H. Dalboge, and H. Flodgaard. 1997. X-ray structure of the 154-amino-acid form of recombinant human basic fibroblast growth factor. comparison with the truncated 146-amino-acid form. Acta Crystallogra. D. 53:160168.[CrossRef]
Kauzmann, W. 1959. Some factors in the interpretation of protein denaturation. Adv. Protein Chem. 14:163.[Medline]
Lal, M. 1969. Monte Carlo computer simulation of chain molecules I. Mol. Phys. 17:5764.[Medline]
Ludwig, M. L., K. A. Pattridge, A. L. Metzger, M. Dixon, M. Eren, Y. Feng, and R. Swenson. 1997. Control of oxidation-reduction potentials in flavodoxin from Clostridium beijerinckii: the role of conformation changes. Biochemistry. 36:12591280.[CrossRef][Medline]
Lyubartsev, A. P., A. A. Martsinovski, S. V. Shevkunov, and P. N. Vorontsov-Velyaminov. 1992. New approach to Monte Carlo calculation of the free-energy-method of expanded ensembles. J. Chem. Phys. 96:17761783.[CrossRef]
Madras, N., and A. D. Sokal. 1988. The pivot algorithm: a highly efficient Monte Carlo method for the self-avoiding walk. J. Stat. Phys. 50:109186.[CrossRef]
Manousiouthakis, V. I., and M. W. Deem. 1999. Strict detailed balance is unnecessary in Monte Carlo simulation. J. Chem. Phys. 110:27532756.[CrossRef]
Marinari, E., and G. Parisi. 1992. Simulated tempering: a new Monte Carlo scheme. Europhys. Lett. 19:451458.[CrossRef]
McGreevy, R. L., and L. Pusztai. 1988. Reverse Monte Carlo simulation: a new technique for the determination of disordered structures. Mol. Simul. 1:359367.[CrossRef]
Moret, M. A., P. M. Bisch, and K. C. Mundim. 2002. New stochastic strategy to analyze helix folding. Biophys. J. 82:11231132.[Abstract/Free Full