| HOME | HELP | FEEDBACK | SUBSCRIPTIONS | ARCHIVE | SEARCH | TABLE OF CONTENTS |
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||


¶





* Chemistry Department, Molecular Biology & Biochemistry Department, and Molecular Biophysics Program, Wesleyan University, Middletown, Connecticut 06459 USA;
Department of Molecular Biology, The Scripps Research Institute, La Jolla, California 92037 USA;
Departments of Medicinal Chemistry and of Pharmaceutics and Pharmaceutical Chemistry, University of Utah, Salt Lake City, Utah 84112-5820 USA;
Laboratoire de Biochimie Theorique, Institut de Biologie Physico-Chimique, Paris 75005, France; ¶ Physiology and Biophysics, Mount Sinai School of Medicine, New York, New York 10029 USA; || J. Heyrovsky Institute and Center for Complex Molecular Systems and Biomolecules, 182 23 Prague, Czech Republic; ** Institute of Mathematics B, Swiss Federal Institute of Technology, CH 1015 Lausanne, Switzerland; 
Theoretical Biophysics Group, Max Delbrück Center, D-13122 Berlin, Germany; and 
Molecular and Cell Biology, University of California-Berkeley, Berkeley California 94720-3202 USA
Correspondence: Address reprint requests to David L. Beveridge, E-mail: dbeveridge{at}wesleyan.edu.
| ABSTRACT |
|---|
|
|
|---|
0.6 µs of simulation for systems containing
24,000 atoms. The resulting trajectories involve 600,000 coordinate sets and represent
400 gigabytes of data. In this article, the research design, details of the simulation protocol, informatics issues, and the organization of the results into a web-accessible database are described. Preliminary results from 15-ns MD trajectories are presented for the d(CpG) step in its 10 unique sequence contexts, and issues of stability and convergence, the extent of quasiergodic problems, and the possibility of long-lived conformational substates are discussed. | INTRODUCTION |
|---|
|
|
|---|
Overall, we seek to obtain MD trajectories for the 136 unique DNA tetranucleotides imbedded in 39 DNA oligomers having repeating sequences. The oligomers are each 15 nucleotide pairs in length and are capped on both ends by GC pairs. All MD simulations were performed with a consensus protocol using the AMBER suite of programs (Case et al., 1999
) and the parm94 force field of Cornell et al. (1995)
, which has been verified in test cases to produce good overall agreement between calculated and observed DNA structures in crystals and in solution (Arthanari et al., 2003
; Bevan et al., 2000
). MD trajectories of 15 ns have been obtained for each of the 39 cases. From these simulations, the MD protocol, convergence and stability issues, and quasiergodic problems due to substate sampling are assessed. In this article we present the research design of ABC, details of the simulation protocol, considerations on informatics and database issues, and results on the sequence context problem in d(CpG) steps. The development of a prototype web-accessible relational database for public dissemination of the results is described.
| BACKGROUND |
|---|
|
|
|---|
Questions have been raised since early studies of sequence effects as to whether dinucleotide step information is a sufficient basis for a description of sequence effects in oligomeric or polymeric DNA. The structure of any individual XpY step may clearly be subject to sequence context presented by the nearest neighboring basepairs. If such effects are important, the minimum monomeric unit necessary to describe the details of DNA structure would be tetranucleotide steps, of which there are 136 unique permutations. Evidence of higher-order cooperative behavior in DNA structure suggests that in some cases even tetranucleotide steps may be insufficient to fully characterize the system. However, for a systematic approach, the nature of the sequence-effect problem at the level of tetranucleotide steps needs to be fully examined.
The immediate issue is thus first-neighbor context effects on the structures of DNA dinucleotide steps, which requires knowledge of the structures of all 136 unique tetranucleotides. Crystal structures of DNA oligonucleotides serve as the primary source of data, the basis of a number of studies of the problem to date as described in research articles (El Hassan and Calladine, 1995
; Olson et al., 1998
; Suzuki et al., 1997
), review articles (Neidle, 1999
; Olson and Zhurkin, 1996
), and texts (Calladine and Drew, 1997
; Neidle, 2002
; Saenger, 1984
; Sinden, 1994
). Even at the dinucleotide step level, the crystal structures present an uneven distribution of instances of each step, and are heavily biased toward cases with G's and C's. Issues with respect to the influence of packing effects and crystal imperfections have also been noted (Dickerson et al., 1987
). In particular, sequence-dependent axis curvature of DNA is clearly sensitive to packing effects (DiGabriele et al., 1989
; Johansson et al., 2000
; Shakked et al., 1989
). The determination of DNA structure in solution by NMR spectroscopy has been limited by the lack of tertiary contacts and the short-range nature of scalar couplings and NOE data. New NMR experiments based on residual dipolar coupling (RDC) hold the possibility of obtaining higher-resolution structures of oligonucleotides in solution (Vermulen et al., 2000
) and may have sufficiently high resolution to accurately resolve DNA structure, but are just beginning to appear in the literature (Barbic et al., 2003
; MacDonald and Lu, 2002
; Tjandra et al., 2000
). Another line of investigation has been to derive basepair step parameters empirically or semiempirically from experiment (Bolshoy et al., 1991
; Liu and Beveridge, 2001
). However, various dinucleotide step models give essentially similar accounts of the observed data within statistical uncertainty (Liu and Beveridge, 2001
).
Two sets of structural indices based on trinucleotide steps have been derived from nucleosome positioning and DNase digestion (Brukner et al., 1995a
,b
; Satchwell et al., 1986
). Both sets of results indicate significant context effects for dinucleotide steps, but the rankings do not correlate well with each other and likely index different aspects of sequence-dependent structural deformation and/or deformability. Kanhere and Bansal (2003)
have reexamined this issue and indicate that trinucleotide scales do not lead to a good account of all the observed data on sequence-dependent curvature. At the tetranucleotide step level, the crystallographic database is still very sparse. Surveys of this data have raised the possibility of quite significant sequence effects. The most extensive theoretical consideration of the problem to date is due to Packer et al. (2000a
,b
), who presented detailed considerations based on the minimization of stacking energies for tetranucleotide steps as described by empirical energy functions.
Recently, all-atom molecular modeling of DNA structure via molecular dynamics simulations including explicit solvent (water molecules and mobile salt ions) and based on interactions described by empirical force fields, has reached a level at which accurate dynamical models of DNA structure in solution have been obtained. Various aspects of the field of MD simulations of DNA have been described in recent review articles (Beveridge and McConnell, 2000
; Cheatham and Kollman, 2000
; Giudice and Lavery, 2002
; Norberg and Nilsson, 2002
; Orozco et al., 2003
). The simulation protocols employed by different groups are now reasonably uniform. The problem of long-range interactions is seemingly stabilized with the advent of the particle mesh Ewald (PME) method (Essmann et al., 1995
) for periodic boundary conditions, despite lingering concerns about long-range correlations (Hunenberger and McCammon, 1999
; Smith and Pettitt, 1996
). The energy functions incorporated in the suites of programs readily available for MD simulation such as AMBER (Case et al., 1999
), CHARMM (Brooks et al., 1983
), and GROMOS (Scott et al., 1999
) each contain a full set of parameters for nucleic acids.
The AMBER parm94 nucleic acids force field as described by Cornell et al. (1995)
is a reparameterization for MD with explicit solvent, and is termed "second generation". MD using AMBER and parm94 provided the first well-behaved MD trajectories of the DNA double helix (Cheatham et al., 1995
, 1998
; York et al., 1995
; Young et al. 1997a
,b
). Known shortcomings in parm94 still include a sensitive problem in the coupling of base-sugar torsions and a systematic tendency toward slightly underwound structures. A modification known as parm99 has recently been proposed (Cheatham et al., 1999
), which improves twist but appears less sensitive to changes in the environment (high salt, ethanol). The CHARMM force field for nucleic acids, as refined by MacKerell and co-workers (Foloppe and MacKerell, 2000
; MacKerell and Banavali, 2000
; MacKerell et al., 2000
); and also the hybrid AMBER/CHARMM force field by Langley (1996
, 1998
), provide viable alternatives for MD on nucleic acids and also show good agreement with experiment. Comparative studies on force fields for nucleic acids have been described by Feig and Pettit (1998)
, Reddy et al. (2003)
, and Cheatham and Young (2001)
.
Although the present study is aimed at creating a well-defined computational vantage point on the problem of sequence effects on DNA structure, the project design allows us to address several important additional and timely methodological questions about MD on DNA oligonucleotides. Some of the principle concerns are: a), when is a simulation "converged"; b), what length of trajectory is "enough"; c), how sensitive are the results to the choice of initial configuration; and d), what are the meaningful ways and pitfalls in extracting "structures" from an MD trajectory and analyzing them? Questions a and b are in fact "moving targets" with no definitive answer. Convergence can never be unequivocally proved because there is no guarantee that the past behavior of a system in a simulation is predictive of the future; one may in principle always encounter new substates of a system with more extensive sampling or new modes of motion that have a slower relaxation time. One must deal with this pragmatically, running simulations for as long as possible and checking on the stability of diverse indices of dynamical structure as a function of time. Each property or structural index exhibits a characteristic time evolution in MD, and some have a shorter relaxation time and will be quicker to stabilize than others. Studies on a prototype B-form dodecamer (Ponomarev et al., 2004
) indicate that DNA conformational and helicoidal parameters, have relaxation times of <500 ps. The rule of thumb is to sample for 10 times the relaxation time of all the indices of interest for a particular application (Haile, 1992
). This indicates that 5-ns trajectories should be sufficient in the absence of substate problems (see below), and we are well in excess of that in the 15-ns trajectories carried out in phase I of this project. Observed diffusion constants indicate that motions of mobile counterions will be relatively slow to converge (Varnai and Zakrzewska, 2004
). Ponomarev et al. (2004)
have found in a prototype study that ion occupancies may take up to 100 ns to stabilize. However, in the same calculation, the DNA parameters were found to be well stabilized at 5 ns, and not sensitive to the fine details of ion convergence. The calculated DNA counterion radial distribution functions were found to be essentially unchanged after 35 ns, indicating that mean field effects of ions are dominant in DNA structure and that the excess sampling to get ion occupancies converged is a matter of granularity of the ion distributions.
What is referred to as the "substate problem" in macromolecular simulation is a quasiergodic issue. A flexible macromolecule has the potential for contributions from a manifold of thermally accessible substates, with DNA being particularly susceptible (McConnell et al., 1994
; Poncin et al., 1992
). Known examples of this are the BI-BII transitions (Hartmann et al., 1993
) and
/
-crankshaft motions (Varnai et al., 2002
), and YpR hinge motions (Calladine and Drew, 1997
). The latter have been noted to play an important role in structures of protein-bound DNA (Dickerson and Chiu, 1997
) as well as DNA curvature (Beveridge et al., 2004
). Indications from the crystallographic database are that certain basepair steps show high flexibility (El Hassan and Calladine, 1995
) whereas those involved in A-tracts are relatively rigid (Young et al., 1995
). The problem this poses to a simulation arises from the need to sample all thermally accessible substates adequately to obtain an ensemble of snapshots that properly represent the dynamical structure of the DNA. This requires additional sampling, which is numerically impeded when the paths between substates are narrow cols on the potential energy hypersurface and thus infrequent occurrences. However in examining this class of problems, computational modeling via MD has a unique vantage point, because a molecular level account of structure as a function of time can probably never be obtained experimentally. The substate issue calls attention to another problem in defining structure, because for a system with substates the idea of a single overall average structure of the system being representative of the dynamical structure of the system is challenged. For example for a system in a symmetric double minimum potential in which both states are thermally accessible, the average structure would have the least probability of occurrence in the ensemble. In this case the analysis should be based on the structures of substates and their respective statistical weights, i.e., the dynamical structure of the system.
In this article, the research design, details of the simulation protocol, informatics issues, and the organization of the results into a relational database are described. Preliminary results concerning MD convergence issues and structural analyses after 15 ns of MD are presented, focusing on the d(CpG) step in its 10 unique sequence contexts. The d(CpG) step was chosen for preliminary analysis as a case in which x-ray structures indicate a potential for context-dependent substates (Calladine and Drew, 1997
; El Hassan and Calladine, 1995
). The extent to which any step has the potential for substates of any kind with differential stability sensitive to context effects will be an important issue in understanding DNA curvature and ligand-induced bending with substantial implications with respect to protein-DNA recognition.
| METHODOLOGY |
|---|
|
|
|---|
All simulations have been carried out using the AMBER 6 or AMBER 7 suite of programs (Case et al., 1999
) and the parm94 force field (Cornell et al., 1995
). The simulations cover 39 double-stranded DNA oligomers, each being 15 basepairs in length. The sequences of these oligomers are discussed below. A consensus protocol was adopted for simulation in which the solute molecule is a 15-basepair oligonucleotide with 28 potassium ions added to achieve system electroneutrality. The DNA-ion complex is simulated in a truncated octahedral box having a face-to-face dimension of
70 Å, which allows for a solvent shell extending for at least 10 Å around the DNA. The starting configuration has the oligomer in a canonical B form. The ions are randomly placed around the oligomer, and located at least 5 Å from any atom of the solute and at least 3.5 Å from one another in the initial structure. Ion interaction with other atoms are based on the potentials developed by Aqvist (1990)
. The neutral ion-oligomer complex is solvated with a layer of TIP3P water molecules (Jorgensen, 1981
). Simulations are performed with periodic boundary conditions in which the central cell box contains
8000 water molecules. Considering the DNA, counterions, and solvent water, the total system consists of
24,000 atoms.
The preparations for MD simulations consists of an initial minimization followed by slow heating to 300 K at constant volume over a period of 100 ps using harmonic restraints of 25 kcal/mol/Å2 on the solute atoms. These restraints are slowly relaxed from 5 to 1 kcal/mol/Å2 during a series of five segments of 1000 steps of energy minimization and 50-ps equilibration using constant temperature (300 K) and pressure (1 bar) conditions via the Berendsen algorithm (Berendsen et al., 1984
) with a coupling constant of 0.2 ps for both parameters. The final segments consists of 50-ps equilibration with a restraint of 0.5 kcal/mol/Å2 and 50-ps unrestrained equilibration. The simulations were then continued for a total of 15 ns at constant temperature and pressure conditions, using the Berendsen algorithm (Berendsen et al., 1984
) with a coupling constant of 5 ps for both parameters. Electrostatic interactions were treated using the particle mesh Ewald (PME) algorithm (Essmann et al., 1995
) with a real space cutoff of 9 Å, cubic B-spline interpolation onto the charge grid with a spacing of
1 Å. SHAKE constraints (Ryckaert et al., 1977
) were applied to all bonds involving hydrogen atoms. The integration time step was 2 fs. Center-of-mass translational motion was removed every 5000 MD steps to avoid the methodological problems described by Harvey et al. (1998)
. The trajectories were extended, as noted above, to 15 ns for each oligomer and conformations of the system were saved every 1 ps for further analysis.
In an effort to be objective about the convergence of MD simulations on DNA, it was agreed that first generation ABC study would involve 15 ns of simulation for each of the 39 oligomers, pooling the results and performing detailed analysis. This is at the high end of typical run lengths used in current published research. Each group was assigned responsibility for dealing with MD on four or five oligomers. Although no special resources were requested for carrying out these simulations, the entire data set was obtained in roughly three months on a heterogeneous mix of high-performance supercomputers and PC clusters. It should be stressed that this represents a considerable computational task, corresponding to a total of
0.6 µs of simulation for systems containing
24,000 atoms. The resulting trajectories involve 600,000 coordinate sets and represent roughly 400 gigabytes of data.
Oligonucleotide sequences
A key element of our research design is that, rather than performing calculations on all 136 tetranucleotides using 136 different oligomers (for example, placing each tetranucleotide within a longer duplex, surrounded with some standard sequence), we carried out the calculations on oligomers with repeating tetranucleotide sequences (ABCDABCDABCD...); cf. Table 1. In this way, each oligomer can contain up to four distinct tetranucleotides. Thus moving a four-base "reading frame" along the oligomer, we locate successively ABCD, BCDA, CDAB, and DABC tetranucleotides. As shown in Table 2, this strategy enables all 136 tetranucleotides to be studied using only 39 oligomers. As concerns the length of the oligomers, 15 basepairs was chosen as a compromise between the necessity to avoid end effects and the computational expense of the simulations. Based on prior experience, it was also decided to cap the ends of each oligomer with a single GC pair to avoid fraying. This implies that a given 15-basepair oligomer contains three tetranucleotide repeats 5'-G-D-ABCD-ABCD-ABCD-G-3'. This choice means that if we decide to ignore two basepairs at either end of the oligomer, to avoid potential artifacts from end effects, there will still be two distinct copies of each unique tetranucleotide (ABCD, BCDA, CDAB, DABC) within the remaining 11-basepair fragment. Thus with MD on 39 of these oligonucleotides, we will be able to compare the properties of two copies of each tetranucleotide as a further convergence test of the simulations. To ensure that all participating groups were using strictly identical simulation protocols, standard scripts were made available via the group website. A standard naming convention was also adopted for all files generated during the simulations, to facilitate the exchange of data between the participating groups and to simplify setting up an overall database of the results.
|
|
In the course of this project, a relational database was developed that simplifies and speeds up the task of querying this common repository for trajectory information about any of the simulations, or for comparing the parameters from the different simulations for characteristics of various subsets of the nucleic acid segments. The information extracted from the CURVES analyses are stored in the database as tables for the complete trajectory indexed with various identifications, defined on the basis of the simulation, the nucleotide position, the time step in the simulation, etc. Using the processing power of a structured query language (SQL) allows complex queries into the database. Thus a component of this project is a bioinformatics initiative aimed at the structured storage and handling of the results from large and numerous molecular simulations, which simplifies many of the technical complexities associated with the management and analysis of such large quantities of information.
The ABC database will be made accessible to the interested research community outside the consortium through the internet as soon as it is complete and fully tested. This interface provides a dynamic access to the simulation results harnessing the strength of SQL, limited only by the html interface. This system will permit queries executed from the web to extract the average helicoidal parameter values and their standard deviations over different time periods of the various trajectories, and enable comparisons between the various simulations on different sequences or in different user-defined conditions. The results can be either viewed as tables or displayed graphically in the web browser. The nature of queries that can be carried out from the web interface include viewing the mean and standard deviation profile of all the helicoidal parameters as a function of sequence in each of the trajectories and more elaborate searches such as comparing the statistical properties of any of the DNA structural parameter for a central basepair in all its relevant tetranucleotide combinations.
| RESULTS AND DISCUSSION |
|---|
|
|
|---|
![]() |
By design, each of the ABC oligomers contains a minimum of two copies of each unique tetranucleotide, placed at least two basepairs from the ends of the oligomer. This gives us the chance of making internal comparisons of the structural and dynamic characteristics. It should be noted that these two copies are not necessarily symmetrically placed with respect to the center of the oligomer (e.g., G-CGTACGTACGTAC-G, where the bold T marks the central basepair and the two copies of the ACGT tetranucleotides are underlined).
We begin by looking at the overall stability of the conformation of the oligomers used in this study. The typical low-resolution test is to examine the evolution of the structure during the trajectory by measuring the root-mean-square deviation (RMSD) with respect to a canonical B-form initial structure, an A-form reference state, or to the average structure reached at the end of the trajectory. Fig. 1 illustrates this test for the oligomer containing the ACGT tetranucleotide cited in the previous paragraph (hereafter referred to as the "ACGT oligomer"). Comparisons with B-DNA, A-DNA, and the average conformation of the oligomer calculated over the last 5 ns of the trajectory all suggest that the overall structure of the oligomer rapidly stabilizes to a putatively stable state, and remains there, in RMSD fluctuations of
1 Å, until the termination of the trajectory. This conformation is seen to lie at
4 Å from the canonical B conformation and
5 Å from canonical A-DNA. This global conformation is situated between the B and A forms as evidenced by high roll, negative slide, and lower rise and twist values than in canonical B-DNA. Examination of the helical and the backbone parameters provides more details about the dynamical structure. We will begin with the helical parameters and, in particular, with the interbasepair parameters relating to the CpG step. Fig. 2 shows time series for two important parameters, basepair rise and twist of the C10pG11 step within the ACGT oligomer. In each case, there are important oscillations on timescales ranging from a few picoseconds to several nanoseconds. Instantaneous values of both rise (1.6 Å
5.8 Å) and twist (8°
52°) cover ranges that are considerably greater than those seen in the crystallographic structures of B-DNA oligomers, even if CpG steps were originally classified as particularly flexible on the basis of such experimental data. The average values measured over the last 10 ns of the trajectory are 3.5 Å for rise and 32.6° for twist, with mean ± SD of 0.6 Å and 5.6°, respectively. Again, as in the case of the global helical conformation, stability as a function of time seems to have been achieved.
|
|
0.5 Å (Packer et al., 2000b
(C3'-O3') and
(O3'-P) have secondary populations in g and t states, respectively, within both backbones, corresponding to a small percentage of time spent in the BII state.
|
|
1 Å in translation and
20° in twist. These differences are an order of magnitude greater than those seen within the ACGT oligomer. How can this striking difference be explained?
|
|
(P-O5') and
(C5'-C4') dihedral distributions are unusual, with
spending most of the trajectory in the g+ state, rather than the usual g state and
being t rather than the usual g+. These changes also affect the other backbone angles of the junction, leading to a shift of ß (O5'-C5') from t toward g and a broad
distribution centered around 110°. Note that there are no such changes in the 3'-GpC steps of the second strand (G25pC26 and G21pC22) of the oligomer, although there is a shift in the 
equilibrium to almost equally populated BI and BII states and a broadening of the ß-distribution.
|
|

-flip from the normal gg+ state to a g+t state indeed affects the rise and twist of the adjacent 5' step. The results indicate that the 
-flip on the 3' side of C10pG11 occurs after almost 4 ns of simulation and that the rise and twist of this step then drops sharply after a delay of roughly 1 ns. Similar coupling has been observed for other CpG steps that exhibit exceptionally low rise and twist values in the series of oligomers studied here.
|
BII transitions, which are both relatively rare and short lived for the tetranucleotides presently investigated, it appears that 
-flips can persist for at least the 15-ns trajectory that has been carried out in the first round of ABC simulations. This is consistent with the larger barriers that separate the gg+ and g+t states revealed by Varnai et al. (2002)
-flips. The question of whether this is a viable component of a dynamical model or a force-field artifact will need to be investigated further with longer trajectories, noting that 
-flips are observed in DNA sequence complexed to proteins but in few cases otherwise. If we now want to look provisionally at sequence effects on the CpG step, we have to deal with perturbations caused by 3' flanking 
-flips. Using these results, the only alternative is to filter out those parts of the trajectories associated with the unusual 
-states as artifact and analyze the remainder of the trajectory. Naturally, this means that the statistical quality of the data for some tetranucleotides may be reduced, or even that, in some cases, we may actually have no data left to analyze. This filtering has been applied in the results on d(CpG) contained in Table 3 and affects the GCGG, TCGG, and TCGA tetranucleotides. Data for GCGC can, however, be recovered by using the steps present in this dinucleotide repeat oligomer that are not affected by the 
-flips. Table 3 contains the mean value and the standard deviation of the six interbasepair helical parameters for the 10 possible tetranucleotide environments of the CpG step, averaged over the last 10 ns of the trajectories.
|

-filtering, there are still individual parameters that show visible differences between the two nominally identical steps in the given oligomers. This is clearly the case for TCGA, where there are clear discrepancies in slide, roll, and twist, and for GCGC, which shows a discrepancy in slide. For the remaining steps the differences between the two tetranucleotide copies are generally <0.3 Å for translational parameters and <2° for rotational parameters. If we assume that such variations limit the precision of the parameters resulting from these simulations, then there are surprisingly few visible context effects on the six interbasepair parameters for the CpG. Shift and tilt need not be analyzed because their values are very small for all the CpG steps. For the remaining translational parameters, we can cite low rise for the CCGG and GCGG steps (although data on only one nucleotide are available in the latter case), higher rise in the RCGN steps, and somewhat lesser slide for GCGA. For rotational parameters, low twists occur for GCGG (one data point), GCGA, CCGG, and TCGG, whereas the RCGY steps (ACGT, GCGC, GCGT) have rather higher values. If we now look at the standard deviations of the interbasepair parameters, the situation can be rapidly summarized by saying that only minor context effects are visible in the MD modeling of the d(CpG) step. The limited crystal structure data that are currently available do not help in confirming or refuting this observation, although the idea of sequence-directed structural properties relies on the existence of such context effects. More detailed analysis of the available data is required to decisively conclude on the capabilities of the present level of approximations employed in molecular dynamics simulations of DNA to address this issue. | CONCLUSIONS |
|---|
|
|
|---|
The preliminary analysis offered here for the CpG step anticipates the issues for other steps and at least some of the problems involved and issues to be considered. However, before drawing any general conclusions from this first phase of the ABC initiative and resulting database, it is clearly necessary to complete the analysis of all 136 unique tetranucleotides. The results, even in this preliminary state, indicate that the dynamics of DNA introduce significant effects that raise a cautionary flag with respect to studies based on MD on DNA with short trajectories. A fuller knowledge and better understanding of this is of course one of the objectives of this project. The subsequent analysis step, in itself, poses a challenging informatics problem, given that the combined trajectories represent roughly 0.6 µs of simulation and require almost 0.5 terabytes of storage. At this point all simulations from the initial phase of ABC are completed and analysis is underway. The data obtained will hopefully allow us to obtain an increasingly clear view of context effects, to better understand the importance of such phenomena as conformational substates, and also to define how end effects and length effects can influence the behavior of DNA fragments.
From this study, we hope to provide a benchmark of what can be expected from MD on DNA based on the parm94 empirical force field, and place subsequent applications of MD on a well-characterized theoretical basis. The sensitivity of these results to choice of force field is likewise interesting in pointing the way to appropriate improvements in functional forms or parameters. The sensitivity of the results to ionic strength is an additional important question to consider. To accomplish our task, it is quite possible that longer trajectories will be required. Given the experience gained to date by the ABC collaboration, it is reasonable to think of extending this computational effort by an order of magnitude if necessary.
| ACKNOWLEDGEMENTS |
|---|
|
|
|---|
Generous support for both these meetings is gratefully acknowledged. Prof. Beveridge acknowledges support for this research from the National Institute of General Medical Sciences (NIGMS) (grant no. GM37909) and the Keck Center for Integrative Genomics at Wesleyan University. The participation of Kelly M. Thayer in this project was supported by an NIGMS training grant in Molecular Biophysics to Wesleyan University (grant no. GM 08271). Dr. Gabriela Barreiro acknowledges CNPq/Brazil (Conselho Nacional de Desenvolvimento Científico e Tecnológico) for a postdoctoral fellowship. Supercomputer time was generously provided under the auspices of the Partnerships for Advanced Computational Infrastructure program on the facilities of the National Center for Supercomputing Applications (NCSA) at the University of Illinois at Champaign/Urbana. The contribution of Dr. Richard Lavery and co-workers was supported by grants from the Centre National de la Recherche Scientifique, France. Dr. Peter Varnai thanks the Wellcome Trust for an International Prize Travelling Research Postdoctoral Fellowship (grant reference 060078). Prof. Cheathem acknowledges computational time from a National Research Advisory Council allocation (MCA01S027) and friendly user time on computer hardware at the University of Kentucky, the Pittsburgh Supercomputing Center, and the NCSA. Prof Osman acknowledges support from National Cancer Institute grant CA 63317. Eleanore Seibert was supported by National Institutes of Health training grants GM08553 and CA78207. Prof. Cheatham also acknowledges support from the Center for High Performance Computing at the University of Utah and financial support from the National Science Foundation (CHE-0326027). Dr. Filip Lankas acknowledges the support provided by the Centre for Complex Molecular Systems and Biomolecules (LN00A032) financed by the Ministry of Education of the Czech Republic. Prof Maddocks acknowledges the support for this research provided by the Swiss National Science Foundation and via a research collaboration between the EPFL and Hewlett-Packard.
Submitted on May 20, 2004; accepted for publication August 3, 2004.
| REFERENCES |
|---|
|
|
|---|
Arthanari, H., K. J. McConnell, R. Beger, M. A. Young, D. L. Beveridge, and P. H. Bolton. 2003. Assessment of the molecular dynamics structure of DNA in solution based on calculated and observed NMR NOESY volumes and dihedral angles from scalar coupling constants. Biopolymers. 68:315.[CrossRef][Medline]
Barbic, A., D. P. Zimmer, and D. M. Crothers. 2003. Structural origins of adenine-tract bending. Proc. Natl. Acad. Sci. USA. 100:23692373.
Berendsen, H. J. C., J. P. M. Postma, W. F. van Gunsteren, and A. DiNola. 1984. Molecular dynamics with coupling to an external bath. J. Chem. Phys. 81:36843690.[CrossRef]
Bevan, D. R., L. Li, L. G. Pedersen, and T. A. Darden. 2000. Molecular dynamics simulations of the d(CCAACGTTGG)2 decamer: influence of the crystal environment. Biophys. J. 78:668682.
Beveridge, D. L., S. B. Dixit, G. Barreiro, and K. M. Thayer. 2004. Molecular dynamics simulations of DNA curvature and flexibility: dynamical aspects of helix phasing and premelting phenomena. Biopolymers. 73:380403.[CrossRef][Medline]
Beveridge, D. L., and K. J. McConnell. 2000. Nucleic acids: theory and computer simulation, Y2K. Curr. Opin. Struct. Biol. 10:182196.[CrossRef][Medline]
Bolshoy, A., P. McNamara, R. E. Harrington, and E. N. Trifonov. 1991. Curved DNA without A-A: experimental estimation of all 16 DNA wedge angles. Proc. Natl. Acad. Sci. USA. 88:23122316.
Brooks, B. R., R. E. Bruccoleri, B. D. Olafson, D. J. States, S. Swaminathan, and M. Karplus. 1983. CHARMM: a program for macromolecular energy, minimization, and dynamics calculations. J. Comput. Chem. 4:187195.[CrossRef]
Brukner, I., R. Sanchez, D. Suck, and S. Pongor. 1995a. Sequence-dependent bending propensity of DNA as revealed by DNase I: parameters for trinucleotides. EMBO J. 14:18121818.[Medline]
Brukner, I., R. Sanchez, D. Suck, and S. Pongor. 1995b. Trinucleotide models for DNA bending propensity: comparison of models based on DNaseI digestion and nucleosome packaging data. J. Biomol. Struct. Dyn. 13:309317.[Medline]
Burkhoff, A. M., and T. D. Tullius. 1987. The unusual conformation adopted by the adenine tracts in kinetoplast DNA. Cell. 48:935943.[CrossRef][Medline]
Calladine, C. R., and H. R. Drew. 1997. Understanding DNA: The Molecule and How It Works. Academic Press, San Diego, CA.
Case, D. A., D. A. Pearlman, J. W. Caldwell, T. E. Cheatham III, W. S. Ross, C. Simmerling, T. Darden, K. M. Merz, R. V. Stanton, and A. Cheng. and others. 1999. AMBER: Version 6. Version 6.0. San Francisco: University of California.
Cheatham III, T. E., P. Cieplak, and P. A. Kollman. 1999. A modified version of the Cornell et al. force field with improved sugar pucker phases and helical repeat. J. Biomol. Struct. Dyn. 16:845862.[Medline]
Cheatham III, T. E., and P. A. Kollman. 2000. Molecular dynamics simulation of nucleic acids. Annu. Rev. Phys. Chem. 51:435471.[CrossRef][Medline]
Cheatham III, T. E., J. L. Miller, T. Fox, T. A. Darden, and P. A. Kollman. 1995. Molecular dynamics simulations on solvated biomolecular systems: the particle mesh Ewald method leads to stable trajectories of DNA, RNA, and proteins. J. Am. Chem. Soc. 117:41934194.[CrossRef]
Cheatham III, T. E., J. L. Miller, T. I. Spector, P. Cieplak, and P. A. Kollman. 1998. Molecular dynamics simulations on nucleic acid systems using the Cornell et al force field and particle mesh Ewald electrostatics. In Molecular Modeling of Nucleic Acids. American Chemical Society, Washington, DC. 285303.
Cheatham III, T. E., and M. A. Young. 2001. Molecular dynamics simulation of nucleic acids: successes, limitations, and promise. Biopolymers. 56:232256.[CrossRef]
Cornell, W. D., P. Cieplak, C. I. Bayly, I. R. Gould, K. M. Merz, Jr., D. M. Ferguson, D. C. Spellmeyer, T. Fox, J. W. Caldwell, and P. A. Kollman. 1995. A second generation force field for the simulation of proteins, nucleic acids, and organic molecules. J. Am. Chem. Soc. 117:51795197.[CrossRef]
Dickerson, R. E. 1983. The DNA helix and how it is read. Sci. Am. 249:94111.
Dickerson, R. E., and T. K. Chiu. 1997. Helix bending as a factor in protein/DNA recognition. Biopolymers. 44:361403.[CrossRef][Medline]
Dickerson, R. E., and H. R. Drew. 1981. Structure of a B DNA dodecamer II. Influence of base sequence on helix structure. J. Mol. Biol. 149:761786.[CrossRef][Medline]
Dickerson, R. E., D. S. Goodsell, M. L. Kopka, and P. E. Pjura. 1987. The effect of crystal packing on oligonucleotide double helix structure. J. Biomol. Struct. Dyn. 5:557579.[Medline]
DiGabriele, A. D., M. R. Sanderson, and T. A. Steitz. 1989. Crystal lattice packing is important in determining the bend of a DNA dodecamer containing an adenine tract. Proc. Natl. Acad. Sci. USA. 86:18161820.
Drew, H. R., and R. E. Dickerson. 1981. Structure of a B-DNA dodecamer III. Geometry of hydration. J. Mol. Biol. 151:535556.[CrossRef][Medline]
Drew, H. R., R. M. Wing, T. Takano, C. Broka, S. Tanaka, K. Itikura, and R. E. Dickerson. 1981. Structure of a B DNA dodecamer I: conformation and dynamics. Proc. Natl. Acad. Sci. USA. 78:21792183.
El Hassan, M. A., and C. R. Calladine. 1995. The assessment of the geometry of dinucleotide steps in double-helical DNA; a new local calculation scheme. J. Mol. Biol. 251:648664.[CrossRef][Medline]