Originally published as Biophys J. BioFAST on September 16, 2005.
doi:10.1529/biophysj.105.067397
Biophysical Journal 89:3721-3740 (2005)
© 2005 The Biophysical Society
Molecular Dynamics Simulations of the 136 Unique Tetranucleotide Sequences of DNA Oligonucleotides. II: Sequence Context Effects on the Dynamical Structures of the 10 Unique Dinucleotide Steps
Surjit B. Dixit *,
David L. Beveridge *,
David A. Case
,
Thomas E. Cheatham, 3rd
,
Emmanuel Giudice
¶,
Filip Lankas ||,
Richard Lavery
,
John H. Maddocks ||,
Roman Osman ¶,
Heinz Sklenar **,
Kelly M. Thayer * and
Péter Varnai
* Chemistry Department and Molecular Biophysics Program, Wesleyan University, Middletown, Connecticut 06459;
Department of Molecular Biology, TPC15, The Scripps Research Institute, La Jolla, California 92037;
Departments of Medicinal Chemistry and of Pharmaceutics and Pharmaceutical Chemistry, University of Utah, Salt Lake City, Utah 84112-5820;
Laboratoire de Biochimie Theorique, Institut de Biologie PhysicoChimique, Paris 75005, France; ¶ Physiology and Biophysics, Mount Sinai School of Medicine, New York, New York 10029; || Institute of Mathematics B, Swiss Federal Institute of Technology, CH 1015 Lausanne, Switzerland; and ** Theoretical Biophysics Group, Max Delbrück Center, D-13122 Berlin, Germany
Correspondence: Address reprint requests to David L. Beveridge, E-mail: dbeveridge{at}wesleyan.edu.
 |
ABSTRACT
|
|---|
Molecular dynamics (MD) simulations including water and counterions on B-DNA oligomers containing all 136 unique tetranucleotide basepair steps are reported. The objective is to obtain the calculated dynamical structure for at least two copies of each case, use the results to examine issues with regard to convergence and dynamical stability of MD on DNA, and determine the significance of sequence context effects on all unique dinucleotide steps. This information is essential to understand sequence effects on DNA structure and has implications on diverse problems in the structural biology of DNA. Calculations were carried out on the 136 cases embedded in 39 DNA oligomers with repeating tetranucleotide sequences, capped on both ends by GC pairs and each having a total length of 15 nucleotide pairs. All simulations were carried out using a well-defined state-of-the-art MD protocol, the AMBER suite of programs, and the parm94 force field. In a previous article (Beveridge et al. 2004. Biophysical Journal. 87:37993813), the research design, details of the simulation protocol, and informatics issues were described. Preliminary results from 15 ns MD trajectories were presented for the d(CpG) step in all 10 unique sequence contexts. The results indicated the sequence context effects to be small for this step, but revealed that MD on DNA at this length of trajectory is subject to surprisingly persistent cooperative transitions of the sugar-phosphate backbone torsion angles
and
. In this article, we report detailed analysis of the entire trajectory database and occurrence of various conformational substates and its impact on studies of context effects. The analysis reveals a possible direct correspondence between the sequence-dependent dynamical tendencies of DNA structure and the tendency to undergo transitions that "trap" them in nonstandard conformational substates. The difference in mean of the observed basepair step helicoidal parameter distribution with different flanking sequence sometimes differs by as much as one standard deviation, indicating that the extent of sequence effects could be significant. The observations reveal that the impact of a flexible dinucleotide such as CpG could extend beyond the immediate basepair neighbors. The results in general provide new insight into MD on DNA and the sequence-dependent dynamical structural characteristics of DNA.
 |
INTRODUCTION
|
|---|
Basepair sequence effects on structure and dynamics are a key issue in understanding the biochemistry and biology of DNA at the molecular level. Most information on sequence effects to date has been limited to the 10 unique dinucleotide steps. However, recent, more extensive considerations of the problem indicate that dinucleotide steps are sensitive to at least nearest neighbor sequence context. The minimum structural unit which reveals nearest neighbor sequence context effects is the tetranucleotide step, of which there are 136 unique sequence permutations. At present, the experimental structural database of DNA tetranucleotide steps at atomic resolution, derived primarily from x-ray crystallography and emerging results from NMR spectroscopy, is quite sparse. However, the ability to model DNA structure in solution using all-atom molecular dynamics (MD) simulations has improved significantly in recent years (1
6
), and the study of sequence and sequence context effects has now become accessible to simulations carried out on high performance computers.
This series of articles describes a project aimed at obtaining MD trajectories including water and counterions for all unique tetranucleotide base sequences. This project involves the participation of nine independent research laboratories that initiated this project at a Workshop in Ascona, Switzerland, in June of 2002, referred to as the "Ascona B-DNA Consortium" (ABC). Overall, we seek to obtain MD trajectories for the 136 unique DNA tetranucleotides embedded in 39 DNA oligomers having repeating sequences. The oligomers are each 15 nucleotide pairs in length and are capped on both ends by GC pairs. All MD simulations were performed with a consensus protocol using the AMBER suite of programs (7
) and the parm94 force field of Cornell et al. (8
). This force field, although not the only option, has been verified in test cases to produce good overall agreement between calculated and observed DNA structures in crystals and in solution (9
,10
). MD trajectories of 15 nanoseconds (ns) have been obtained for each of the 39 oligomers. In Work I of this series (11
), we presented the research design, MD protocol, convergence and stability, and informatics considerations, and reported results on sequence context effects in d(CpG) steps. In this work, we provide results from the structural analysis of all the 136 unique tetranucleotides.
Background
The general background necessary to this research was presented in some detail in Work I. We present here only a concise summary of salient information together with references to published work in the field of MD on DNA that has appeared in the interim. The initial motivation for this study was the investigation of first neighbor context effects on the structures of DNA dinucleotide steps, which requires knowledge of the structures of all 136 unique tetranucleotides. Experimental oligonucleotide structures from crystallography or NMR spectroscopy at the tetranucleotide step level are available for only a limited number of specific cases. Even so, surveys of these structures have raised the possibility of significant sequence effects (12
14
). An extensive theoretical consideration of the problem to date is due to Packer et al. (15
,16
), who presented detailed considerations based on the minimization of stacking energies for tetranucleotide steps as described by empirical energy functions.
New NMR experiments based on residual dipolar coupling (RDC) offer the possibility of obtaining higher resolution structures of oligonucleotides in solution (17
) and may have sufficiently high resolution to accurately resolve DNA fine structure. Presently, NMR/RDC structures of DNA oligonucleotides are just beginning to appear in the literature (18
20
). MD simulations on each of these sequences have been carried out and are found to be generally in close accord with NMR-derived solution structures (9,21
). In the case of dodecamers containing the dA6 motif, independent MD in solution were carried out starting from the x-ray crystal structure and the NMR solution structure and canonical B-form DNA (21
). The results converged rapidly to a structure in close proximity to the observed NMR solution structure. The current ideas on sequence-dependent bending and curvature of B-DNA have been recently reviewed by Beveridge et al. (22
) and Zhurkin et al. (23
).
Recent surveys of the field of MD on DNA are available from several sources (2
6
,24
). The AMBER parm94 (8
) is a "second generation" parameterization of the nucleic acids force field for MD using explicit solvent models for proper treatment of electrostatics. MD using AMBER and parm94 provided the first well-behaved MD trajectories of the DNA double helix (6
,25
28
). Known shortcomings in parm94 still include a sensitive problem in the coupling of base-sugar torsions and a systematic tendency toward somewhat underwound structures. A modification known as parm99 has recently been proposed (29
) which improves twist but appears less sensitive to changes in the environment (high salt, ethanol), leading the ABC group to use the parm94 force field, well characterized with respect to experimental data on prototype cases (9
,30
). Leading references to force field alternatives are provided in Work I. A new version on nucleic acids force field for GROMOS (31
) as well as CHARMM (32
) has recently appeared, but extensive force field comparisons are beyond the scope of this study.
Updating the literature on studies of sequence effects on DNA deformability since Work I of this series, Matsumoto and Olson (33
) reported normal mode analysis of oligonucleotide DNA using knowledge-based potentials obtained from high-resolution crystal structures. The results successfully accounted for the bending persistence length and stretching modulus of DNA and indicated a sensitivity of twisting force constants to the basepair sequence. An MD study of two 18-basepair DNA oligomers was recently reported by Lankas et al. (34
). In these two sequences, all 10 unique dinucleotide basepair steps are represented, which provides a point of comparison with some of the results of this study. A marked trend in relative flexibility in roll, pyrimidine(Y)-Purine(R) > purine-purine > purine-pyrimidine was noted in the study, and the YpR steps were also found to be the most flexible in tilt and partially in twist, supporting previous results (35
). Slide-rise, twist-roll, and twist-slide elastic couplings of various degrees were observed. A possible correlation of motions on a length scale of 23 basepairs was noted, which falls in the neighborhood of first neighbor context effects. A set of basepair step sequence-dependent bending force constants was recently obtained from electron paramagnetic resonance studies by Okonogi et al. (36
). Ho and co-workers (37
) are assembling a crystallographic data set of DNA structures involving all permutations of the inverted repeat sequence d(CCnnnN6N7N8GG) where N6, N7, and N8 are any of the four naturally occurring nucleotides and the ns are the corresponding bases to maintain self-complementarity. The presented data based on 29 of the possible 64 permutations of the trinucleotides correlate sequence and environment with the B, A, and Holliday junction-like structural classes and their variability.
An issue of particular interest in MD on DNA is the motion of mobile counterions, which may also contribute interesting sequence effects (38
40
) and have been noted from previous studies to be slow to converge (30
). Varnai and Zakrzewska (41
) performed MD simulations on d(CCCATGCGCTGAC) and studied the behavior of mobile counterions Na+ and K+. The ions, as expected, preferentially sampled electronegative sites around the DNA, but direct ion association with nucleotide bases occurred in <13% of the trajectory. Interesting ion- and sequence-specific effects were observed in which preferential direct binding of Na+ ions occurred at a minor groove site, whereas the larger K+ ions favored a site in the major grove. This introduces a degree of complexity not apparent from just examining the electrostatic potential of DNA (42
). Little evidence of minor groove narrowing correlated with ion binding was observed, a topic around which there has been a diversity of opinion (38
40
).
Extended studies on the d(CGCGAATTCGCG) sequence (43
) indicate that DNA conformational and helicoidal parameters including groove widths have relaxation times of
500 ps or less. The rule of thumb is to sample 10 times the relaxation time of all the indices of interest for a particular application (44
). This indicates that 5 ns trajectories should be sufficient in the absence of substate problems (see below), and we are well in excess of that in the 15 ns trajectories carried out in phase I of this project. Observed diffusion constants indicate that motions of mobile counterions in the environment of DNA will be relatively slow to converge. Ponomarev et al. (43
) reported a benchmark indicating that ion occupancies can take up to 100 ns to stabilize. However, in the same calculation, the DNA parameters were found to be well stabilized at 5 ns and not sensitive to the fine details of ion convergence. The calculated DNA counterion radial distribution functions were found to be essentially unchanged after 35 ns, indicating that mean field effects of ions are dominant in DNA structure and that the excess sampling to get ion occupancies converged is a matter of granularity of the ion distributions.
DNA has the potential for contributions from manifold thermally accessible substates (45
,46
). Known examples of this are the BI-BII transitions (47
),
/
crankshaft motions (48
), and YpR hinge motions (49
). The last have been noted to play an important role in structures of protein-bound DNA (13
) as well as DNA curvature (22
). Rich and co-workers (50
) have observed a correlated
/
transition in A-form DNA from the preferred g/g+ state, which they called AI, to a less common and less constricted t/t state they labeled AII. Sundaralingam and co-workers (51
) have noted that distortions in the
/
on the 5'-side of the sugar are more common in A-DNA, whereas conformational changes in the
/
on the 3'-side are more common in the B-form DNA. Indications from the crystallographic database and MD are that certain basepair steps show high flexibility, whereas those involved in A-tracts are relatively rigid (35
,52
54
). This raises the question of which are more susceptible to sequence context effects, rigid or flexible steps. One could argue either way since more rigid steps could either resist deformation or respond as a unit whereas flexible steps are more malleable but could absorb perturbations more easily. The problem this poses to a simulation arises from the need to sample all thermally accessible substates adequately to obtain an ensemble of snapshots which properly represent the dynamical structure of the DNA.
The d(CpG) step in all its possible neighboring sequence contexts was chosen for preliminary analysis as described in Work I, since x-ray structures indicate that this and possibly other YpR steps have a potential for context-dependent substates (49
,52
). The results were surprising in several respects. First, although many structural and dynamic features of the oligomers studied have converged to stable values, the results indicate that slow backbone transitions prevent a complete sampling of the conformation space of B-DNA in the MD on CpG steps. For the same reason it is not yet possible to characterize all the consequences of such backbone transitions, which can occur independently or be coupled together, and which can influence the structural and dynamic behavior beyond the junction where the transition occurs. If we filter out such effects, the remaining conformational sampling appears to be reasonably balanced but also suggests that the surrounding sequence has a very small effect on the properties of the CpG step. This indicates that any difference in the underlying potential as a consequence of the surrounding sequence is probably only a fraction of a kcal/mol.
The preliminary analysis obtained in Work I for the dCpG step anticipates at least some of the problems involved and issues to be considered. However, before drawing any general conclusions, it is clearly necessary to complete the analysis of all 136 unique tetranucleotides. At this point all simulations from the initial phase of ABC are completed and analyzed. The data obtained will hopefully allow us to obtain an increasingly clear view of sequence context effects, to better understand the importance of such phenomena as conformational substates, and also to define how end effects and length effects can influence the behavior of DNA fragments.
 |
METHODOLOGY
|
|---|
All simulations have been carried out using the AMBER 6 or AMBER 7 suite of programs (7
) and the parm94 force field (8
). The simulations cover 39 double-stranded DNA oligomers, each being 15 basepairs in length. The sequences of these oligomers are discussed below. A consensus protocol was adopted for simulation in which the solute molecule is a 15 basepair oligonucleotide with 28 potassium ions added to achieve system electroneutrality. The DNA with its counterions was simulated in a truncated octahedral box having a face-to-face dimension of
70 Å, which allows for a solvent shell extending for at least 10 Å around the DNA. The starting configuration has the oligomer in a canonical B-form. The ions are randomly placed around the oligomer and located at least 5 Å from any atom of the solute and at least 3.5 Å from one another in the initial structure. Ion interactions with other atoms are based on the potentials developed by Aqvist (55
). The neutral ion-oligomer complex was solvated with TIP3P water molecules (56
). Simulations are performed with periodic boundary conditions in which the central cell contains
8000 water molecules. Considering the DNA, counterions, and solvent water, the total system consists of
24,000 atoms.
The preparations for MD simulations consist of an initial minimization followed by slow heating to 300 K at constant volume over a period of 100 ps using harmonic restraints of 25 kcal mol1 Å2 on the solute atoms. These restraints are slowly relaxed from 5 to 1 kcal mol1 Å2 during a series of five segments of 1000 steps of energy minimization and 50 ps equilibration using constant temperature (300 K) and pressure (1 bar) conditions via the Berendsen algorithm (57
) with a coupling constant of 0.2 ps for both parameters. The final segment consists of 50 ps equilibration with a restraint of 0.5 kcal mol1 Å2 and 50 ps unrestrained equilibration. The simulations were then continued for a total of 15 ns at constant temperature and pressure conditions, using the Berendsen algorithm (57
) with a coupling constant of 5 ps for both parameters. Electrostatic interactions were treated using the Particle Mesh Ewald (PME) algorithm (58
) with a real space cutoff of 9 Å, cubic B-spline interpolation onto the charge grid with a spacing of
1 Å. SHAKE constraints (59
) were applied to all bonds involving hydrogen atoms. The integration time step was 2 fs. Center of mass translational motion was removed every 5000 MD steps to avoid the methodological problems described by Harvey et al. (60
). The trajectories were extended, as noted above, to 15 ns for each oligomer, and conformations of the system were saved every 1 ps for further analysis.
Rather than performing separate calculations on all 136 tetranucleotides using 136 different oligomers (for example, placing each tetranucleotide within a longer duplex surrounded with some standard sequence), we carried out the calculations on oligomers with repeating tetranucleotide sequences (ABCDABCDABCD...). Moving a 4-base "reading frame" along the oligomer, we locate successively ABCD, BCDA, CDAB, and DABC tetranucleotides. The length of the oligomers was chosen to be 15 basepairs, a compromise between the necessity to avoid end effects and the computational expense of the simulations. This strategy enables all 136 tetranucleotides to be studied using only 39 oligomers. We cap the ends of each oligomer with a single GC pair to avoid fraying. This implies that a given 15 basepair oligomer contains 3
tetranucleotide repeats 5'-G-D-ABCD-ABCD-ABCD-G-3', where A,B,C,D are any deoxyribonucleotide. This choice means that if we decide to ignore two basepairs at either end of the oligomer, to avoid potential artifacts from end effects, there will still be two distinct copies of each unique tetranucleotide (ABCD, BCDA, CDAB, DABC) within the remaining 11 basepair fragment. MD trajectories for these 39 oligonucleotides provide a basis for comparing the properties of two copies of each tetranucleotide. Note this is valuable for the study of convergence as well as sequence context effects. The backbone conformational angles and helicoidal parameters of the DNA structure in the MD trajectory were calculated using the program Curves 5.3 (61
) and stored in our relational database management system to facilitate mining of this voluminous dataset.
Many questions, including those of interest to this project, involve comparing the results of two chosen MD simulations, or, one chosen simulation with all the others. In the relatively brief history of MD on DNA, the primary tool for this task has been the root mean-square difference (RMSD) between structures or between derived parameters from structures following optimal alignment. In MD simulation, one obtains, in any given trajectory, an ensemble of structural "snapshots", i.e., the dynamical structure. Previous studies have computed the average structure from this ensemble, calculated after placing a representative number of snapshots in optimal alignment followed by a few cycles of post facto energy minimization which ensures that the average structure assumes a physically reasonable form. Typically the time evolution of RMSD is obtained by calculating the RMSD between each of the MD snapshots and the computed average structure. However, an MD average structure can be misleading when the dynamical structure from MD involves substates. Furthermore, the snapshots which comprise the dynamical ensemble of the DNA from MD are typically 12 Å RMSD from the average structure. However, none of the snapshots actually match average structure. This naturally raises a question about the suitability of average structures at all in MD analysis.
In response, a method for comparing MD results has been applied which avoids the use of MD average structures and makes comparisons only on the basis of actual snapshots included in the MD ensemble (S. B. Dixit, S. Ponomarev, K. M. Thayer, and D. L. Beveridge, unpublished). Comparing the results of the dynamical structure from any two MD simulations, the first step is to generate the matrix of RMSD differences for all n structures, where n is the number of MD snapshots considered. In previous works this has been referred to as a two-dimensional (2D) RMSD plot (46
). The characteristics of a 2D RMSD plot are interesting per se in the identification of substates (46
,62
). However, our primary use of this information in this project comes in the generation of a plot of the probability of observing a given RMSD between all snapshots in both simulations, the RMSD probability denoted as P(rmsd). It is of interest to distinguish two cases at this point: a), the Pintra (rmsd) in which the RMSD of all structures with all other structures in a given trajectory are displayed to ascertain the extent of thermal motions, and b), the Pinter (rmsd) in which the structures from one distribution are compared with those of another. The question of whether the results of the two MD simulations are similar or not in RMSD probability analysis reduces to comparing the Pintra (rmsd) and Pinter (rmsd) distributions. For two simulations in which the P(rmsd) results are identical, these should be the same.
In this study we compare the probability distributions of angular RMS deviations calculated for the backbone dihedral angles (
, ß,
,
,
) involved in connecting consecutive nucleotides, the phase and amplitude of the sugar pucker, and the torsional angle
connecting the sugar and the base in the tetranucleotide, with reference to every other conformation adopted by that tetranucleotide in the trajectory and also the conformations adopted by the second occurrence of the same tetranucleotide sequence in the database. The use of angular (internal) coordinates for the RMSD calculation instead of the usual Cartesian coordinates results in the use of a smaller number of variables to define the structure of a section of the DNA and also avoids the problem associated with fitting of structures to a reference frame before an RMSD calculation is performed in Cartesian space. The results in this article employ the backbone conformational and basepair helicoidal parameters of DNA as defined by Dickerson et al. (63
,64
) and implemented in the Curves program (65
). For a recent article dealing with the derivation of DNA structural parameters, see Lu and Olson (66
).
When two P(rmsd) results differ, one may compare the two distributions using statistical tests to determine the confidence level with which one may infer the two sets of structures to have been drawn from the same general population. The standard statistical test for the similarity in such situations is the
2 test for independence (67
), which can be readily applied. An alternative, more rigorous information theoretic approach applicable in the case of complex distributions is to calculate the "Kullback-Leibler (KL) Distance", DKL, which is a measure of the divergence between a "true" probability distribution, p, and a "target" probability distribution, q (68
). For discrete probability distributions, p = {p1, ..., pn} and q = {q1, ..., qn}, DKL is defined as
 |
For continuous probability densities, the sum is replaced by an integral. The value DKL is always positive and equal to zero only if pi = qi. DKL is not, in general, symmetric and hence we employ the mean of DKL(p,q) and DKL(q,p). This equation based on expected log likelihood ratio between the two distributions is a metric of the relative entropies and can be viewed as the bits of information required to convert one distribution to another. Such an approach to compare the RMSD probability distribution provides a single index for examining the difference between two MD results in a way that avoids the necessity of working with possibly problematic average structures.
 |
RESULTS
|
|---|
The completed data set in this project contains the results of 39 independent 15 ns MD trajectories on DNA 15-mers of various sequence composition, with each of the 136 unique tetranucleotide steps represented at least twice. The complete 15 ns of the postequilibrated trajectory are included in the analysis presented here. The data set contains almost 600,000 coordinate sets. All the trajectories are globally very stable over the complete simulation length and the mass-weighted all-atom RMSD with reference to the simulation average is in the range of 24 Å. The A-rich sequences favor a more B-like form in solution, whereas the G-rich sequences present a tendency toward (but not identical to) canonical A-like structure. The average mass-weighted all-atom RMSD of the 39 DNA trajectories with respect to the canonical B-form is
4.8 Å and
4.9 Å with respect to the canonical A-form DNA. The poly(A) sequence at an RMSD of 3.7 Å with respect to the canonical B-form structure is the most B-like, whereas poly(G) is the farthest from the canonical B-form structure with an RMSD of 6.2 Å and
4.6 Å from the A-from structure. Note that the RMSD between the canonical A and B forms of DNA for a 15-mer DNA sequence is itself
7 Å. The differences in the A- and B-"like" structures in the MD model are largely observed as a combination of basepair inclination, x-displacement, roll, and helical twist. There are no clear cut transitions to the C3'-endo (north) conformation of the sugar pucker which would be affirmative of transitions between the B and A forms. The solution state structures are not exactly the same as the canonical models of DNA because the atomistic models provide greater fine structural details of the system. The occurrence of such sequence-dependent intermediate structures outside the regime of canonical A or B form has also been reported in crystallography (69
).
In Work I, we presented preliminary results on the dCpG dinucleotide step in all sequence contexts. Our analysis revealed that in certain cases, conformational transitions to nonstandard B-form conformational states occurred. Two types of these conformational transitions were prominent: a), BI/BII transitions (47
), which are reversible within the nanosecond timescale, resulting from coupled changes in the
and
values, and b),
/
flips (48
), in which the nonstandard form persisted to an extent that raised a concern about whether or not a sufficient sampling of the conformational space of B-DNA was being achieved. Thus, in the analysis of the complete database, we must address first and foremost the extent to which such long-lived nonstandard substates cause a sampling problem.
Conformational substates of DNA backbone
In the canonical B-form DNA obtained from fiber diffraction, the
/
angles are
314°/36° (i.e., g/g+), whereas during MD, noncanonical substates with
/
values around g+/t are observed. Transitions between the BI and BII states are observed when the value of
/
changes from t/g with (
) value around 90° to g/t with (
) value around +90°. On the basis of distinct combinations of
,
, and (
) values adopted, in accordance with the simple classification presented in Table 1, we were able to organize all the DNA backbone conformations in our database into seven putative substates. (For brevity, we refer to these "backbone conformational substates" as just "substates" in the rest of the article.) Similar classes of backbone angles were observed in the work of Varnai et al. (48
) in which they explored the free energy surface of the central GpC dinucleotide step in the d(GTCAGCGCATGG) sequence. Fig. 1 shows the probability distribution as a function of
/
/(
) values for all the backbone positions in the complete database. This plot is based on a total of 11,700,000 data points, corresponding to the product of 39 DNA trajectory x 10 nucleotide positions (chosen to avoid end effects) x 2 strands x 15,000 snapshots (i.e., sampling structures every picosecond). Note that although the dihedral angles have usually been classified in terms of their values being close to g+/g/t etc. in the past, we have simply classified the data in terms of the clusters observed in the 3D plot in Fig. 1. States such as 3 and 5 present a range of values that spans across both the g+ and t in case of
, whereas the value of
is essentially near t. A 2D plot presenting the classification in terms of just the
and
angles is available in the supplementary material (Supplement 1). General approaches to a consistent identification of the number of sub- or metastable states present in a given time series are discussed in I. Horenko, E. Dittmer, F. Lankas, J. Maddox, P. Metzner, and C. Schuette (unpublished), including the example of an analysis of a 100 ns trajectory of one of the ABC oligomers described here.