| HOME | HELP | FEEDBACK | SUBSCRIPTIONS | ARCHIVE | SEARCH | TABLE OF CONTENTS |
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

* Department of Chemistry & Chemical Biology, Rutgers, the State University of New Jersey, Wright-Rieman Laboratories, Piscataway, New Jersey; and
Center for Complex Molecular Systems and Biomolecules, Institute of Organic Chemistry and Biochemistry, Academy of Sciences of the Czech Republic, Prague, Czech Republic
Correspondence: Address reprint requests to Wilma K. Olson, Tel.: 732-445-3993; Fax: 732-445-5958; E-mail: olson{at}rutchem.rutgers.edu.
| ABSTRACT |
|---|
|
|
|---|
| INTRODUCTION |
|---|
|
|
|---|
The hydrogen-bond donor and acceptor atoms that line the grooves of the DNA double helix serve as recognition elements for interactions with proteins, drugs, and solvent. Water molecules form a distinctive spine of hydration in the minor grooves of numerous B-DNA structures (Kopka et al., 1983
) and ordered networks of fused polygons in the major grooves of many A-DNA structures (Shakked et al., 1981
). Moreover, the minor-groove spine of associated water molecules in AT-rich duplexes can be displaced by small, positively charged, crescent-shaped molecules, such as the antibiotic netropsin (Kopka et al., 1985a
,b
), with proton donor and acceptor atoms arranged to mimic the crystallographically observed configurations of bound waters. The positive charges on the drug molecules and the cationic amino-acid side groups on the proteins are thought to facilitate ligand access past the negatively charged sugar-phosphate backbone.
Some small molecules have capabilities of recognizing short DNA sequences via a code that complements the chemical information on the minor-groove edges of the basepairs (Trauger et al., 1996
; White et al., 1998
; Dervan and Burli, 1999
; Wemmer, 2001
). In contrast to other binding ligands, which form 1:1 complexes in the minor groove and possess only partial sequence-reading capabilities, the polyamide molecules designed to recognize specific basepair sequences form 2:1 complexes with DNA. The DNA sequence-reading abilities of these so-called lexitropsins (Goodsell, 2001
; Wemmer, 2001
) are realized by the combination of three ring subunitsimidazole (Im), pyrrole (Py), and hydroxypyrrole (Hp). A pair of pyrrole residues is used to discriminate A·T or T·A from G·C and C·G (Pelton and Wemmer, 1989
; White et al., 1996
), an Im·Py pair to differentiate G·C from C·G, and both G·C and C·G from A·T and T·A (Trauger et al., 1996
; White et al., 1997
), and a pair of Hp and Py units to distinguish T·A from A·T, and both T·A and A·T from G·C and C·G (White et al., 1998
). The discrimination mechanisms are thought to reflect both steric hindrance and the asymmetric distributions of hydrogen-bond donor and acceptor groups on the minor-groove edges of the basepairs (Goodsell, 2001
; Wemmer, 2001
). Specifically, the Py·Py pair is expected to clash with the exocyclic amino group of guanine when brought into the vicinity of a G·C or C·G basepair, thereby favoring its association with A·T or T·A. The Hp·Py pair is believed to use a similar steric mechanism to discriminate against G·C and C·G and to interact preferentially with A·T over T·A via hydrogen-bond formation (involving the exocyclic OH of Hp and the N3 atom of adenine). The Im·Py pair has the capacity to exclude C·G basepairs on the basis of steric hindrance and to associate preferentially with G·C over A·T and T·A via hydrogen bonding (between the imidazole ring nitrogen and the guanine exocyclic amino group).
One of the best sources of information about nucleic acid-ligand interactions is the database of experimental nucleic acid structures (Berman et al., 1992
), which is now at the point where there are enough data to extract the preferred positions of different ligands in close contact with the constituent bases, sugars, and phosphates. For example, water molecules cluster in distinct hydrogen-bonding sites around the bases as opposed to being evenly spread over the molecular surface (Schneider et al., 1993
, 1998
; Schneider and Berman, 1995
). Moreover, the bound solvent clusters serve as recognition motifs for specific interactions of DNA with proteins, drugs, and other ligands (Woda et al., 1998
; Howerton et al., 2001
; Moravek et al., 2002
).
These findings have stimulated our interest in developing a concise, more quantitative description of the ligand-binding sites around DNA and RNA. To make use of the observed hydration sites in ligand-docking calculations, an effective mathematical framework must be constructed for precise description of the sites of intermolecular association. The approach taken here follows that of Olson et al. (1998)
, who derived a set of elastic functions that reflect the sequence-dependent bending, twisting, and stretching of nucleic acid basepair steps. This class of ellipsoidal expressions can also be used to characterize the distributions of water and other ligands around the chemical components of DNA or RNA. Once the binding functions are defined, the interactions of drugs and proteins with DNA can be converted to knowledge-based energies, i.e., statistical scores, for molecular docking applications.
In this article, we first determine a set of elastic functions at the hydration and protein binding sites of the Watson-Crick basepairs with three different approaches: a previously described Fourier averaging of the binding patterns of ligands around individual bases (Schneider et al., 1993
), here termed local densities; a similar analysis of ligands in longer stretches of DNA yielding global densities (Schneider et al., 1993
); and a statistical clustering algorithm combined with principal component analysis. The resulting ligand-scoring functions are then used to compare the binding of various small molecules in the B-DNA minor groove with the known sites of bound water in well-resolved crystal structures. We consider a series of 2:1 drug-DNA complexes with sequence-recognition capabilities as well as minor-groove binders that form 1:1 complexes with DNA. The ligands are assigned energy scores based on the positioning of the hydrogen-bonding sites on the drugs, relative to the preferred locations of water around the DNA bases. The knowledge-based functions are also used in a series of sequence substitution studies to test the hypotheses that underlie the drug design, e.g., the relative contribution of steric or hydrogen-bonding factors to ligand-binding preferences. The energies of incorrect sequences are obtained by superimposing different bases (without conformational adjustment) on the observed side groups in high resolution DNA-polyamide structures and measuring the relative positions of the hydration sites of the modified duplexes with respect to the unmodified ligand positions. The binding scores of computationally "synthesized" drug-DNA complexes are also compared with the known DNA-binding affinities of polyamide hairpin molecules in solution. The structures of the computer-generated species are checked against those of related hairpin molecules bound to nucleosomal DNA. The observed spatial positioning of the polyamide ligands with respect to sequence-specific DNA targets on the surface of the nucleosome core particle is assessed with the hydration density functions.
| METHODS |
|---|
|
|
|---|
|
olson/ligands.html.)
Global pseudoelectron density functions
Whereas the above local densities are based on the distribution patterns of bound ligand atoms around one of the four bases, the global ligand-binding functions are generated at the level of overall structure (Schneider and Berman, 1995
; Schneider et al., 1993
). The discrete water or protein contact sites compiled for the individual bases are superimposed on the global structure, e.g., a basepair or a set of bases, and Fourier averaging of the superimposed points is performed in the global reference frame.
Quantitative description of interactions in the major or minor groove is based on the superposition of ligand-binding sites in the groove of interest. Because the global density refinement typically fails to generate satisfactory ellipsoids in the major groove (see Discussion), this approach has only been used to generate minor-groove ellipsoids, and to study minor-groove ligand-DNA interactions.
Local clustering
The ligand-binding data have also been analyzed with a hierarchical, agglomerative clustering algorithm (Auf der Heyde, 1990
). Each observation is initially treated as an individual cluster, and the clusters are merged one-by-one according to their distances of separation. Four ways of calculating the cluster distances are considered: single linkage; complete linkage; average linkage; and centroid linkage. The complete and centroid linkage distances are used to cluster ligand positions in the minor groove. These choices are empirical, based on the quality of the clusters generated with the different methods.
To reduce noise, only clusters with >5% of the total number of positions are characterized by elastic functions. The formulation of the energy expression is based on principal component and factor analysis of the clustered data (Auf der Heyde, 1990
). The three orthogonal axes used to specify the positions of associated waters or amino acid atoms with respect to a given base are transformed into three new axes, so that these new axes coincide with the directions of maximum variance. Thus, each cluster is represented by an ellipsoid using the new axes (eigenvectors) as the ellipsoidal axes, and the variance along these axes (square-roots of the eigenvalues) as the axis lengths. The force constant matrix F used to calculate the energy of a given ligand-binding site (see Eq. 1 below) is the inverse of the covariance matrix. The mathematical protocol for obtaining this set of potentials is described in full at http://rutchem.rutgers.edu/
olson/ligands.html.
Global clustering
The global clustering of observed DNA contacts is similar to the global density refinement in using the water molecules around all the bases in a fragment of double-helical structure to predict ligand-binding sites. Standard bases are overlapped on the real bases in a given structure, and the water distributions around the standard bases are automatically converted to the global reference frame. Ellipsoidal functions are then generated in the global reference frame with the same clustering procedure used at the local level.
Global clustering, however, is inferior to the preceding three methods of generating ligand-binding ellipsoids in several respects:
Selection of drug-DNA structures
The knowledge-based potentials have been tested against 18 well-resolved (
2.4 Å resolution) oligonucleotide duplex structures with one or more drug molecules positioned in the minor groove (Table 1). Among the test structures are seven drug-DNA complexes (Kopka et al., 1997
; Kielkopf et al., 1998a
,b
, 2000
; Mitra et al., 1999
) with 2:1 binding stoichiometry, including five structures with polyamide ligands designed to bind the minor-groove edges of basepairs (bdd002, bdd003, gdj057, dd0020, and dd0021) (Kielkopf et al., 1998a
,b
, 2000
). These five drugs, which incorporate DNA sequence-reading capabilities, are analyzed in more detail than the two remaining structures, gdh060 (Mitra et al., 1999
) and gdj054 (Kopka et al., 1997
), with 2:1 binding stoichiometry and similar chemical makeup but with limited DNA sequence-reading capabilities. The distamycin ligand in the former complex is made up of a string of pyrrole (Py) rings with the capacity only to differentiate A·T and T·A from G·C and C·G basepairs. The pairs of imidazole (Im) rings that constitute the diimidazole lexitropsin bound to DNA in gdj054 cannot distinguish any differences among basepairs, since the Im·Im pair has equal affinity for all four Watson-Crick interactions (Wemmer, 2001
). Four other drug-DNA complexes with 2:1 binding stoichiometrygdhb25, gdlb49, gdlb50, gdlb51 (Chen et al., 1994
, 1997
)are excluded from the test set because they contain modified (hypoxanthine) bases. The remaining 11 ligands (Coll et al., 1987
, 1989
; Larson et al., 1989
; Sriram et al., 1992
; Balendiran et al., 1995
; Goodsell et al., 1995
; Wood et al., 1995
; Clark et al., 1996a
, 1996b
, 1997
; Aymami et al., 1999
) form 1:1 drug-oligonucleotide complexes. These complexes are selected from all 1:1 drug-DNA complexes on the basis of the relatively large number of drug atoms in each structure (
3) potentially involved in intermolecular hydrogen bonding with DNA. Such interactions are expected to play an important role in sequence-specific DNA-binding interactions. The (2.32.65 Å resolution) structures of three polyamide-DNA complexes with ligands designed to target specific sequences on the surface of the nucleosome core particle (pd0328, pd0329, and pd0330) (Suto et al., 2003
) are also examined. The drugs in the latter complexes are covalently connected by a peptide linker, whereas those associated with the oligonucleotide duplexes are chemically independent species.
|
O2(T) hydrogen bond (Leonard et al., 1995
Interaction score
The total energy, ETot, of the drug-DNA complex is calculated as the sum of interaction energies E for all critical drug atom-DNA ellipsoid pairs. The interaction energy of a given hydrogen-bond donor or acceptor atom on a drug at position Xa = (xa, ya, za), with respect to a preferred DNA ligand-binding site centered at Xe = (xe, ye, ze), is approximated by the harmonic energy expression
![]() | (1) |
The force constants and ligand-binding sites used in the analysis of minor-groove binding are based on the positions of bound water molecules in well-resolved B-DNA structures. The knowledge-based binding potentials derived from the waters in A-DNA structures are not considered since all known DNA complexes with minor-groove bound ligands retain the B-form. Energies derived on the basis of the hydration patterns in protein-DNA complexes are expected, because of similarities in solvent binding patterns (see Results), to resemble the reported B-DNA-based values. Here the elastic potentials, i.e., probable water binding positions, associated with each of the four bases are superimposed by least-squares fitting (Horn, 1987
) of a standard base with known hydration sites on the bases in a given drug-DNA complex (Fig. 2). Because of the symmetric properties of the ideal base reference frame (Olson et al., 2001
), it is easy to substitute one base for another in the calculations. Modifications of DNA-binding ligands are performed by analogous substitutions of standard drug fragments (see below).
|
Interaction energy ceiling
A critical drug atom may not necessarily be close to one of the DNA-binding ellipsoids. Because of the quadratic nature of the knowledge-based potentials, the energy assigned to a drug atom and its (possibly distant) DNA ellipsoidal partner could be quite large. Such behavior is inconsistent with physical modeling, where the formation of a hydrogen bond can significantly decrease the total interaction energy, but the loss of a hydrogen bond introduces no energetic penalty. An energy ceiling has therefore been introduced to limit the quadratic growth of the calculated hydrogen-bonding energies. That is, if the energy assigned to a critical drug atom is greater than some upper limit, the potential energy of the critical atom is equated to that limit.
The value of the energy ceiling is based on the interaction energies of water dimers estimated from ab initio molecular orbital calculations (Singh and Kollman, 1985
). The computed hydrogen-bonding energy in such structures is lowest (
6 kcal/mol) when the waters are separated by a distance of 2.9 Å, and approaches a value of zero when the molecules are at a distance of 78 Å, i.e., 45 Å beyond the ideal distance of separation. The principal axes of the current set of knowledge-based ellipsoids range from 0.5 Å to 2.2 Å in length (with an average value of 1.4 Å and an average variance
= 0.7 Å). An atom located 5.7
7.1
from the center of a binding ellipsoid is thus as far from a potential binding site as a pair of non-interacting waters are from their ideal hydrogen-bonding positions, i.e., 45 Å ÷ 0.7 Å/
= 5.77.1
. According to Eq. 1, the energy of a drug atom displaced from the center of a DNA-binding ellipsoid by n
along one of the three principal axes is raised to a level of n2/2. Separation distances of 5.77.1
therefore correspond to an energy ceiling of 1625. Preliminary calculations testing these two energy limits yield similar results. The data reported below are based on the higher energy ceiling.
Nomenclature of drug and base atoms in polyamide-DNA complexes
For purposes of analysis, a local numbering scheme is introduced to account, at the level of drug subunits, for the atoms comprising the polyamides complexed to DNA rather than the standard chemical nomenclature based on the structure of the ligands as a whole (Fig. 3, a and b). The drug subunit on which an atom is located is further distinguished by a residue name and number and different drugs are assigned a numerical identifier. Thus, one can easily relate a particular atom, residue, or drug in a bound polyamide-DNA complex to an atom or base on DNA. The drug residues are numbered in the same sense as the base sequence, with subunits of lower numerical value associated with the coding strand and residues with higher values bound to the complementary strand (Fig. 3 c).
|
Modification of polyamide ligands
Polyamide drug models are constructed by overlapping standard drug subunits with the polyamide ligand templates in known 2:1 crystal complexes. The Cartesian coordinates of the ring atoms of the standard subunits are determined with a downhill simplex procedure (Clowney et al., 1996
), which minimizes the difference between the internal chemical parameters (bond lengths and valence angles) of the derived ring structures and the corresponding mean values in the five polyamide complexes. The computational "synthesis" of new ligands is effected by fitting standard ring subunits on a selected polyamide drug template with a least-squares procedure (Horn, 1987
) and then connecting the exocyclic atoms on successively positioned subunits, i.e., the C2' carbonyl carbon of unit i and the N4 amide nitrogen of unit i+1. The conformation of the intervening peptide linkerthe C2C2'N4C4 torsion angle, the C2C2'N4 and C2'N4C4 valence angles, and the C2'N4 bond lengthis automatically determined by the locations of the drug subunits.
| RESULTS |
|---|
|
|
|---|
Careful examination of A-DNA and B-DNA structures reveals the reason for the difference in water positioning in the two helical forms. In A-DNA, the 5'-phosphate group of the chain backbone lies very close to the R(C8) and Y(C6) atoms. Moreover, the 5'-phosphorus atom lies roughly in the same plane as the base (Lu et al., 2000
), leaving almost no space near R(C8) or Y(C6) for a water molecule. Indeed, the O5' atom is generally in close contact with R(C8) and Y(C6) atoms in A-DNA structures, and the stabilizing contribution of CH
O5' hydrogen bonding to RNA (A-type) structure has long been appreciated (Shefter and Trueblood, 1965
; Sussman et al., 1972
; Rosenberg et al., 1973
; Wahl and Sundaralingam, 1997
). In B-DNA structures, by contrast, there are no atoms on the sugar-phosphate backbone close to R(C8) or Y(C6), and the 5'-phosphorus atom lies in a different plane from the base (Lu et al., 2000
). Thus, there is sufficient room around the R(C8) or Y(C6) atoms in B-DNA for water molecules to associate with the bases at these sites. The DNA in protein-DNA complexes is known from other analyses (A. Colasanti, X.-J. Lu, and W. K. Olson, unpublished data) to be predominantly B-form DNA. There is accordingly enough space near R(C8) or Y(C6) to hold water molecules in most protein-bound structures. The number of such contacts, however, is much smaller than the number of waters associated with other base atoms in B-DNA and protein-DNA structures. There are additional examples of close CH
O interactions in protein-DNA structures, particularly major-groove contacts to the thymine methyl groups and the C5 atoms of cytosines (Mandel-Gutfreund et al., 1998
), if the current restriction on 2.0 Å or better structural resolution is relaxed. The analysis of DNA physical characteristics with lower quality data is, however, questionable.
Whereas the distribution of water around the bases is independent of the crystal structures from which the binding sites are collected, the amino acid distribution patterns (not shown) are sensitive to the choice of protein-DNA complexes. Only a few amino acids contact DNA in each protein-DNA complex, and the closely associated atoms are usually concentrated in a small region of the structure, often in only one of the two grooves. For example, eight of the 35 close contacts of amino acid atoms to the N3 of adenine occur in the DNA bound to the yeast TATA-box protein (pdt012; see Table S3 in Supplementary Material for a list of the specific contacts to DNA in the structures considered here).
Ligand-binding potentials based on pseudoelectron densities
Contour maps of the elastic energy functions obtained by pseudoelectron density analysis of the water molecules and amino acid atoms in contact with the bases of B-DNA and protein-DNA structures are illustrated in Fig. 1, b and d. The ellipsoidal contours (depicted at a level corresponding to twice the variances along the principal axes) are determined separately for the individual bases, but pictured for the composite A·T pair. As is clear from this example, the water ellipsoids are very similar for a given base in different types of DNA structures (also see Table S4 in Supplementary Material). The centers of corresponding ellipsoids are close, in agreement with well-known restrictions on the hydrogen bonding of electronegative atoms (Llamas-Saiz and Foces-Foces, 1990
; Gavezzotti and Filippini, 1994
; Pirard et al., 1995
). The centroid positions and axes also agree well with previously reported values (Schneider et al., 1993
; Schneider and Berman, 1995
).
Ligand-binding potentials obtained by clustering
Contour surfaces of elastic potentials obtained by applying clustering techniques to the distributions of water around adenine and thymine are reported in Fig. 1 c. The images are qualitatively similar to those obtained with Fourier averaging. The distances between the centers of corresponding ellipsoids obtained by the two approaches are small (usually <0.7 Å). Many of the ellipsoids generated by the clustering of ligand coordinates, however, are thinner and more elongated than the more nearly spherical shapes obtained by the density calculations. The predicted binding of ligands to the DNA bases on the basis of the clustering of Cartesian coordinates is thus more directional than that expected from the ellipsoids derived from Fourier averaging.
Numerical analysis of the water and amino acid ellipsoids around the DNA bases (Table 2 and, for full description, Table S5 in Supplementary Material) reveals several factors responsible for the shapes of the clustering potentials. First of all, the sizes of the clustering ellipsoids are sensitive to the number and locations of ligand-binding sites in the datasets. The more widely scattered the atomic positions are in a cluster, the larger the ellipsoid is. The ellipsoids characterizing the weak hydrogen-bonding interactions of water with the carbon-base atoms show greater variability in size and center location than the ellipsoids associated with the base nitrogen and oxygen atoms. Second, a few extreme points can influence the shapes, i.e., principal axis lengths and directions, of ellipsoids derived from sparsely populated ligand clusters. Restriction of the current analysis to clusters with 5% or more of the total sites of ligand-base contact helps to minimize the variation in ellipsoidal shape. The ellipsoids obtained by Fourier averaging (Table S4 in Supplementary Material) are less sensitive to small changes in the compiled ensemble than are the ellipsoids derived by direct clustering techniques.
|
Some of the widely scattered waters near the N2 atom of guanine (not shown) form hydrogen bonds with both the N2 atom of G and the O2 atom of C. This feature is well represented by the ellipsoids computed with both Fourier averaging and clustering, even though the former calculations result in two distinct hydration sites and the latter yields a single elongated binding volume. To assess the influence of shared water molecules on ellipsoidal location, size, and direction, the water molecules, which are near guanine in B-DNA structures and also in contact with the complementary cytidine base, were removed from the set of observed binding sites. The regenerated N2 ellipsoid is only slightly smaller than the original ellipsoid and still overlaps the O2 ellipsoid of C.
Interestingly, the minor-groove N3 and O2 atoms are contacted preferentially via their lower faces in most B-DNA and protein-bound duplexes. Specifically, the centers of the N3 and O2 binding ellipsoids are displaced 0.5 to 1.0 Å below the planes of the heterocyclic rings, and the major (longest) axis of each ellipsoid lies roughly parallel to the base normal in most cases considered here; see direction cosines
3,i (i = 1,3) in Table 2, and Table S6 in Supplementary Material.
By contrast, the approach of ligands in the major groove depends on sequence. The long axes of the guanine O6 and N7 binding ellipsoids are consistently parallel to the normal of G, as opposed to the many examples where the approach to the corresponding N6 and N7 sites on adenine is more lateral. The directionality of interactions of the pyrimidines tends to be opposite to that of the complementary purines. That is, the cytidine N4 is contacted more laterally and the thymine O4 is approached from above or below, i.e., parallel to the base normal.
Comparison of density and clustering potentials
The elastic energy functions generated by clustering and Fourier averaging of B-DNA water sites are compared in Table 3 in terms of the relative "chemical" placement of the binding-site ellipsoids with respect to each of the contacted base atoms. The comparable lengths and angles of hydrogen bonds between the ellipsoidal centers and associated base atoms confirm the similar placement of water ellipsoids seen in Fig. 1. Corresponding hydrogen-bonding distances differ in most cases by 0.10 Å or less, and virtual valence and torsion angles generally agree within 5° and 10°, respectively. The differences are greatest for the least well-determined ellipsoidsN2 of G, C5 of C, C5M of T, C6 of pyrimidinesassociated with the base atoms that bind the fewest water molecules. The ligand-binding potentials computed at these sites are thus less accurate than the potentials at other binding locations. Despite these uncertainties, it is noteworthy that the hydrogen bonds involving the cytidine C5 atom are appreciably shorter than those of other CH
O interactions and, in fact, are shorter than most NH
O contacts. The C5 atom of cytidine bears a substantially larger negative charge than many nitrogen atoms in popular nucleic acid force fields (see below).
|
Occupancy values, reported in previous work (Schneider et al., 1993
, 1998
; Schneider and Berman, 1995
), are not considered for two reasons. First, there is no consistent way to define occupancy for the two ellipsoid-generating methods considered here. Second, the calculated occupancy values obtained by either method are highly volatile.
Dependence of derived hydration potentials on base charges
The strength of hydrogen bonding is often attributed to the magnitude of charges on associated proton donor and acceptor atoms (Jeffrey, 1997
). The ellipsoidal features of the present set of DNA-binding potentials may thus reflect the partial charges on the contacted atoms of the interacting bases and water molecules.
To test this hypothesis, we have compared the ellipsoidal parameters of the derived water-binding potentials with the partial atomic charges on the bases given in three popular nucleic acid force fieldsAMBER (Weiner et al., 1984
; Cornell et al., 1995
; Cieplak et al., 2001
), CHARMM (Brooks et al., 1983
), and the Poltev atomic potentials (Zhurkin et al., 1981
).
Three different forms of the partial charges were considered:
The hydration potentials were also expressed in terms of several different variables:
Pairwise linear regression analysis was then performed on all combinations of the partial atomic charges and ellipsoidal parameters of selected ligand-binding potentials, and the resulting correlation coefficients were determined. By restricting the analysis to the bound waters, there is no need to consider the effects of ligand partial charge on ellipsoidal properties. A total of 240 or 288 partial charge-parameter combinations (four force fields x three charge measures/force field x four forms of each charge measure x five or six variables per ellipsoid) were thus considered for each of the derived ellipsoidal potentials. The different numbers reflect the fact that one of the ellipsoidal variables, the number of atoms per hydration site, is not determined in Fourier averaging.
The AMBER2 charges, if expressed as the sum of the partial charges on the base atoms and one or all attached hydrogens, are found to be strongly correlated with the derived hydration potentials. For example, the mean hydrogen-bonding distances between the centers of the density-refined ellipsoids and the contacted base atoms are coupled, with a linear correlation coefficient of 0.73, to the net base charges (Fig. 4). The AMBER1 and Poltev charge sets show the same dependence on ellipsoidal geometry but the correlation coefficients are lower. The 1983 CHARMM charge set, however, is not correlated with the energy ellipsoids. The magnitudes of the partial atomic charges on the DNA bases are also strongly linked to the number of waters per binding site determined as part of the clustering treatment (correlation coefficient of 0.77). These findings support the simple notion that more highly polarized base atoms with partial charges of greater magnitude will attract their hydrogen-bonding water partners more strongly and thereby draw them closer and in greater number to the DNA. Conversely, the CH and NH proton donor groups on the bases with generally smaller partial charges (of magnitude 0.4 esu or less in the various force fields) are more distant from the surrounding water molecules than the acceptor oxygens or nitrogens of larger partial charges. The poor correlation (not shown) of the derived hydrogen-bonding distances with the surface electrostatic potentials of the basepairs (electrostatic potentials of specific basepair sites were obtained by averaging the potential determined by solution of the Poisson equation at accessible sites 1 Å beyond the van der Waals' surface (A. R. Srinivasan, R. R. Sauers, M. O. Fenley, A. H. Boschitsch, A. Matsumoto, A. V. Colasanti, and W. K. Olson, unpublished data)), confirms the apparent dominance of short-range effects on the water-base contacts. A compilation of charge sets is included in Table S7 in Supplementary Material.
|
Examination of Table 3, which includes a "chemical" comparison of the relative positions of global versus local B-DNA-binding ellipsoids, reveals the general similarity of the two sets of potentials. The centers of the global and local ellipsoids near nitrogen and oxygen are very close, with mean hydrogen-bonding distances within 0.1 Å and virtual valence and torsion angle differences of <10°. The ellipsoids associated with the weaker hydration sites near carbon, however, show much greater differences in their center coordinates, with some distances differing by >0.3 Å and certain virtual valence or torsion angles showing discrepancies of 20° or more. These differences illustrate the limitations of the density calculations in generating ellipsoidal energy functions associated with weak CH hydrogen bonds.
The global treatment faces the same problems as the local refinement of ligand pseudoelectron densities and the generation of the corresponding ellipsoids, namely determination of the number of peaks to be retained and treatment of negative eigenvalues of the anisotropic thermal factors. The correction of negative eigenvalues by isotropic refinement of the relevant hydration sites usually resolves the problem. The resulting spherical energy functions are expected to be of lesser accuracy only if many hydration sites must be refined isotropically.
Computational limitations
Although the minor-groove ellipsoids associated with purine N3 or pyrimidine O2 atoms are readily determined, human intervention is needed to generate the ellipsoid near the N2 atom of G, because of the small number of associated ligands. To facilitate automatic global refinement of the binding sites, ligand positions around N2 of G are treated separately from those near other atoms on the molecule. The N2 ellipsoid is constructed by global refinement of ligand-binding positions around an ideal G·C pair, and then superimposed on each G in the global frame of DNA.
The global methodology fails in the treatment of the major groove in that too many ellipsoids are generated in the refinement. A number of factors contribute to the problem. First, the hydrogen-bonding patterns are more complicated in the major groove than in the minor groove. Second, the strong hydration sites at adenine N6 and N7 partially overlap, and third, the hydration sites associated with carbon atoms on the basesC5 of C, C5M of T, R(C6), R(C8)contain few scattered waters. Because major-groove ellipsoids cannot usually be generated for these weak hydration sites without human intervention, global refinement can only be applied to the study of ligand-DNA interactions in the minor groove.
Minor-groove ellipsoids
Fig. 5 illustrates, in stereo, the minor-groove binding ellipsoids obtained by global refinement of the water binding sites from high resolution B-DNA structures on a 6-bp double-helical fragment of one of the lexitropic drug-DNA complexes, bdd002 (Kielkopf et al., 1998b
), with a polyamide ligand designed to recognize the basepair edges. For simplicity, neither the ligand framework nor the predicted binding sites of the two basepairs at either end of the structure are shown. As seen from the figure, the observed positions of the hydrogen-bond donor and acceptor atoms of the designed (Im-Py-Hp) ligand closely overlap the centers of the predicted binding sites. The positions of drug atoms are denoted by small magenta spheres and the predicted binding sites by colored ellipsoids (red near hydrogen donor atoms and blue near hydrogen acceptor atoms on the bases). Quantitative evaluation of the ligand-binding sites in this and other drug-bound DNA molecules is described below.
|
|
|
As shown in Table 5, the energies of most drug atoms in the five 2:1 polyamide-DNA complexes are low in value, i.e., average scores of 5 or less, corresponding to the principal-axis displacement of a drug atom by 3.2
or less from the center of an expected water binding site. The energies of drug atoms with other binding sites are typically much larger. The energies derived from the ellipsoids obtained by Fourier averaging of water contacts are generally lower in value and thus more consistent with ligand positioning in the five complexes.
|
The assessment of ligand-binding interactions on the basis of potentials obtained by the global clustering of hydration sites is unsatisfactory. The computed mean binding scores in the five polyamide structuresbdd002: 5.9; bdd003: 3.6; gdj057: 5.5; dd0020: 5.0; and dd0021: 6.1are significantly higher than the corresponding values for the other knowledge-based potentials (Table 4, and Table S8 in Supplementary Material). Because of the poor performance of the functions based on global clustering, we omit their further consideration.
The computed energies of ligand atoms in 1:1 drug-DNA structures, presented in Table 5, confirm the effectiveness of the elastic potentials in identifying critical atoms on minor-groove binding ligands. If we set an acceptance limit on the calculated score of a critical atom at 10 or less (4.5
), 130 of the 136 critical drug atoms (95.6%) in 1:1 and 2:2 drug-DNA complexes are identified with the potentials derived by clustering, 135 atoms (99.3%) with the energies based on local pseudoelectron density refinement, and 132 atoms (97.1%) with the functions obtained by global pseudoelectron analysis. If the acceptance criterion is lowered to 5 (3.2
), the respective potentials identify 111, 127, and 121 critical atoms (81.6%, 93.4%, and 89.0%) successfully.
All three methods of generating knowledge-based potentials thus yield energies sensitive enough to characterize the drug-DNA interactions in known structures as low in energy. There are relatively few negative predictions, i.e., critical atoms with high energy values. On the other hand, it is possible that a mathematical model that generates a low number of negatives predictions might be overfitted, and the number of false positives with noncritical atoms in low energy positions might be high. To test the model further and to determine whether the knowledge-based potentials can account for sequence-specific binding of drugs in the DNA minor groove, we next examine the binding energies of various ligands with modified DNA sequences.
Energies of minor-groove ligands with modified DNA sequences
Sequence changes in conformationally locked DNA duplexes
As a first approximation, we assume that the substitution of one of the four common basepairs by another has no effect on the overall structure of a DNA complexed with a minor-groove binding ligand. This simplification ignores the well-known, albeit small, sequence-specific differences in the intrinsic structure of DNA basepair steps (Gorin et al., 1995
; Olson et al., 1998
) and any spatial adjustments of the drug against the surface of DNA, but is consistent with the limited effects of small, groove-binding molecules on ordinary B-DNA structure (see Discussion). We further assume that the intermolecular interactions in the crystal structure of a drug-DNA complex are lower in energy than those of other small molecules or DNA sequences in the same configuration. A change of one or two basepairs in the known drug-DNA crystal complexes is thus expected either to increase the overall hydrogen-bonding score or to introduce steric hindrance into the system.
Table 6 summarizes the interactions of various DNA sequences in contact with one of the five lexitropic polyamides from known 2:1 drug-oligonucleotide complexes (the remaining four are described in Table S9 in Supplementary Material). The binding scores are calculated with the same sets of knowledge-based potentials shown above. The energies of the modified sequences, each with a substitution of one of the four key basepairs in the center of the DNA (12 possible basepair substitutions), are expressed relative to the total score ETot of the sequence found in the crystal structure. The key basepairs contain the critical atoms which are thought to determine the specificity of drug recognition (Fig. 6). Because of the local nature of the knowledge-based functions, the scores associated with multiple basepair changes can be estimated from the sum of energies determined for single basepair substitutions. (This is also true for the energies based on global pseudoelectron density refinement, since the minor-groove binding ligands do not significantly distort the complexed DNA from the B-form.) Unfavorable steric interactions brought about by the substitutions are also tallied.
|
ETot and a null entry in the third column of Table 6, is inconsistent with this assumption. The ligands of two drug-DNA complexes, bdd003 and dd0021, are thought to be incapable of recognizing the difference between A·T and T·A pairs in the middle of the DNA sequences to which they are respectively bound. It is therefore possible that the change of an A·T pair in the middle of these sequences to a T·A pair or vice versa may decrease the total energy. This is the reason why some of the calculations involving bdd003 and dd0021 are considered to be consistent with expectations, even though the basepair substitutions introduce a negative value of
ETot and the complexes are free of steric hindrance (examples highlighted by symbols (
) in Table S9 in Supplementary Material). With this proviso, we find that only one of the 60 base-substituted structures is inconsistent with expectations. Specifically, the substitution of G4 by cytosine in bdd002 introduces a small (0.3) decrease in the interaction score based on the local pseudoelectron density potential and is free of the steric effects with imidazole anticipated by the drug design (see Discussion).
Base substitution scores
The top half of Table 7 summarizes the base substitution scores of the five lexitropic drug-DNA complexes plus the corresponding values obtained for the 2:1 complexes of distamycin and diimidazole lexitropsin with DNA (gdj054 and gdh060, respectively). As noted above, the latter molecules have limited base-recognition capabilities. The binding scores associated with the interactions of theses two ligands are accordingly insensitive to A·T
T·A substitutions and vice versa. The complete exchange of purine and pyrimidine bases, i.e., A·T
G·C or T·A
C·G, however, introduces steric clashes at the ligand-DNA interface and contributes to the A·T binding preferences of these drugs. Most of the basepair changes in the distamycin-DNA and diimidazole lexitropsin-DNA complexes therefore increase the intermolecular energy, either through the drug-DNA interaction score or unpermitted steric interactions. The interchange of certain A·T and T·A pairs in these complexes fails to increase the ligand-binding energy scores or to introduce steric hindrance. Such results are limited to the sites on DNA which the drugs cannot distinguish.
|
T·A; and (2) complete base exchange that alters the AT content, e.g., A·T
G·C. Roughly 80% of the latter substitutions either increase the knowledge-based energies or introduce steric hindrance between drug and DNA, but only 60% of the changes in the former category have such effects. The lower number reflects the nonspecific interactions of the ligands with AT-rich DNA. The computed inability of the drugs to bind GC-rich DNA is consistent with the literature (Goodsell, 2001
Comparison with in vitro experiments
Experimental systems
As a further test of the ellipsoidal potentials, we compare in Table 8 the computed energies of various drug-DNA complexes with the experimentally measured binding affinities of two polyamide drugs with three different DNA 11-mers (Pilch et al., 1999
). The drugs, ImImPy-
-PyPyPy-ß-Dp and ImPyPy-
-PyPyPy-ß-Dp (designed respectively to bind GGT·ACC and GTT·AAC duplex sequences), are single molecules with six sequentially linked rings connected by amide links in either half and at their centers by a
-aminobutyric acid hairpin turn (denoted by
in the preceding formulae) rather than the pairs of associated ligands found in the oligonucleotide crystal complexes considered above (see Fig. 7). The targeted DNA-binding sites are located in the middle of two of the three duplexes, 5'-CATTGGTAGAC-3' and 5'-CATTGTTAGAC-3' (here denoted by the sequence strands). The third duplex, 5'-CATTATTAGAC-3', is not expected to bind either ligand tightly.
|
|