Center for Computational and Molecular Science and Technology, School of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, Georgia 30332-0400
Correspondence: Address reprint requests to Rigoberto Hernandez, Fax: 404-894-0594; E-mail: hernandez{at}chemistry.gatech.edu.
| ABSTRACT |
|---|
| INTRODUCTION |
|---|
In this study, an information-theory entropy is proposed based on the backbone dihedral angle distributions of the protein structure. It underlies an auxiliary robust checking function for evaluating the compatibility of a given protein structure with the experimentally derived structures in the PDB with respect to its dihedral angles. The 20 Ramachandran plots (i.e., φi-ψi distributions) for each of the naturally occurring amino acids are reconstructed using all of the nonredundant experimental protein structures available in the October 2004 PDB using a 90% sequence identity cutoff. In addition, the 400 ψi-φi+1 distributions accounting for the statistics in the two dihedral angles between specified adjacent amino acids have also been constructed and are presented. The latter distributions have been seen to contain nontrivial structure, and the present results (over the existing larger database) serve to validate prior conclusions (19–22). The information-theory entropy, S, is defined in terms of the probabilities (or likelihood) of particular pairs of dihedral angles along the protein given its primary structure. A standard entropy is defined using an ideal (but likely unattainable) structure in which every angle pair, φi-ψi and ψi-φi+1, takes on the value with maximum probability, where the index i labels a residue along a chain. The entropy difference, ΔS, is defined relative to the standard entropy of this structure, and has been calculated for all nonredundant protein structures in the PDB. A histogram of these entropy differences leads to a nontrivial distribution. As a simple test of whether such a distribution is sensitive to differences between the theoretically and experimentally generated structures in the PDB, this distribution has been obtained for each cohort. The deviations in these distributions will be seen to emerge primarily from those theoretical structures that have been obtained using statistical information that ignores long-range correlation due to, for example, secondary structural elements.
Furthermore, the distribution in ΔS can be used to define auxiliary checking functions, herein called D1 and D2, which characterize the degree to which the dihedral angles of a given structure are compatible with the existing database (23). The ΔS distribution is peaked at a nonzero value because a typical structure contains a certain degree of correlation between distant residues due to secondary structural interactions. The use of the statistical distributions in the calculation of ΔS implies that this information is included in an averaged, or mean-field-like, sense. Thus D2 can signal the existence of atypical structures whose unusual behavior is due to specific interactions between distant residues. Of course, deviations may also be due to incorrectly obtained structures, though such a determination is not available simply from the knowledge of D2. It therefore complements the scores available in PROCHECK (6,7) and WHAT_CHECK (3) in that it includes the ψi-φi+1 correlations, and it provides a simple check of the deviation from non-mean-field-like structure. Hence this measure can be used to guide modeling studies and to validate experimentally derived structures, while bolstering the tools that are available to guide the formation of de novo and engineered protein structures. In fact, D2 provides an information-rich tool to guide experiments involving the replacement or redesign of large sections of protein structure (e.g., loop modeling). These new measures also complement the work of Shortle and co-workers (24–27), who focus on the propensities of a given residue's dihedral angles due to the nearby structure (through an energy-based scoring function) rather than on the mutual probability of given residue pairs. These subtle distinctions give rise to differences in the information that the respective checking functions or scores report. Thus the central result of this work is the construction of a new checking function D2 that complements the existing checking functions by reporting on the extent to which the propensity of the dihedral angle deviations differs in a given protein from that of the reference database.
| METHODS |
|---|
φi-ψi and ψi-φi+1 distributions
The Ramachandran plots (φi, ψi) characterize the probability distribution for angles φi and ψi for each R of the 20 natural amino acids, where the two dihedral angles are defined by the backbone atom sequences C(i − 1)-N(i)-CA(i)-C(i) and N(i)-CA(i)-C(i)-N(i + 1), respectively, as shown in Fig. 1. An extensive analysis of the Ramachandran plots using a fairly recent edition of the PDB has been reported by Hovmöller et al. (28
distributions (in which the angles are associated with the sequential residues) to complement the information in the Ramachandran plot (19
Because the ψi-φi+1 plot accounts for the correlation between two adjacent residues, its use in structure assessment provides a nontrivial sequence-dependent measure of the likelihood that a given pair of residues will be connected by the specified dihedral angles. In principle, one could also account for the explicit correlations present between additional structural observables such as in the recent study by Esposito et al. (48
and the angle ω describing the rotation of the peptide bond. However, only the correlation between φ and ψ around a residue and between bonded residues will be addressed, because, as shown below, this suffices to provide a different first-order estimate of protein structure than other scores presently available.
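The two backbone dihedral angles defined above can be computed directly from atomic coordinates. The following is a minimal sketch, assuming coordinates are available as 3-vectors; the function names `dihedral_deg` and `phi_psi` are illustrative and are not taken from the SiFiScore toolkit mentioned later in the text.

```python
import numpy as np

def dihedral_deg(p0, p1, p2, p3):
    """Signed dihedral angle in degrees for four points (cis = 0, trans = +/-180)."""
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b0 = p0 - p1                      # bond pointing back from the 2nd to the 1st atom
    b1 = p2 - p1                      # central bond, which is the rotation axis
    b2 = p3 - p2                      # bond from the 3rd to the 4th atom
    b1 = b1 / np.linalg.norm(b1)
    v = b0 - np.dot(b0, b1) * b1      # b0 with its component along the axis removed
    w = b2 - np.dot(b2, b1) * b1      # b2 with its component along the axis removed
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return float(np.degrees(np.arctan2(y, x)))

def phi_psi(c_prev, n, ca, c, n_next):
    """phi uses C(i-1)-N(i)-CA(i)-C(i); psi uses N(i)-CA(i)-C(i)-N(i+1)."""
    return dihedral_deg(c_prev, n, ca, c), dihedral_deg(n, ca, c, n_next)
```

Because φ needs the preceding residue and ψ needs the following one, a chain of n residues yields both angles only for residues 2 through n − 1.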
Data-mining the ψi-φi+1 distributions
To obtain the 400 possible ψi-φi+1 distributions labeled by each of the pairs of naturally occurring amino acids, a statistically representative sample of all possible proteins needs to be available. In this work (as with other similar studies), the sublibrary of deposited structures in the PDB is assumed to be representative of the protein space once it has been systematically pruned: DNA, RNA, and complexes of proteins with DNA or RNA are removed. Model structures are discarded because of the unknown possibility that such theoretically derived structures may be of a different level of accuracy or representation. Additionally, structures with missing residues or containing unified atoms have been removed. (Although more aggressive pruning could have been done by discarding structures according to a more rigorous standard for their resolution, this was not done in this investigation.) After pruning the PDB subject to these criteria, the resulting library (called "EXP" throughout this work) includes a total of 24,444 experimentally derived structures.
The NR50, NR70, and NR90 sublibraries result from the intersection of the EXP library of October 2004 PDB structures with the nonredundant sequence databases posted in the PDB (as listed in the April 2005 update) at the 50%, 70%, and 90% sequence identity levels, respectively (49). The NR100 sublibrary is a subset of the EXP library in which a single arbitrarily chosen structure is retained for each redundant sequence at 100% sequence identity. Note that, by definition, no two structures in a given database share a sequence identity greater than or equal to that of the database's defining percentage level. Hence, for example, the NR100 sublibrary will be smaller than the EXP library as the former includes only one structure for a given sequence. The subset, NR100T, of theoretically derived (that is, model) protein structures in the PDB at 100% sequence identity will also be investigated for confirmation of the relative level of information contained therein. The number of structures in each library is shown in Table 1.
The 400 ψi-φi+1 and 20 Ramachandran plots have been generated for each of the five sublibraries, NR50, NR70, NR90, NR100, and EXP. Their construction is described explicitly in Supplement A in the Supplementary Material, and the results for the NR90 sublibrary are provided in Supplement B in the Supplementary Material. Typical one-dimensional distributions of the projections of the φi-ψi Ramachandran plots and the ψi-φi+1 plots are displayed in Fig. 2 (for the procedure, see Supplement A, Supplementary Material). These results demonstrate the sequence dependence of the ψi-φi+1 distribution, in accordance with the previous reports (19
ψi on the second residue and φi+1 on the first residue obviously illustrates the impact of the distant residue identity on the absolute value of the maximum probability. The effects on glycine are particularly pronounced as the peak position of the distribution changes with the distant residue identity (Fig. 2 b). The torsion angles were extracted using a tool kit written in FORTRAN and verified within our group (S. Zhong and R. Hernandez, 2005. SiFiScore Toolkit, unpublished code). The 420 histogrammed distributions for NR90 have been saved into a single database which can, in turn, be used to calculate the dihedral-angle information entropy difference, ΔS, defined in Eq. 6 below.
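The construction of the 400 adjacent-pair histograms can be sketched as follows. This is an illustration only: the authors' actual procedure is given in their Supplement A, and the input layout (`chains` as tuples of sequence, φ list, ψ list) and the names `bin_index` and `build_pair_histograms` are assumptions made here. The 5° bin width is the window size quoted in the Conclusion, which gives 72 × 72 = 5184 cells per distribution.

```python
from collections import defaultdict
import numpy as np

BIN_WIDTH = 5.0                   # degrees per bin
N_BINS = int(360 / BIN_WIDTH)     # 72 bins per angle

def bin_index(angle_deg):
    """Map an angle in degrees onto a bin index in [0, N_BINS)."""
    # The trailing modulo guards against float rounding at the -180/180 seam.
    return int(np.floor(((angle_deg + 180.0) % 360.0) / BIN_WIDTH)) % N_BINS

def build_pair_histograms(chains):
    """chains: iterable of (sequence, phi, psi), with per-residue angles in degrees.

    Returns {(R_i, R_i+1): normalized N_BINS x N_BINS array over (psi_i, phi_i+1)}.
    """
    counts = defaultdict(lambda: np.zeros((N_BINS, N_BINS)))
    for seq, phi, psi in chains:
        # psi of residue i and phi of residue i+1 are both defined for every
        # adjacent pair, so a chain of n residues contributes n - 1 counts.
        for i in range(len(seq) - 1):
            pair = (seq[i], seq[i + 1])
            counts[pair][bin_index(psi[i]), bin_index(phi[i + 1])] += 1
    return {pair: h / h.sum() for pair, h in counts.items()}
```

The 20 single-residue Ramachandran histograms follow the same pattern with (φi, ψi) keyed by one residue type, which together with the 400 pair tables gives the 420 distributions mentioned above.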
For a given protein, the dihedral angle pairs across its n residues consist of the (n − 2) φi-ψi pairs and associated probabilities at each site i for i ranging from 2 to n − 1. Similarly, the protein gives rise to the (n − 1) ψi-φi+1 pairs and associated probabilities between successive residues at i and i + 1 for i ranging from 1 to n − 1. For convenience, these two sets are interlaced into a single vector whose 2n − 3 entries are defined as
![]() | (1a) |
![]() | (1b) |
A Shannon entropy rooted in information theory (51) can now be rewritten as
![]() | (2) |
specifies the angles according to the particular structure, and the residues are paired according to
![]() | (3a) |
![]() | (3b) |
. A standard information entropy for a given structure can be defined in terms of the most probable dihedral angles for a given primary sequence,
![]() | (4) |
![]() | (5) |
only with respect to the specification of its primary sequence. The averaged entropy difference for a given structure relative to the standard can be written simply as
![]() | (6) |
Solis and Rackovsky (52,53) defined a similar information entropy to that of Eq. 2 for protein structure prediction. However, none of their measures emphasized the use of the ψi-φi+1 distributions, and the possible correlation between neighboring amino acids that such distributions may display. Meanwhile, the GOR algorithm (54,55) uses the statistics of the multiple sequence alignment of segments of 17 or more residues in length to predict secondary structure assignments. The approach in this article is complementary to the GOR algorithm in that both recognize the need for studying multiple residue correlations: the latter emphasizes a larger segment while limiting the number of possibilities to the secondary structural motifs, whereas the former (that is, the present approach) emphasizes segments limited to residue pairs while extending the accessible space to that of a discretization of the two-angle space with more than 5000 bins (that is, possible configurations).
Given the coordinates of a protein structure, the series of dihedral angles, indexed by k, can readily be computed. The probabilities entering in the sum of the structural entropy each depend on the relative probability that the measured dihedral angles are compatible with the corresponding residue(s) they connect. That is, the probabilities entering in Eq. 2 are evaluated at the bins that contain the measured angles, where φi ∈ wk(i) and ψi ∈ vl(i), given that {wk} and {vl} are the partitions in the angle space used to construct the histogrammed distributions. This procedure, while direct, discretizes the possible results. Smoother estimates of the dihedral-angle information entropy could be obtained using standard interpolating techniques, but this is not done here because the simpler discrete approach provides estimates of the structural entropy with sufficient accuracy to test the proposed checking functions.
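The lookup and interlacing steps described in this section can be sketched as below. Several choices here are assumptions rather than the authors' definitions, because the equations themselves are not reproduced in this copy of the text: the chain-order interlacing, the summand −p ln p, the sign of the difference, and the division by the 2n − 3 entries. All function names are hypothetical.

```python
import numpy as np

BIN_WIDTH = 5.0
N_BINS = int(360 / BIN_WIDTH)

def bin_index(angle_deg):
    """Map an angle in degrees onto a bin index in [0, N_BINS)."""
    return int(np.floor(((angle_deg + 180.0) % 360.0) / BIN_WIDTH)) % N_BINS

def lookup(hist, angle_a, angle_b):
    """Probability of the histogram cell that contains the measured angle pair."""
    return float(hist[bin_index(angle_a), bin_index(angle_b)])

def interlace(p_rama, p_pair):
    """Merge n-2 single-residue and n-1 adjacent-pair probabilities into
    one vector of 2n-3 entries, in chain order (ordering assumed)."""
    p_rama = np.asarray(p_rama, dtype=float)
    p_pair = np.asarray(p_pair, dtype=float)
    if len(p_pair) != len(p_rama) + 1:
        raise ValueError("expected n-1 pair terms and n-2 single-residue terms")
    p = np.empty(len(p_rama) + len(p_pair))
    p[0::2] = p_pair    # (psi_i, phi_i+1) terms, i = 1 .. n-1
    p[1::2] = p_rama    # (phi_i, psi_i) terms,   i = 2 .. n-1
    return p

def shannon_sum(p):
    """-sum(p ln p) over the nonzero entries (assumed form of the entropy)."""
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]
    return float(-np.sum(nz * np.log(nz)))

def averaged_entropy_difference(p_structure, p_most_probable):
    """Per-entry difference between a structure and the maximum-probability
    standard for the same sequence (sign and normalization assumed)."""
    diff = shannon_sum(p_structure) - shannon_sum(p_most_probable)
    return diff / len(p_structure)
```

In this sketch the standard vector would be built from the maximum of each relevant histogram (for example `hist.max()`), since the standard structure is defined only by the primary sequence.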
A checking function for secondary structure propensity
Given the normalized probability distribution, P(ΔS), and a putative structure with well-defined dihedral angles, {(φi, ψi), (ψi, φi+1)}, an integrated probability function for the entropy difference can be defined by merging the left and right cumulative distribution functions as
![]() | (7) |
where the median value of ΔS separates the two branches. The integral I will, by definition, take the value of 1/2 when evaluated at the median. The deviation relative to the median can thus be characterized by
To make the D1 checking function even more intuitive, a new checking function D2 is defined to roughly describe the number of standard deviations away from the median structure through the expression
![]() | (9) |
As described in Supplement C in the Supplementary Material, the D2 checking function evaluated for a Gaussian distribution with zero mean and unit standard deviation is exactly equal to the number of standard deviations away from the median structure. Thus D2 may be interpreted as a measure of the relative likelihood for ΔS in terms of deviations from the mean. It effectively uniformizes the distribution in the sense that it maps the original distribution precisely to the normal curve. In particular, values of |D2| > 3 suggest that the specified structure is in a group of structures whose cumulative likelihood, while possible, is <0.13%. To check the effectiveness of these new scores, D1 and D2 are calculated separately for the EXP and the NR100T libraries below.
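One construction that satisfies the properties stated in this section is sketched below: I equals 1/2 at the median, and D2 equals the number of standard deviations when the reference distribution is Gaussian. The specific forms used here are assumptions, not the authors' Eqs. 7 to 9, which are not reproduced in this copy: I is taken as the left cumulative fraction at or below the median and the right cumulative fraction above it, D1 as ±(1 − 2I), and D2 as the standard normal quantile of (1 + D1)/2.

```python
from statistics import NormalDist
import numpy as np

def d1_d2(delta_s, reference):
    """Return (D1, D2) for one entropy difference against a reference sample."""
    ref = np.sort(np.asarray(reference, dtype=float))
    n = len(ref)
    median = np.median(ref)
    if delta_s <= median:
        # Left branch: fraction of the reference at or below delta_s.
        tail = np.searchsorted(ref, delta_s, side="right") / n
        sign = -1.0
    else:
        # Right branch: fraction of the reference at or above delta_s.
        tail = (n - np.searchsorted(ref, delta_s, side="left")) / n
        sign = 1.0
    # Keep the tail inside (0, 1/2] so the normal quantile stays finite.
    tail = min(max(float(tail), 0.5 / n), 0.5)
    d1 = sign * (1.0 - 2.0 * tail)
    d2 = NormalDist().inv_cdf(0.5 * (1.0 + d1))
    return d1, d2
```

Under these assumptions a one-sided tail of about 0.13% corresponds to |D2| = 3, in line with the threshold quoted above. With an empirical reference of n structures the resolution of the tail is limited to 1/n, so very large |D2| values are only as reliable as the size of the reference library.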
| RESULTS AND DISCUSSION |
|---|
The φi-ψi and ψi-φi+1 dihedral angle distributions for all five libraries outlined in the section on Data-Mining the ψi-φi+1 Distributions are presented and described in Supplements A and B in the Supplementary Material. In addition to their role in this work, they may be of use in homology-based methods for constructing proteins. For example, Srinivasan and co-workers (19
On the choice of the sequence database
To implement the checks presented in the section on A Checking Function for Secondary Structure Propensity, an underlying database must be selected. The EXP library would be a poor choice because it necessarily includes multiple copies of the same structure. Theoretically derived structures should also be ignored because they may differ from the experimental database. To choose which of the nonredundant experimental sublibraries (NR50, NR70, NR90, or NR100) would be optimal, it is helpful to construct the corresponding dihedral-angle information entropies and compare their relative properties. In particular, the distributions of ΔS(X), based on the NRX sublibrary, have been evaluated across all the structures in each of the five sublibraries: NR50, NR70, NR90, NR100, and EXP. The statistical error in ΔS(X) decreases with increasing X because the size of the sublibrary increases with X. But at the same time, the bias due to redundancy is also increasing with X.
The distributions of ΔS(X) are shown in Fig. 3. The EXP library and NR100 sublibrary contain several sets of structures with considerable sequence identity, resulting in skewed distributions regardless of the choice of the checking function. As expected, the relatively small size of the sublibraries underlying the ΔS(50) and ΔS(70) measures leads to noisy distributions. Meanwhile, the distributions in ΔS(100) appear to be broadened by the underlying redundancy in the NR100 sublibrary. The differences between the five sublibraries appear to be revealed (and perhaps converged) most sharply by panel c, which displays the distributions for ΔS(90). One might be tempted to choose ΔS(70) instead of ΔS(90) because both scores reveal that the NR90 distribution is more like that of the redundant libraries. However, the better statistics of ΔS(90) in light of the relatively small redundancy error, and the similarity in the peak positions between NR100 and NR90 as listed in Table 1, suggest that NR90 is an optimal choice. In light of this heuristic argument, NR90 is used in the remainder of this article as the underlying distribution in calculating ΔS and the associated checking functions; the superscript in ΔS(90) is henceforth omitted.
The distributions of ΔS for experimental and theoretical structures in NR90 and NR100T, respectively, are shown in Fig. 4. The mean value and standard deviation σ of ΔS of experimental structures are 4.38 × 10^-3 and 5.74 × 10^-4, respectively, indicating that roughly 71% of the total structures have a ΔS between 3.81 × 10^-3 and 4.95 × 10^-3, i.e., between ⟨ΔS⟩ − σ and ⟨ΔS⟩ + σ. The mean value and standard deviation for the theoretical structures are 4.35 × 10^-3 and 6.82 × 10^-4, respectively, and ~64% of the theoretical models have a ΔS within one standard deviation of the mean of the experimental models. The two distributions are surprisingly similar, particularly since the difference seen between the NR90 and NR100 distributions does not appear to persist for NR100T. The origin of this likely lies in the fact that the NR100T sublibrary does not have NR100's degree of sampling bias, because the latter contains many similar single-point mutants. However, on average, fewer theoretically determined structures are within a σ of the mean, and this is a notable difference between the experimental and theoretical structures. This result is likely a consequence of the fact that many theoretical structures use rule sets for their construction which do not reflect the degree of correlation between distant residues present in nature. These observations indicate the insight that ΔS provides on the relative compatibility of a given structure with respect to the experimental NR90 sublibrary of the PDB.
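The one-standard-deviation window quoted above can be checked with a few lines of arithmetic. The sketch reads the exponents as 10^-3 and 10^-4, the only reading under which the stated window follows from the stated mean and standard deviation. The Gaussian fraction is included only as a point of comparison for the 71% and 64% figures and is not a result from the paper.

```python
from statistics import NormalDist

mean_exp, sd_exp = 4.38e-3, 5.74e-4     # experimental (NR90) values from the text
mean_theo, sd_theo = 4.35e-3, 6.82e-4   # theoretical (NR100T) values from the text

# One-standard-deviation window around the experimental mean.
low, high = mean_exp - sd_exp, mean_exp + sd_exp

# Fraction of a Gaussian lying within one standard deviation of its mean.
gaussian_one_sigma = NormalDist().cdf(1.0) - NormalDist().cdf(-1.0)

print(f"one-sigma window: {low:.2e} to {high:.2e}")
print(f"Gaussian fraction within one sigma: {gaussian_one_sigma:.1%}")
print(f"width ratio (theoretical / experimental): {sd_theo / sd_exp:.2f}")
```

The window reproduces the 3.81 × 10^-3 to 4.95 × 10^-3 range given in the text, and the theoretical distribution is roughly 19% wider than the experimental one.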
D1 and D2 checks
The distributions of D1 calculated using Eq. 8 across the NR90 and NR100T sublibraries are shown in Fig. 5. The distributions are nearly Gaussian as suggested above. However, features seen above in Fig. 4 in assessing the relative compatibility between the NR90 and NR100T sublibraries are still visible in Fig. 5. The distributions in D2 displayed in Fig. 6 retain these features as well, but the uniformizing procedure outlined in Supplement C (see Supplementary Material) now leads to a normal Gaussian distribution for the NR90 structures. Interestingly, the lack of correlation in some of the NR100T structures is exhibited by a shoulder on the left side of the NR100T distribution.
ΔS is not symmetric. If it were symmetric, then the simpler arguments at the end of the previous section using a single characteristic σ would suffice. As remarked previously (and shown explicitly in Supplement C in the Supplementary Material), in the limit that the distribution in ΔS is Gaussian, the definition of D2 reduces precisely to the number of standard deviations that a given structure differs from the median. In summary, Eqs. 8 and 9 define equivalent new checks, D1 and D2, for the compatibility of the dihedral angles of a given structure with the existing PDB set of nonredundant experimental structures, although D2 is preferred because it takes on nontrivial values even for exponentially unlikely structures.
To illustrate the values of the D1 and D2 checks, it is helpful to examine a few representative structures arbitrarily chosen from the PDB. The HIV envelope glycoprotein (1g9nG) (56), the p53 DNA binding domain (1tupA) (57), and the G-protein α-1 chain (1gg2A) (58) are fairly common proteins whose structures have been resolved and deposited in the PDB. The D1 values for these structures are 0.06, 0.25, and 0.23, respectively, which alone might not seem to provide a simple score of the structural quality. However, the D2 values are 0.08, 0.32, and 0.33. These values are easily interpreted as they indicate that all three structures are within one standard deviation of the PDB database. That is, their dihedral angles with respect to correlation around a residue and between residues are typical of the structures in the NR90 sublibrary. But recall that their information entropy is consequently greater than their corresponding standard entropies. Thus, they evidently exhibit propensities for secondary structural interactions that are typical of the structures in the NR90 sublibrary.
Alternatively, the D2 check can be used to identify protein structures whose angles are atypical with respect to the distribution of correlated angles in the PDB. Such atypical structures are not necessarily incorrect structures. Indeed, when D2 is large and negative, the structures could be correct, but for whatever reason contain dihedral angles in the most probable positions independent of the sequence beyond their nearest neighbors. Alternatively, when D2 is large and positive, particularly strong correlations of distant residues may give rise to angles that adopt low-probability configurations. Although correct structures exist that satisfy such limits, they are still atypical relative to the distribution because, as shown in Fig. 4, most of the experimental structures in the NR90 sublibrary have a structural entropy difference near the mean, ⟨ΔS⟩. This raises the intriguing possibility that D2 can be used to highlight regions in proteins that are atypical due to some functional constraint. These regions could arise for reasons related to active site architectures or regions critical to forming protein-protein interactions. Hence the D2 measure may serve a role in highlighting regions of interest when structures of unknown function or physiological role are solved as part of ongoing high-throughput structural proteomics efforts. Long-range interactions through a protein structure are of course important to understanding catalysis, concerted movements, and even the evolutionary history of proteins within a conserved family. Thus D2 can highlight these potential regions within a structure too.
The role of D2 in checking theoretical structures
All structures in the NR90 and NR100T sublibraries with a value of |D2| ≥ 3 are listed in Table 2. The number of such structures is 17 (0.6%) and 11 (1.7%) for the experimentally and theoretically derived structures, respectively. The structures in the larger EXP and model protein libraries have also been assessed according to the D2 check. It was found that 264 (1.1%) and 66 (6.7%) structures are atypical out of the 24,444 experimental and 981 theoretical structures available, respectively. (All of the atypical structures and their D1 and D2 values are listed in Supplement C in the Supplementary Material.) The fact that, in these sublibraries, the theoretical structures are much more likely to be atypical than the experimental structures is a possible indicator that the former are somehow different from naturally occurring structures. More importantly, the primary difference manifests as a shoulder in the distributions in the negative D2 region. This is the region that signals structures that are near to the structures with standard entropy. Thus the dihedral angles deviate little from the most likely angles, indicating that they have not been altered by secondary interactions. It should come as no surprise that some fraction of the theoretically derived structures contain dihedral angles that lack such information. However, the important result here is that D2 is a reporter of such propensities.
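The atypical fractions for the larger libraries follow directly from the counts given above. The short sketch below recomputes them and, as a reference point that is not taken from the paper, compares them with the two-sided tail of a standard normal distribution beyond three standard deviations.

```python
from statistics import NormalDist

# (atypical structures, library size) as quoted in the text.
counts = {"EXP": (264, 24444), "model": (66, 981)}
fractions = {name: hits / total for name, (hits, total) in counts.items()}

# Two-sided probability of |z| >= 3 for a standard normal variable.
gaussian_tail = 2.0 * NormalDist().cdf(-3.0)

for name, frac in fractions.items():
    print(f"{name}: {frac:.1%} atypical, {frac / gaussian_tail:.1f}x the Gaussian tail")
```

The recomputed fractions round to the 1.1% and 6.7% reported in the text, and both exceed the roughly 0.27% that a perfectly normal D2 distribution would place beyond three standard deviations.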
φi-ψi combination. A low G-factor often indicates an unusual structure (6
φi-ψi angles in a protein structure are. Z-scores above 4.0 and below −4.0 are very uncommon (3
| CONCLUSION |
|---|
Generally speaking, the D1 and D2 checks signal the propensity for a protein to contain secondary structural interactions in comparison with the PDB. The overall structures found to be atypical by these checking functions may be classified as:
In particular, large negative values of D2 check indicate structures that are perhaps too likely, while large positive values indicate structures that are perhaps too unlikely in comparison with the typical structures of the PDB database. The use of D2 check at the residue level has been developed and will be discussed separately (S. Zhong, S. Quirk, and R. Hernandez, unpublished). D2 check is complementary to existing scoring functions used in assessing structure predictions but provides a different form of stereochemical information. For example, it can be used in concert with other functions to identify important or unusual parts of a structure.
One criticism that could be levied against this work (and indeed against many bioinformatic tools based on a reference set) centers on the question of whether the chosen reference sublibrary of the PDB is representative of the protein universe. The recent work of Zhang et al. (60) suggests that the diversity of single-domain structures available in the PDB database is indeed representative of the protein universe. But there may be a danger that the distribution of such structures is skewed in some way. To reduce the presence of such biasing, the reference sublibrary selected in this work excluded structures that had >90% sequence redundancy. Meanwhile, the statistical information available from the current size of the database was sufficient only for bins with 5° windows. While both the coverage of the protein space and the accuracy of the distributions appear to be sufficient in the treatment performed here, one would expect that both would improve in the future as the PDB grows.
One additional result of this work is the confirmation that the ψi-φi+1 plots contain correlation between dihedral angles of a given residue and the identity of the neighboring residue. This result validates previous observations (19–21,44,46,47,50). It is seemingly in contradiction of the Flory isolated-pair hypothesis (61), in which it was assumed that the φi-ψi distribution of each residue in a protein backbone is independent of the neighbors' identities. However, the differences found here are sufficiently small that violations of the isolated-pair hypothesis are subtle. For this same reason, it is not surprising that Brooks and co-workers (62) found that the isolated-pair hypothesis holds very well upon averaging over the ensemble to obtain conformational entropies.
In summary, this work serves to increase the awareness of the effect of nearest-neighbor frequency on the pairwise dihedral distributions and introduces a useful series of checking functions that can be used to interpret both experimental and theoretical protein structures.
| SUPPLEMENTARY MATERIAL |
|---|
| ACKNOWLEDGEMENTS |
|---|
This work has been partially supported by National Science Foundation grants No. NSF 02-123320 and No. 04-43564. Additionally, R.H. is the Goizueta Foundation Junior Professor.
| FOOTNOTES |
|---|
Submitted on May 16, 2006; accepted for publication August 29, 2006.
| REFERENCES |
|---|
2. Branden, C. I., and T. A. Jones. 1990. Between objectivity and subjectivity. Nature. 343:687689.[CrossRef]
3. Hooft, R. W. W., G. Vriend, C. Sander, and E. E. Abola. 1996. Errors in protein structures. Nature. 381:272.[Medline]
4. Abola, E. E., A. Bairoch, W. C. Barker, S. Beck, D. A. Benson, H. Berman, G. Cameron, C. Cantor, S. Doubet, T. J. P. Hubbard, T. A. Jones, G. J. Kleywegt, A. S. Kolaskar, A. Van Kuik, A. M. Lesk, H. W. Mewes, D. Neuhaus, G. Pfeiffer, L. F. TenEyck, R. J. Simpson, G. Stoesser, J. L. Sussman, Y. Tateno, A. Tsugita, E. L. Ulrich, and J. F. G. Vliegenthart. 2000. Quality control in databanks for molecular biology. Bioessays. 22:10241034.[CrossRef][Medline]
5. Ramakrishnan, C., and G. N. Ramachandran. 1965. Stereochemical criteria for polypeptide and protein chain conformations. II. Allowed conformations for a pair of peptide units. Biophys. J. 5:909933.
6. Morris, A. L., M. W. MacArthur, E. G. Hutchinson, and J. M. Thornton. 1992. Stereochemical quality of protein structure coordinates. Proteins. 12:345364.[CrossRef][Medline]
7. Laskowski, R. A., M. W. MacArthur, D. S. Moss, and J. M. Thornton. 1993. PROCHECK: a program to check the stereochemical quality of protein structures. J. Appl. Crystallogr. 26:283291.[CrossRef]
8. MacArthur, M. W., and J. M. Thornton. 1993. Conformation analysis of protein structures derived from NMR data. Proteins. 17:232251.[CrossRef][Medline]
9. MacArthur, M. W., R. A. Laskowski, and J. M. Thornton. 1994. Knowledge-based validation of protein structure coordinates derived by x-ray crystallography and NMR spectroscopy. Curr. Opin. Struct. Biol. 4:731737.[CrossRef]
10. Laskowski, R. A., M. W. MacArthur, and J. M. Thornton. 1998. Validation of protein models derived from experiment. Curr. Opin. Struct. Biol. 8:631639.[CrossRef][Medline]
11. Brünger, A. T. 1992. Free R value: a novel statistical quantity for assessing the accuracy of crystal structures. Nature. 355:472475.[CrossRef][Medline]
12. Kleywegt, G. J., and T. A. Jones. 1996. Phi/psi-chology: Ramachandran revisited. Structure. 4:13951400.[Medline]
13. Kleywegt, G. J. 1997. Validation of protein models from C
coordinates alone. J. Mol. Biol. 273:371376.[CrossRef][Medline]
14. Kleywegt, G. J., and T. A. Jones. 1997. Model building and refinement practice. Methods Enzymol. 277:208230.[Medline]
15. Kleywegt, G. J. 2000. Validation of protein crystal structures. Acta Crystallogr. D56:249265.
16. Hooft, R. W. W., C. Sander, and G. Vriend. 1997. Objectively judging the quality of a protein structure from a Ramachandran plot. CABIOS. 13:425430.[Medline]
17. Lovell, S. C., I. W. Davis, W. B. Arendall III, P. I. W. de Bakker, J. M. Word, M. G. Prisant, J. S. Richardson, and D. C. Richardson. 2003. Structure validation by C
geometry:
,
, and Cß deviation. Proteins. 50:437450.[CrossRef][Medline]
18. Willard, L., A. Ranjan, H. Y. Zhang, H. Monzavi, R. F. Boyko, B. D. Sykes, and D. S. Wishart. 2003. VADAR: a web server for quantitative evaluation of protein structure quality. Nucleic Acids Res. 31:33163319.
19. Sudarsanam, S., R. F. DuBose, C. J. March, and S. Srinivasan. 1995. Modeling protein loops using a
i+1,
i dimer database. Protein Sci. 4:14121420.[Abstract]
20. Sudarsanam, S., and S. Srinivasan. 1995. Searching for protein loops in parallel. CABIOS. 11:591593.[Medline]
21. Sudarsanam, S., and S. Srinivasan. 1997. Sequence-dependent conformational sampling using a database of
i+1 and
i angles for predicting polypeptide backbone conformations. Protein Eng. 10:11551162.
22. Parker, J. M. R. 1999. The relationship between peptide plane rotation (PPR) and similar conformations. J. Comput. Chem. 20:947955.[CrossRef]
23. Ozer, G., J. Foley, S. Zhong, J. M. Moix, S. Quirk, and R. Hernandez. 2006. http://www.d2check.gatech.edu/.
24. Shortle, D. 2002. Composite of local structure propensities: evidence for local encoding of long-range structure. Protein Sci. 11:1826.
25. Shortle, D. 2003. Propensities, probabilities, and the Boltzmann hypothesis. Protein Sci. 12:12981302.
26. Fang, Q. J., and D. Shortle. 2005. A consistent set of statistical potentials for quantifying local side-chain and backbone interactions. Proteins. 60:9096.[CrossRef][Medline]
27. Fang, Q. J., and D. Shortle. 2005. Enhanced sampling near the native conformation using statistical potentials for local side-chain and backbone interactions. Proteins. 60:97102.[CrossRef][Medline]
28. Hovmöller, S., T. Zhou, and T. Ohlson. 2002. Conformations of amino acids in proteins. Acta Crystallogr. D58:768776.[CrossRef]
29. Sheik, S. S., P. Ananthalakshmi, G. R. Bhargavi, and K. Sekar. 2003. CADB: conformation angles database of proteins. Nucleic Acids Res. 31:448451.
30. Priestle, J. P. 2003. Improved dihedral-angle restraints for protein structure refinement. J. Appl. Crystallogr. 36:3442.[CrossRef]
31. Dayalan, S., S. Bevinakoppa, and H. Schroder. 2004. A dihedral angle database of short sub-sequences for protein structure prediction. The Second Asia-Pacific Bioinformatics Conference, Australian Computer Society, Inc., Sydney, NSW, Australia.
32. Vriend, G. 1990. WHAT IF: a molecular modeling and drug design program. J. Mol. Graph. 8:52–55.
33. Zheng, Q., R. Rosenfeld, C. DeLisi, and D. J. Kyle. 1994. Multiple copy sampling in protein loop modeling: computational efficiency and sensitivity to dihedral angle perturbations. Protein Sci. 3:493–506.
34. Mathiowetz, A. M., and W. M. Goddard III. 1995. Building proteins from Cα coordinates using the dihedral probability grid Monte Carlo method. Protein Sci. 4:1217–1232.
35. Cheng, B., A. Nayeem, and H. A. Scheraga. 1996. From secondary structure to three-dimensional structure: improved dihedral angle probability distribution function for use with energy searches for native structures of polypeptides and proteins. J. Comput. Chem. 17:1453–1480.
36. Fiser, A., R. K. Gian Do, and A. Šali. 2000. Modeling of loops in protein structures. Protein Sci. 9:1753–1773.
37. Marti-Renom, M. A., A. C. Stuart, A. Fiser, R. Sanchez, F. Melo, and A. Šali. 2000. Comparative protein structure modeling of genes and genomes. Annu. Rev. Biophys. Biomol. Struct. 29:291–325.
38. Baker, D., and A. Šali. 2001. Protein structure prediction and structural genomics. Science. 294:93–96.
39. Fiser, A., M. Feig, C. L. Brooks III, and A. Šali. 2002. Evolution and physics in comparative protein structure modeling. Acc. Chem. Res. 35:413–421.
40. Jacobson, M. P., D. L. Pincus, C. S. Rapp, T. J. F. Day, B. Honig, D. E. Shaw, and R. A. Friesner. 2004. A hierarchical approach to all-atom protein loop prediction. Proteins. 55:351–367.
41. Wu, T. T., and E. A. Kabat. 1971. An attempt to locate the non-helical and permissively helical sequences of proteins: application to the variable regions of immunoglobulin light and heavy chains. Proc. Natl. Acad. Sci. USA. 68:1501–1506.
42. Kabat, E. A., and T. T. Wu. 1972. Construction of a three-dimensional model of the polypeptide backbone of the variable region of κ-immunoglobulin light chains. Proc. Natl. Acad. Sci. USA. 69:960–964.
43. Wu, T. T., and E. A. Kabat. 1973. Attempt to evaluate influence of neighboring amino-acid (n-1) and (n+1) on backbone conformation of amino acid (n) in proteins: use in predicting three-dimensional structure of polypeptide backbone of other proteins. J. Mol. Biol. 75:13–31.
44. Pappu, R. V., R. Srinivasan, and G. D. Rose. 2000. The Flory isolated-pair hypothesis is not valid for polypeptide chains: implications for protein folding. Proc. Natl. Acad. Sci. USA. 97:12565–12570.
45. Chakrabarti, P., and D. Pal. 2001. The interrelationships of side-chain and main-chain conformations in proteins. Prog. Biophys. Mol. Biol. 76:1–102.
46. Zaman, M. H., M. Y. Shen, R. S. Berry, K. F. Freed, and T. R. Sosnick. 2003. Investigations into sequence and conformational dependence of backbone entropy, inter-basin dynamics and the Flory isolated-pair hypothesis for peptides. J. Mol. Biol. 331:693–711.
47. Betancourt, M. R., and J. Skolnick. 2004. Local propensities and statistical potentials of backbone dihedral angles in proteins. J. Mol. Biol. 342:635–649.
48. Esposito, L., A. De Simone, A. Zagari, and L. Vitagliano. 2005. Correlation between ω and ψ dihedral angles in protein structures. J. Mol. Biol. 347:483–487.
49. RCSB Protein Data Bank. 2006. http://www.rcsb.org/pdb/clusterStatistics.do.
50. DeWitte, R. S., and E. I. Shakhnovich. 1994. Pseudodihedrals: simplified protein backbone representation with knowledge-based energy. Protein Sci. 3:1570–1581.
51. Shannon, C. E. 1948. A mathematical theory of communication. Bell Syst. Tech. J. 27:379–423.
52. Solis, A. D., and S. Rackovsky. 2002. Optimally informative backbone structural propensities in proteins. Proteins. 48:463–486.
53. Solis, A. D., and S. Rackovsky. 2004. On the use of secondary structure in protein structure prediction: a bioinformatic analysis. Polym. 45:525–546.
54. Garnier, J., D. J. Osguthorpe, and B. Robson. 1978. Analysis of the accuracy and implications of simple methods for predicting the secondary structure of globular proteins. J. Mol. Biol. 120:97–120.
55. Kloczkowski, A., K. L. Ting, R. L. Jernigan, and J. Garnier. 2002. Combining the GOR algorithm with evolutionary information for protein secondary structure prediction from amino acid sequence. Proteins. 49:154–166.
56. Kwong, P. D., R. Wyatt, J. Robinson, R. W. Sweet, J. Sodroski, and W. A. Hendrickson. 1998. Structure of HIV gp120 envelope glycoprotein in complex with the CD4 receptor and a neutralizing human antibody. Nature. 393:648–659.