
* Polymer Research Center and Chemical Engineering Department, Bogazici University, Bebek, Istanbul, Turkey; and
Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, Massachusetts USA
Correspondence: Address reprint requests to Celia A. Schiffer, Tel.: 508-856-8008; Fax: 508-856-6464; E-mail: Celia.Schiffer{at}umassmed.edu.
| INTRODUCTION |
|---|
The protease is highly specific in catalyzing the cleavage of 10 sites in the gag and pol polyproteins. These sites, however, share little sequence homology and lack an obvious consensus binding motif. It is known that the protease can bind to a large variety of peptides but the principles governing and the physical parameters determining substrate recognition and specificity remain poorly understood.
Crystal structures of HIV-1 protease in complex with a variety of inhibitors are deposited in the Protein Data Bank (PDB). Until recently, however, structures with natural substrates were lacking. The crystal structures of an inactive (D25N) protease with six decameric peptides corresponding to the natural cleavage sites within the gag and pol polyproteins were solved (Prabu-Jeyabalan et al., 2002). The structural information obtained enables us to investigate how different sequences bind to the same molecule.
To understand the principles of substrate recognition, we applied an approach that has been used to address the inverse protein-folding problem. In this method, referred to as threading, the amino-acid sequence is threaded through known three-dimensional structures and the energy of the structure is evaluated based on pairwise contact potentials. The application of this approach to peptide complexes was originally proposed by Altuvia and co-workers and applied to the complexes of major histocompatibility complex (MHC) molecules (Altuvia et al., 1995, 1997; Schueler-Furman et al., 2000). In the present work, we expanded upon this approach to look at the substrate specificity of HIV-1 protease.
The recently solved structures of HIV-1 protease substrate complexes provide ideal structural information to be used in threading analysis. The number of conformations the peptide can adopt in the binding groove is limited and defined by the protease structure, which imposes physical constraints on the peptide. We applied several different threading procedures to differentiate between binding and nonbinding sequences and to determine which factors are important in peptide recognition by HIV-1 protease. The first method was that of Altuvia et al. (1995), where a statistical potential matrix was used to evaluate the interaction of the peptide with the protease residues it contacts (Miyazawa and Jernigan, 1996). The residues were considered to be in contact or not according to three different distance criteria. This corresponds to approximating the interaction between residues by a square-well potential. In the second method, we employed distance-dependent statistical potentials (Bahar and Jernigan, 1997). Then, we further developed the force field to include the effect of peptide conformation in the energy evaluation. With all three methods, we investigated whether using multiple template structures and averaging the results improves the predictions. Finally, we used a dynamic Monte Carlo relaxation procedure after threading a peptide sequence onto the template structure. From these analyses, we found that using distance-dependent, long-range potentials and taking multiple peptide conformations into consideration improves the threading procedure, and that dynamic threading is a potentially useful method when only one complex structure is available. Besides the long-range potentials accounting for the interactions between the peptide and the protease, the side-chain short-range potentials of the peptide were found to be important in discriminating between binding and nonbinding peptides. Although the active site can also adapt to some extent depending on the sequence bound, the conformational space accessible to the bound peptide is constrained. Hence, the compatibility of the peptide sequence with the space in the binding groove has an important role in molecular recognition. This is also in accordance with the idea that a shape rather than specific amino acid residues is recognized by the protease (Prabu-Jeyabalan et al., 2002), and implies that the peptide conformation should be taken into consideration to improve the predictions of threading methods.
| MATERIALS AND METHODS |
|---|
Threading with a contact potential matrix
In this method, the binding affinity of a peptide is predicted by the total energy of interaction with contact residues. The contacts of the peptide in the available template co-crystal structure are determined according to three different criteria: 1) α-carbon atoms are closer than 7.5 Å (Covell and Jernigan, 1990); 2) β-carbon atoms are closer than 7 Å (Altuvia et al., 1995); and 3) any two atoms are closer than 4 Å (Madden et al., 1993). Then, the amino-acid sequence of the query peptide is threaded onto the coordinates of the peptide in the template. The contacts are assumed to be conserved, and the total interaction energy is obtained by summing the interaction energy values of peptide residues using a contact potential matrix. The intraresidue energy for the host molecule (protease) amino acids is not included in the computation as it is considered to be constant for all the threaded peptides for a given template structure. The contacting residues are determined for the conformation in the known structure, and therefore are only approximate for different sequences threaded. Energy values for amino acid-to-amino acid interactions are taken from the table of statistical pairwise contact potentials derived by Miyazawa and Jernigan (1996).
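The contact-based scoring described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the coordinates, the protease fragment `DGI`, and the energies in `toy_potential` are invented for the example and are not Miyazawa-Jernigan values, and the function names are hypothetical rather than those of the original FORTRAN programs.

```python
import math

def find_contacts(peptide_coords, protease_coords, cutoff):
    """Return (peptide_index, protease_index) pairs closer than cutoff (in Å)."""
    contacts = []
    for i, p in enumerate(peptide_coords):
        for j, q in enumerate(protease_coords):
            if math.dist(p, q) < cutoff:
                contacts.append((i, j))
    return contacts

def contact_energy(query, protease_seq, contacts, potential):
    """Sum pairwise contact energies for a query threaded onto the template.

    The contact list comes from the template and is assumed to be conserved
    for every threaded sequence, as in the square-well approximation.
    """
    total = 0.0
    for i, j in contacts:
        pair = tuple(sorted((query[i], protease_seq[j])))
        total += potential[pair]
    return total

# Toy template: two peptide sites and three protease sites (hypothetical).
peptide_ca = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]
protease_ca = [(0.0, 5.0, 0.0), (3.8, 6.0, 0.0), (20.0, 0.0, 0.0)]
protease_seq = "DGI"

# Made-up energies keyed by alphabetically sorted residue pairs.
toy_potential = {
    ("D", "L"): -1.0,
    ("G", "L"): -2.0,
    ("D", "F"): -1.5,
    ("F", "G"): -2.5,
}

contacts = find_contacts(peptide_ca, protease_ca, cutoff=7.5)
score = contact_energy("LF", protease_seq, contacts, toy_potential)
```

Because the contact list is computed once per template, scoring another query sequence only repeats the table lookups, which is why this kind of threading is fast.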
Threading with distance-dependent potentials
The interaction energy of the peptide is calculated by employing distance-dependent interresidue potentials (Bahar and Jernigan, 1997). These potentials were derived using 302 structures from the PDB (Bernstein et al., 1977; Berman et al., 2000). They are not fit to functions, but are instead discrete, at 0.4 Å resolution. Bahar and Jernigan used both solvent-exposed and residue-exposed reference states, which correspond to formation of a specific residue-to-residue contact at the expense of contacts with the solvent and with an average residue, respectively. An effective set of parameters to be used in protein simulations was derived from the potentials with these reference states, which operate in different environments. Bahar and Jernigan also presented effective contact potentials obtained from the integration of radial distributions over different distance ranges. They could reproduce the Miyazawa and Jernigan potentials as one case of these integrations. The Miyazawa and Jernigan potentials were argued to have rather weak specificity because of their large radius of interaction (6.5 Å). The Bahar and Jernigan potentials demonstrated the dominance of highly specific hydrophilic interactions at close separations. Hence, these potentials are expected to better account for specific side-chain contacts that may be of great importance in peptide-to-protease interactions.
In the previous method of threading with a contact potential matrix, the interaction energy between residues was approximated by a square-well type potential. For any two residues, the depth of the well was determined by the corresponding potential value in a statistical scoring matrix, and the interaction was considered to be in or out of the well according to a distance criterion. Hence, the selection of the distance criterion was a major concern in this all-or-none approach. In this next method, we eliminated the need for such a tentative criterion by using distance-dependent potentials. Two effective interaction sites per residue (its α-carbon atom for the backbone and a residue-specific side-chain site) were considered, and the energy of interaction between any two interaction sites was evaluated depending on the distance between them and the types of amino acids that the sites belong to. The total interaction energy of the peptide is found by summation over all n peptide and N protease residues as

$$E_{LR} = \sum_{i=1}^{n} \sum_{j=1}^{N} \left[ E_{SS}(r_{ij}) + E_{SB}(r_{ij}) + E_{BB}(r_{ij}) \right] \qquad (1)$$

where each term is evaluated at the corresponding site-to-site distance $r_{ij}$. The terms account for potentials between side-chain sites (SS), side-chain and backbone sites (SB), and two backbone sites (BB) of residues i and j, respectively.
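A minimal Python sketch of this distance-dependent evaluation follows. It assumes the discrete potentials are stored as lists with one value per 0.4 Å bin and that interactions beyond the tabulated range contribute zero; the table layout, the key format, and the numbers are hypothetical and are not the Bahar-Jernigan parameters.

```python
import math

BIN_WIDTH = 0.4  # Å, the resolution of the discrete potentials

def binned_energy(table, distance):
    """Look up a discrete potential; assume zero beyond the tabulated range."""
    index = int(distance // BIN_WIDTH)
    return table[index] if index < len(table) else 0.0

def long_range_energy(peptide_sites, protease_sites, tables):
    """Sum site-site energies between all peptide and protease sites.

    Each site is (residue_type, site_kind, xyz), with site_kind "S" for the
    side-chain site and "B" for the backbone (alpha-carbon) site.
    """
    total = 0.0
    for res_a, kind_a, xyz_a in peptide_sites:
        for res_b, kind_b, xyz_b in protease_sites:
            table = tables.get((kind_a, res_a, kind_b, res_b))
            if table is not None:
                total += binned_energy(table, math.dist(xyz_a, xyz_b))
    return total

# Toy table covering 0 to 10 Å with a single attractive bin at 4.8 to 5.2 Å.
toy_table = [0.0] * 25
toy_table[12] = -1.5

tables = {
    ("S", "L", "S", "D"): toy_table,
    ("S", "L", "B", "D"): toy_table,
}
peptide_sites = [("L", "S", (0.0, 0.0, 0.0))]
protease_sites = [
    ("D", "S", (5.0, 0.0, 0.0)),   # falls in the attractive bin
    ("D", "B", (0.0, 30.0, 0.0)),  # beyond the table, contributes zero
]
energy = long_range_energy(peptide_sites, protease_sites, tables)
```

No contact cutoff appears anywhere in the sketch: the table itself decides how much each pair contributes at a given separation.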
Threading with conformational potentials
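In this method, the conformation of the peptide was taken into consideration in calculating the total energy. To evaluate the conformational energy of the backbone, the statistical potentials based on the virtual bond model given by Bahar et al. (1997a) for bond angles and bond torsions are used as

$$E_{SR} = \sum_{i} \left[ E(\theta_i) + \tfrac{1}{2} E(\phi_i^-) + \tfrac{1}{2} E(\phi_i^+) + \Delta E(\phi_i^-, \phi_i^+) \right] + \sum_{i} \left[ \Delta E(\theta_i, \phi_i^-) + \Delta E(\theta_i, \phi_i^+) \right] \qquad (2)$$

where $\theta_i$ is the virtual bond angle at the ith α-carbon, with $\phi_i^-$ and $\phi_i^+$ referring to the rotational angles of the virtual backbone bonds preceding and succeeding the ith α-carbon, respectively. The last term in the first summation and the last summation account for the pairwise interdependence of the torsion and/or bond angle bending.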
For the side chains, the probability distributions of Keskin and Bahar for packing of side chains in low-resolution models (Keskin and Bahar, 1998
) were converted into statistical potentials using the Boltzmann relationship. The energy associated with a side-chain bond angle at state
i for a residue type A is evaluated from
![]() | (3) |
) is the statistical probability of finding that bond at angle
and
is the background probability assuming uniform distribution probability. In the discrete state formalism adopted, the background probabilities are directly proportional to the mesh sizes. Analogous expressions were used for side-chain bond lengths and torsions. The side-chain conformational energy is summed up over all n side-chains in the peptide as
![]() | (4) |
,
, and
are the bond length, bond angle, and torsion angle of side chain i. The total energy of the peptide is found by the summation of its backbone and side-chain conformational energies, and the long-range interaction energy with the protease, which was evaluated using distance-dependent potentials as in the previous method.
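The Boltzmann conversion of a discrete probability distribution into a potential can be illustrated as below. This is a sketch under stated assumptions: energies are in units of RT (the `rt` argument defaults to 1), the background probability of each state is taken proportional to its mesh (bin) width as described above, and the counts and widths are invented for the example.

```python
import math

def boltzmann_potential(counts, bin_widths, rt=1.0):
    """Convert observed counts per discrete state into energies.

    E = -RT ln(P / P0), where P is the observed frequency of a state and
    P0 is the uniform background, proportional to the state's mesh width.
    States that were never observed get an infinite energy.
    """
    total = sum(counts)
    span = sum(bin_widths)
    energies = []
    for count, width in zip(counts, bin_widths):
        if count == 0:
            energies.append(math.inf)
            continue
        p = count / total
        p0 = width / span
        energies.append(-rt * math.log(p / p0))
    return energies

# Hypothetical bond-angle states: two 10-degree bins and one 20-degree bin.
toy_counts = [50, 25, 25]
toy_widths = [10.0, 10.0, 20.0]
toy_energies = boltzmann_potential(toy_counts, toy_widths)
```

In this toy case the first state is populated twice as often as a uniform distribution predicts and the third half as often, so their energies are -ln 2 and +ln 2 in units of RT, while the second state sits at zero.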
Dynamic threading
The Monte Carlo (MC) minimization process used in dynamic threading is based on the reduced model and MC method previously used to simulate various protein structures (Bahar et al., 1997b; Haliloglu and Bahar, 1998; Kurt and Haliloglu, 1999; Haliloglu, 1999). The algorithm is as follows: both the protease and the threaded peptide are moved by a random combination of perturbations, and the energy of the structure is checked after each perturbation. The protease and peptide are moved by randomly choosing a backbone or side-chain interaction site, and perturbing the Cartesian coordinates of the site by an amount $\Delta x = k(2r - 1)$, where r is a random number with $0 \le r \le 1$, and k is a proportionality factor controlling the strength of perturbation. Here, k was chosen to be 0.8 Å (consistent with the above-cited previous applications in protein simulations), which allows the protein to move only in the neighborhood of the original conformation.
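The perturbation rule can be written as a short Python function. One assumption is made that the text does not spell out: a separate random number r is drawn for each of the three Cartesian coordinates. The function name is hypothetical.

```python
import random

K = 0.8  # Å, proportionality factor controlling the perturbation strength

def perturb_site(xyz, rng, k=K):
    """Displace each coordinate by k * (2r - 1), with r uniform in [0, 1).

    Assumption: an independent r is drawn per coordinate, so every
    coordinate moves by at most k in either direction.
    """
    return tuple(c + k * (2.0 * rng.random() - 1.0) for c in xyz)

rng = random.Random(0)  # seeded only to make the example reproducible
old_site = (1.0, 2.0, 3.0)
new_site = perturb_site(old_site, rng)
```

With k = 0.8 Å no coordinate moves by more than 0.8 Å per perturbation, which keeps the structure close to the template conformation.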
The acceptance of each move is controlled on the basis of the Metropolis criterion (Metropolis et al., 1953): conformations whose energy is lower than the previous one, or whose Boltzmann factor is greater than a random number between 0 and 1, are accepted. The total energy considered here is the combination of both short-range and long-range potentials summed over the entire structure,

$$E = E_{SR} + E_{SC} + E_{LR} + \sum_{i} E(l_i) \qquad (5)$$

where $E_{SR}$, $E_{SC}$, and $E_{LR}$ are from Eqs. 2, 4, and 1, respectively. The term $E(l_i)$ controls the stretching of the virtual backbone bonds by a stiff harmonic potential with a force constant of 10 RT/Å², which allows only relatively small changes in the virtual bond lengths of the original structure. In accordance with conventions, one Monte Carlo step (MCS) comprises N perturbations, where N is the total number of residues in the structure. The structure of the ca-p2 complex (PDB code 1f7a) is used as the starting conformation.
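A compact sketch of the Metropolis acceptance test and of one relaxation run is given below. It is illustrative only: the energy function is a toy quadratic well standing in for the total energy of Eq. 5, energies are in units of RT, one interaction site is treated per residue, and `metropolis_accept` and `relax` are hypothetical names.

```python
import math
import random

def metropolis_accept(e_old, e_new, rng, rt=1.0):
    """Accept downhill moves always, uphill moves with Boltzmann probability."""
    if e_new <= e_old:
        return True
    return math.exp(-(e_new - e_old) / rt) > rng.random()

def relax(coords, energy_fn, n_steps, rng, k=0.8, rt=1.0):
    """Run n_steps Monte Carlo steps; one step is N single-site perturbations."""
    coords = list(coords)
    energy = energy_fn(coords)
    for _ in range(n_steps * len(coords)):
        i = rng.randrange(len(coords))
        trial = list(coords)
        trial[i] = tuple(c + k * (2.0 * rng.random() - 1.0) for c in coords[i])
        e_trial = energy_fn(trial)
        if metropolis_accept(energy, e_trial, rng, rt):
            coords, energy = trial, e_trial
    return coords, energy

def toy_energy(coords):
    """Toy stand-in for the total energy: a quadratic well at the origin."""
    return sum(x * x + y * y + z * z for x, y, z in coords)

rng = random.Random(1)
start = [(5.0, 5.0, 5.0)]
final_coords, final_energy = relax(start, toy_energy, n_steps=200, rng=rng)
```

Tracking `final_energy` over successive steps is what yields relaxation curves of the kind used to compare threaded sequences against the template's own peptide.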
System and programs
All programs for threading analysis are written in the FORTRAN programming language and run on a Silicon Graphics R5000 workstation. Prediction results from threading programs can be obtained in seconds, whereas a run of 1000 Monte Carlo step relaxations takes about 23 h of computational time. The programs can be run on UNIX operating systems and are available upon request.
| RESULTS |
|---|
We also included the sequence of nc-p1 in the test set after shifting it by one amino acid toward the N-terminus (called "nc-p1s"). The shift increases the sequence homology to the other substrates (notice the F and L residues in the P1 and P1' sites of p1-p6 and rh-in), even though it is the original sequence that is recognized by the protease. The sequences of the substrates and peptides are given in Table 1.
Threading with a contact potential matrix
We applied the method of Altuvia et al. (1995) to score and rank the binding affinities of peptides in Table 1 to HIV-1 protease. The threading methodology was described in detail in the original reference and summarized here in Materials and Methods. Table 2 gives the ranking of peptides according to the binding affinities predicted by this threading algorithm and using the ca-p2 complex structure as the template with three different distance criteria to define the contacting residues. Although it is reasonable to use the same distance criterion as in the parameterization of the statistical contact potentials, we applied all three criteria of Altuvia et al. (1995) to enable a direct comparison of the results.
Using α-carbon distances to determine the contacting residues gives a better prediction than the other criteria. Surprisingly, although it still ranks high, the template structure's own peptide (ca-p2) does not have the highest score, indicating that this force field may not have adequate precision. The shifted nc-p1s sequence has a better score than the nc-p1 sequence, which is the one actually recognized by HIV-1 protease. Overall, the nonbinding peptides tend to rank lower than the binding ones, but it is not possible to differentiate the two groups using these rankings.
We performed the same analysis with another substrate (ma-ca) complex of HIV-1 protease (Prabu-Jeyabalan et al., 2002). Table 3 gives the ranking results with this template structure, and Table 4 gives the average of results from the two template structures. With the ma-ca complex structure as the template, the nearest-atom criterion seems to work better. However, the template structure's own peptide (ma-ca) scores very poorly, and is predicted to have a binding affinity even lower than that of nonbinding peptides. The results of threading depend strongly on the template structure used, as a peptide ranks high if its binding scheme is similar to that of the template peptide. Hence, using multiple templates should potentially provide a better fit for the binding peptides. However, when the results from two template structures were averaged, no improvement in ranking was seen. Even when five and six template structures were used, the results did not change much. Especially within the coarse-grained scale of the α-carbon criterion, the residues considered to be in contact are almost the same for different template structures. Therefore, this crude force field is not accurate enough to distinguish the subtle differences between the various peptide sequences.
Table 5 gives the results of threading with distance-dependent interaction potentials using the two template structures and the average of results from the two. In the current energy evaluation scheme, there is no need for a criterion to decide on the contacting residues. Rather, a distance-dependent energy function is used with a less coarse-grained model, considering two sites per residue: one at its α-carbon atom and one at the side chain. This approach improves the accuracy of the threading. In this case, the template structure's own peptides have reasonable rankings and, as expected, taking the average of two templates improves the ranking. This technique can even distinguish the subtly different nc-p1s sequence, which has a lower score than the real substrate. The nonbinding peptides rank worse, but the energy gap between the binding and nonbinding peptides is not yet significant.
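Averaging over templates and ranking the peptides can be sketched as follows. The peptide labels and energies are invented for illustration and do not reproduce any table in this work; lower energy is taken to mean higher predicted affinity, and `rank_by_average` is a hypothetical helper.

```python
def rank_by_average(scores_by_template):
    """Rank peptides by their mean energy over all templates, lowest first.

    scores_by_template maps template name -> {peptide name: energy}.
    Only peptides scored on every template are ranked.
    """
    per_template = list(scores_by_template.values())
    common = set.intersection(*(set(scores) for scores in per_template))
    means = {
        peptide: sum(scores[peptide] for scores in per_template) / len(per_template)
        for peptide in common
    }
    return sorted(means.items(), key=lambda item: item[1])

# Hypothetical energies for three peptides threaded onto two templates.
toy_scores = {
    "template_1": {"pepA": -10.0, "pepB": -4.0, "pepC": -6.0},
    "template_2": {"pepA": -4.0, "pepB": -6.0, "pepC": -7.0},
}
ranking = rank_by_average(toy_scores)
```

Here pepA ranks first on average even though it is only the weakest scorer on the second template, which shows how averaging dampens the bias toward peptides that resemble a single template's own ligand.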
The threading results with conformational potentials are given in Table 6 for two different template structures. When the conformation of the peptide in the template is taken into account in evaluating the energy, the template structure's own peptide has the best score in both cases. This results from using a more detailed force field which defines the energy of the peptide more precisely.
10 RT, which would allow identifying the two groups efficiently without prior knowledge of their identities.
Dynamic threading
As a last method, we modified the threading methodology by introducing dynamics to allow the relaxation of the system to equilibrate and minimize its energy after threading the query amino-acid sequence onto the structure. This is potentially helpful when there are not multiple structures to be used as templates. We employed a Monte Carlo/Metropolis-type dynamic minimization process with a simplified coarse-grained model of the protein structure.
The total energy of the peptide, comprising long- and short-range potentials throughout a minimization of 2000 MC steps (MCS), is given for three of the natural substrates in Fig. 1. Two independent runs are made for each threaded sequence. The results from both are given in the graphs as separate curves in broken lines and they are quite similar. For the threaded substrates in Fig. 1, there is a rapid relaxation and decrease in energy to approach the energy of the template's own peptide. The results for two of the nonbinding peptides are given in Fig. 2 in the same format as Fig. 1. In this case, there is not a rapid relaxation of the energy and the energy does not converge to the reference energy during the simulation. The results are promising in differentiating between binders and nonbinders; therefore, we carried out the relaxation process for all the sequences in the test set.
23 h per sequence on an R5000 SGI workstation).

| DISCUSSION AND CONCLUSIONS |
|---|
In the first method applied, the interactions between the peptide and protease residues in close proximity were approximated by square-well-type long-range potentials. The depth of the well was determined by the type of amino acids in contact, and taken from a statistical contact potential matrix. The important point in this approach is the choice of the distance parameter of the square-well potential, that is, the maximum distance between atoms of two residues within which the residues are considered to interact (with a constant energy). We tried three different criteria to answer this question, as was done for the MHC system (Altuvia et al., 1995). Nevertheless, we could not obtain results that separate the binders from the nonbinders in the test set, even when we used multiple templates.
Employing distance-dependent potentials as a second method of evaluating the long-range interaction of the peptide, we obtained improvement in the results. Instead of the square-well potential, here we used distance-dependent statistical potentials specific for the type of interacting amino acids. This approach eliminates the need of choosing a distance criterion to determine which residues are in contact and which are not. Instead, the potential energy function gives a certain value depending on the distance. The residues were represented by two effective interaction sites, one for the backbone and one for the side chain specific to the amino-acid type. Introducing a more detailed representation of the long-range interactions and using multiple templates enabled predictions to separate the binders and nonbinders in the test set.
When the structures of six substrate complexes of HIV-1 protease were solved, it was seen that superposition of the structures of any three substrates defines a consensus volume where the substrates fit (Prabu-Jeyabalan et al., 2002). This leads to the idea that a shape, rather than certain amino acids, is recognized by the protease. Although the protease also adapts to bind different sequences, the binding groove restricts the conformations accessible to the bound peptide. The affinity of the peptide is thus affected by how well it can fit into the volume defined by the binding groove. To account for this restriction, we added conformational short-range potentials to the energy evaluation scheme in threading. In this approach, one has to consider different conformations accessible to the peptide, and thus it is very important to use multiple template structures. In accordance with these notions, we obtained a clear differentiation between binders and nonbinders in the test set with the employment of conformational potentials in addition to distance-dependent long-range potentials and multiple templates. This finding suggests that the "fitness" of a given peptide to the conformations accessible in the bound form is an important determinant of its binding affinity; hence short-range as well as long-range potentials should be considered in the evaluation of energy in threading methods. In the general field of protein structure prediction, extra terms accounting for local information have been added to the score or force field, based on secondary structure predictions (Russell et al., 1996; Rost et al., 1997) or experimental data such as nuclear magnetic resonance (Ayers et al., 1999). Wolynes and co-workers demonstrated that including local environmental preferences and residue contacts refined their screening technique in discriminating correct folds (Goldstein et al., 1992). There have also been some approaches with emphasis on the local aspects of conformation and forces that operate on the short range of a polypeptide backbone (Jones, 1999; Sippl, 1990). Our results indicate that short-range potentials are important in protein-to-protein interactions, where the conformation of the side chains is expected to play an important role.
In another test of the improvement obtained with the threading methods, we evaluated their performance by rank analysis: how are the binding potentials of the natural cleavage site sequences ranked among all possible 8-mer sequences derived from the overlapping peptides in the gag-pol polyproteins? We would expect the sequences that best fit the binding site to be recognized and cleaved by HIV-1 protease, and therefore threaded all possible 8-mers in the polyproteins onto the known peptide complexes to see if the cleavage sites could be found. The structure of the polypeptides when they are cleaved by the protease is not known, and this could also affect the recognition events. Nevertheless, consistent with the results for the test set, there was an improvement in the rankings of the cleavage sites as the force field was improved, and as multiple templates were used (Fig. 4). With the most accurate method, where both short- and long-range potentials are used, the template structure's own peptide always ranks first among all possible 8-mers in the polyproteins. This indicates that the force field precisely defines the energy of the peptide when the exact conformation is available. Other sites within the polyprotein, which are not known to be cleavage sites, also score well; however, local secondary and tertiary structure may prevent them from being cleaved.
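The rank analysis amounts to a sliding-window scan followed by a sort, which can be sketched as below. The ten-letter sequence and the scoring function are toy placeholders: in the actual analysis the score would be the threading energy of each 8-mer on a template, and both helper names are hypothetical.

```python
def overlapping_kmers(sequence, k=8):
    """Return every overlapping window of length k in the sequence."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]

def rank_of(target, candidates, score_fn):
    """Return the 1-based rank of target when candidates are sorted by score.

    Lower scores rank first, matching the convention that a lower energy
    means a better predicted binder.
    """
    ranked = sorted(set(candidates), key=score_fn)
    return ranked.index(target) + 1

# Toy sequence and toy score (position of the first F in the window).
toy_sequence = "ABCDEFGHIJ"
windows = overlapping_kmers(toy_sequence)

def toy_score(window):
    return window.index("F")

best_rank = rank_of("CDEFGHIJ", windows, toy_score)
```

A polyprotein of length L yields L - 7 overlapping 8-mers, so a fast scoring function is what makes scanning the whole gag-pol sequence practical.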
Threading should enable computationally fast and inexpensive screening of candidate sequences using a rough estimate of the binding affinity. Although the threading predictions improve when more detailed energy evaluations are employed, all-atom representations and force fields such as those used in MD simulations and detailed structure predictions are not appropriate for threading. Hence, an optimum should be found by balancing the detail and speed of the method, taking into account the nature of the problem. Here we found that a threading method using conformational short-range and distance-dependent long-range potentials with two effective interaction sites per residue gives predictions good enough to differentiate between substrates and nonbinding sequences, when either multiple template structures are used or a dynamic threading algorithm is applied with a single template. Both of these methods are computationally fast and effective. In this postgenomic era, they are potentially useful for screening libraries of potential binding sequences against newly discovered proteins.
| FOOTNOTES |
|---|
Nese Kurt's present address is University of Wisconsin-Madison, Dept. of Chemistry, 1101 University Ave., Madison, WI 53706.
Submitted on November 27, 2002; accepted for publication May 6, 2003.
| REFERENCES |
|---|
Altuvia, Y., A. Sette, J. Sidney, S. Southwood, and H. Margalit. 1997. A structure-based algorithm to predict potential binding peptides to MHC molecules with hydrophobic binding pockets. Hum. Immunol. 58:1–11.
Ayers, D. J., P. R. Gooley, A. W. Cooper, and A. E. Torda. 1999. Enhanced protein fold recognition using secondary structure information from NMR. Protein Sci. 8:1127–1133.
Bahar, I., B. Erman, T. Haliloglu, and R. L. Jernigan. 1997b. Efficient characterization of collective motions and interresidue correlations in proteins by low-resolution simulations. Biochemistry. 36:13512–13532.
Bahar, I., M. Kaplan, and R. L. Jernigan. 1997a. Short-range conformational energies, secondary structure propensities, and recognition of correct sequence-structure matches. Proteins. 29:292–308.
Bahar, I., and R. L. Jernigan. 1997. Inter-residue potentials in globular proteins and the dominance of highly specific hydrophilic interactions at close separations. J. Mol. Biol. 266:195–214.
Berman, H. M., J. Westbrook, Z. Feng, G. Gilliland, T. N. Bhat, H. Weissig, I. N. Shindyalov, and P. E. Bourne. 2000. The Protein Data Bank. Nucleic Acids Res. 28:235–242.
Bernstein, F. C., T. F. Koetzle, G. J. Williams, E. F. Meyer, Jr., M. D. Brice, J. R. Rodgers, O. Kennard, T. Shimanouchi, and M. Tasumi. 1977. The Protein Data Bank: a computer-based archival file for macromolecular structures. J. Mol. Biol. 112:535–542.
Chou, K. C. 1996. Prediction of human immunodeficiency virus protease cleavage sites in proteins. Anal. Biochem. 233:1–14.
Covell, D. G., and R. L. Jernigan. 1990. Conformations of folded proteins in restricted spaces. Biochemistry. 29:3287–3294.
Goldstein, R. A., Z. A. Luthey-Schulten, and P. G. Wolynes. 1992. Protein tertiary structure recognition using optimized Hamiltonians with local interactions. Proc. Natl. Acad. Sci. USA. 89:9029–9033.
Haliloglu, T., and I. Bahar. 1998. Coarse-grained simulations of conformational dynamics of proteins: application to apomyoglobin. Proteins. 31:271–281.
Haliloglu, T. 1999. Characterization of internal motions of Escherichia coli ribonuclease H by Monte Carlo simulation. Proteins. 34:533–539.
Jernigan, R., and I. Bahar. 1996. Structure-derived potentials and protein simulations. Curr. Opin. Struct. Biol. 6:195–209.
Jones, D. T. 1999. Protein secondary structure prediction based on position-specific scoring matrices. J. Mol. Biol. 292:195–202.
Keskin, O., and I. Bahar. 1998. Packing of sidechains in low-resolution models for proteins. Fold. Des. 3:469–479.
Kurt, N., and T. Haliloglu. 1999. Conformational dynamics of chymotrypsin inhibitor 2 by coarse-grained simulations. Proteins. 37:454–464.
Madden, D. R., D. N. Garboczi, and D. C. Wiley. 1993. The antigenic identity of peptide/MHC complexes, a comparison of the conformations of five viral peptides presented by HLA-A2. Cell. 75:693–708.
Metropolis, N., A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, and E. J. Teller. 1953. Equation of state calculations by fast computing machines. J. Chem. Phys. 21:1087–1092.
Miyazawa, S., and R. L. Jernigan. 1996. Residue-residue potentials with a favorable contact pair term and an unfavorable high packing density term, for simulation and threading. J. Mol. Biol. 256:623–644.
Prabu-Jeyabalan, M., E. Nalivaika, and C. A. Schiffer. 2002. Substrate shape determines specificity of recognition for HIV-1 protease: analysis of crystal structures of six substrate complexes. Structure. 10:369–381.
Prabu-Jeyabalan, M., E. Nalivaika, and C. A. Schiffer. 2000. How does a symmetric dimer recognize an asymmetric substrate? A substrate complex of HIV-1 protease. J. Mol. Biol. 301:1207–1220.
Rost, B., R. Schneider, and C. Sander. 1997. Protein fold recognition by prediction-based threading. J. Mol. Biol. 270:471–480.
Russell, R. B., R. R. Copley, and G. J. Barton. 1996. Protein fold recognition by mapping predicted secondary structures. J. Mol. Biol. 259:349–365.
Schueler-Furman, O., Y. Altuvia, S. Alessandro, and H. Margalit. 2000. Structure-based prediction of binding peptides to MHC class I molecules: application to a broad range of MHC alleles. Protein Sci. 9:1838–1846.
Sippl, M. J. 1990. Calculation of conformational ensembles from potentials of mean force: an approach to the knowledge-based prediction of local structures in globular proteins. J. Mol. Biol. 213:859–883.