| HOME | HELP | FEEDBACK | SUBSCRIPTIONS | ARCHIVE | SEARCH | TABLE OF CONTENTS |
Materials and Process Simulation Center, California Institute of Technology, Pasadena, California
Correspondence: Address reprint requests to William A. Goddard 3rd, E-mail: wag{at}wag.caltech.edu.
| ABSTRACT |
|---|
|
|
|---|
| INTRODUCTION |
|---|
|
|
|---|
34% (Schöneberg et al., 2002
Within a class of GPCRs (for example, adrenergic receptors) there are often several subtypes (for example, nine for adrenergic receptors) all responding to the same endogenous ligand (epinephrine and norepinephrine for adrenergic receptors), but having very different functions in various cells. In addition, many different types of GPCRs are similar enough that they are affected by the antagonists or agonists for other types (e.g., among adrenergic, dopamine, serotonin, and histamine receptors), leading often to undesirable side effects. This makes it difficult to develop drugs to a particular subtype without side effects resulting from cross-reactivity to other subtypes. To design such subtype-specific drugs it is essential to use structure-based methods, but this has not been possible because there is no atomic-level structure available for any human GPCR. Consequently design of subtype-specific drugs for GPCR targets is a very tedious empirical process, often leading to drugs with undesirable side effects. The difficulty in obtaining three-dimensional structures for GPCRs is obtaining high-quality crystals of these membrane-bound proteins sufficient to obtain high-resolution x-ray diffraction data, and the difficulty of using NMR to determine structure on such membrane-bound systems. Hence we conclude that to aid the structure-based drug design for GPCR targets, it is essential to develop theoretical methods adequate to predict the three-dimensional structures of GPCRs from first principles. For globular proteins there have been significant advances in predicting the three-dimensional structures by using sequence homologies to families of known structures (Marti-Renom et al., 2000
); however, this is not practical for GPCRs, inasmuch as a high-resolution crystal structure is available for only one GPCR, bovine rhodopsinwhich has low homology (<35%) to most GPCRs of pharmacological interest.
Consequently we have been developing the MembStruk method for ab initio or first principles prediction of three-dimensional structure for GPCRs from primary sequence without using homology. MembStruk is based on the organizing principle provided by knowing that a GPCR has a single chain with seven helical TM domains threading through the membranewhich we find provides sufficient structural information (when combined with atomistic simulations such as molecular dynamics and Monte Carlo) for us to deduce three-dimensional structures for GPCRs that are adequate for prediction of the binding site and relative binding energy of agonists and antagonists. We have been applying MembStruk to several GPCRs, where the validation has been based on the comparison of the predicted binding site to experimental binding and mutation data. In this article we describe the details of the MembStruk method and validate the accuracy of the predictions by comparing with the only high-resolution crystal structure available for a GPCR, bovine rhodopsin.
Because the function of a GPCR is to signal to the interior of the cell in the presence of a particular ligand bound to the extracellular surface, it is most relevant to determine the three-dimensional structure for the conformation of the protein involved in activating G-protein. It is widely thought that there are two distinct conformations of GPCRs: one active and one inactive, in equilibrium, even in the absence of ligands (Melia et al., 1997
; Strange 1998
; Schöneberg et al., 2002
). This equilibrium is shifted when a ligand binds to the GPCR. Thus it would be valuable to know four structures of the proteinthe apo-protein in both the active and inactive forms and the ligand-bound form in both the active and inactive formsso that one could study the process of GPCR activation. Even for bovine rhodopsin, there is crystal structure data for only one of these four (the ligand-bound inactive form). We postulate in this article a model of activation involving the second extracellular (EC-II) loop and TM3 in which the structure is assumed 1), to be in the active form when the EC-II loop is open and 2), to be in the inactive form when the EC-II loop is closed.
It is the closed conformation that is observed in the rhodopsin crystal structure (Palczewski et al., 2000
; Okada, et al., 2001
). In this article we report the MembStruk-predicted structures for all four structures, although comparison can be made directly to experiment only for the closed-loop-with-ligand case.
Except for bovine rhodopsin the only experimental validation for the accuracy of predicted GPCR structures must rest on predicting the binding sites and energies for various ligands and how they are modified by various mutations. To make such predictions from first principles, we developed the HierDock method, which we validate here by predicting the binding site of retinal in bovine rhodopsin both for the experimental three-dimensional structure and for the predicted structures (open and closed loop).
The first report on MembStruk and HierDock (Floriano et al., 2000
; Vaidehi et al., 2002
) focused on olfactory receptors, where ligand-binding data was available for 24 simple organic molecules to 14 different olfactory receptors (Malnic et al., 1999
). More recently these methods have been applied to predict the structures and functions for GPCRs of such diverse subfamilies as ß1- and ß2-adrenergic receptor, dopamine D2 receptor, endothelial differentiation gene 6, and sweet gustatory and olfactory receptors (Vaidehi et al., 2002
; Freddolino et al., 2004
; Kalani et al., 2004
; Floriano et al., 2004a
). The HierDock technique has also been validated for globular proteins where the crystal structures are available (Wang et al., 2002
; Datta et al., 2002
, 2003
; Kekenes-Huskey et al., 2003
; Floriano et al., 2004b
). We find that the predicted structures of the adrenergic and dopamine receptors lead to binding sites for the endogenous ligands in excellent agreement with the plentiful mutation and binding experiments. Similarly, the predicted binding sites and affinities for endothelial differentiation gene 6, the mouse I7 and rat I7 olfactory receptors, and the human sweet receptor are consistent with the available experimental binding data.
However, a quantitative assessment of the accuracy of these structure and function prediction methods can be made only for bovine rhodopsin, for which there is a high-resolution experimental crystal structure available with ligand attached to the protein. Thus this article provides a detailed study of rhodopsin to validate the various steps involved in our procedures for prediction of the three-dimensional structures of GPCRs (MembStruk) and for the prediction of the binding site and the binding energy of the retinal ligand to bovine rhodopsin (HierDock).
Computational Methods gives the details of the MembStruk and HierDock protocols, followed by Results and Discussion, which describes the results of structure and function prediction for bovine rhodopsin. These results are also discussed in the Summary and in the Conclusions section.
| COMPUTATIONAL METHODS |
|---|
|
|
|---|
The ligands were described with the DREIDING FF (Mayo et al., 1990
) using charges from quantum mechanics calculations on the isolated ligand; electrostatic potential charges calculated using Jaguar, Ver. 4.0 (Schrodinger, Portland, Oregon). For the lipids we used the DREIDING FF with QEq charges (Rappé and Goddard, 1991
). Some calculations were done in the vacuum (e.g., final optimization of receptor structure to approximate the low dielectric membrane environment). For structural optimization in the solvent (water) we used the analytical volume Generalized-Born (Zamanakos, 2002
) approximation to Poisson-Boltzmann continuum solvation.
We use the DREIDING FF due to its generic applicability to all molecules constructed from main group elements (particularly all organics), inasmuch as we will use our methods to predict the binding site and energy for a diverse set of ligands of interest to pharmacology. Indeed, we find below that the minimized structure for bovine rhodopsin deviates from the crystal structure by only 0.29 Å coordinate root mean-square error. The DREIDING FF with CHARMM22 charges has been validated for molecular dynamics simulations and binding energy calculations for many proteins (Brameld and Goddard, 1999
; Datta et al., 2003
, 2002
; Wang et al., 2002
; Kekenes-Huskey et al., 2003
; Floriano et al., 2004b
) with similar accuracy.
Validation of the force fields
The crystal structure of bovine rhodopsin (resolution, 2.80 Å) was downloaded from the protein structure database (PDB entry 1F88). The Hg ions, sugars, and waters were deleted from this structure. This crystal structure is missing 10 complete residues in loop regions and the side-chain atoms for 15 additional residues. We added the missing residues and side chains using WHATIF (Vriend, 1990
). Then we added hydrogens to all the residues using the PolyGraf software. We then fixed the TM helices and minimized (using conjugate gradients) the structure of the loop region to a root mean-square force of 0.1 kcal/mol per Å. The potential energy of the entire structure of rhodopsin was then minimized (using conjugate gradients) to a root mean-square force of 0.1 kcal/mol per Å . This minimized structure deviates from the x-ray crystal structure by 0.29 Å coordinate root mean-square (CRMS) error over all atoms in the crystal structure. This is within the resolution of the crystal structure, validating the accuracy of the FF and the charges. This FF-minimized crystal structure is denoted as Ret(x)/closed(xray).
The MembStruk protocol for predicting structure of GPCRs
MembStruk uses the hydrophobic profile of multisequence alignment of GPCRs to assign the helical TM regions. This is combined with a series of steps of a Monte Carlo-like systematic search algorithm to optimize the rotation and translational orientation of the TM helices. This search algorithm allows the structure to get over barriers and make the conformational search more comprehensive. This is followed by molecular dynamics (MD) calculations at a variety of coarse-grain to fine-grain levels in explicit lipid bilayer.
MembStruk was first described in Floriano et al. (2000)
. This method (now labeled as MembStruk1.0), was improved to include energy optimization to determine the rotation of helices in the seven-helical TM bundle in Vaidehi et al. (2002)
(now referred to as MembStruk2.0). In this article we have modified MembStruk (now denoted as MembStruk3.5) to also include optimization of the helix translations along their axes and rotational optimization using hydrophobic moment of the helices. The MembStruk3.5 procedure for predicting structures of GPCRs consists of the following steps:
Henceforth in this article any reference to MembStruk always indicates MembStruk3.5 unless specifically referenced otherwise. We will next discuss some of the details of these steps in MembStruk. We should emphasize here that these steps are all automated into a single MembStruk procedure. Thus the sequence is fed to MembStruk and the result at the end is a final three-dimensional structure for the protein in the lipid bilayer. Of course we also examine the various intermediate results generated in this procedure to allow us to detect problems, to gain insight into the validity of the various criteria, and to provide hints on improvements to make in the methods.
Step 1: Prediction of TM regions (TM2ndS)
Prediction of the TM helical regions for GPCRs from the sequence rests on the assumption that the outer regions of the TM helices (in contact with the hydrophobic tails of the lipids) should be hydrophobic, and that this character should be largest near the center of the membrane (Donnelly, 1993
; Eisenberg et al., 1984
). The TM2ndS method uses this concept to generate a hydrophobic profile.
Step 1a: Sequence alignment
The first part of Step 1 for TM2ndS uses the SeqHyd hydrophobic profile algorithm, which is based on peak signal analysis of the hydrophobic profile. We first tested the use of the Prift hydrophobicity scale (Cornette et al., 1987
), but we found that the hydrophobicity index value for Arg was opposite that expected for a charged residue, leading to obviously incorrect assignments. We then switched to the use of the Eisenberg hydrophobicity scale (Eisenberg et al., 1982
), which is based on sound thermodynamic arguments. This scale has a range from -1.76 to 0.73 and works well for Arg and other residues to give consistent TM predictions for the many systems we have investigated. The Eisenberg scale has been used in all published MembStruk results (1.0 onward). SeqHyd requires a multiple sequence alignment using sequences related to bovine rhodopsin. This is constructed by using an NCBI Blast search (Altschul et al., 1990
, 1997
) on bovine rhodopsin (primary accession number P02699) to obtain protein sequences with bit scores >200 but not identical (to avoid numerical bias in later calculations) to bovine rhodopsin (E-value < e-100). We prefer an ensemble of sequences providing a uniform distribution of sequence identities from 35 to <100%. For bovine rhodopsin, this leads to the 43 sequences in Table S1 of the Supplemental Material. These 43 sequences plus bovine rhodopsin were used in ClustalW (Thompson et al., 1994
) to generate a pairwise multiple sequence alignment. This sequence alignment included sequences with identities to the bovine rhodopsin sequence as low as 40%. In general we might include sequences with higher nonzero E-values, but including too low a homology might lead to additional alignment problems.
Step 1b: Average consensus hydrophobicity and initial TM assignment
The second part of Step 1 of TM2ndS is to calculate the consensus hydrophobicity for every residue position in the alignment. This consensus hydrophobicity is the average hydrophobicity (using the Eisenberg hydrophobicity scale) of all the amino acids in that position over all the sequences in the multiple sequence alignment. Then, we calculate the average hydrophobicity over a window size (WS) of residues around every residue position, using WS ranging from 12 to 20. This average value of hydrophobicity at each sequence position is plotted to yield the hydrophobic profile, as shown in Fig. 1 for WS = 14. The baseline for this profile serves as the threshold value for determining the TM regions and is calculated as follows.
|
G = 0 value above which residues are thermodynamically stable in the transmembrane and below which they are not. This baseline is unique to the particular protein to which it is being applied, with its individual environmental factors (water clusters, ions, hydrophobic or hydrophilic ligand or interhelical interactions, membrane composition) that may change the relative stability of any particular residue. Below WS = 12 the fluctuations in hydrophobicity (noise) are too large to be useful. The lowest WS that yields seven peaks (with peak length >10 and <50, and peak area >0.8) is denoted as WSmin. The peaks ranges for WSmin are used as input for the helix-capping module discussed in the next section.
Fig. 1 shows that assigning the TM region to helix 7 is a problem because it has a shorter length and a lower intensity peak hydrophobicity compared with all the other helices. This has been observed for other GPCRs (Vaidehi et al., 2002
). The low intensity of helix 7 arises because it has fewer highly hydrophobic residues (Ile, Phe, Val, and Leu) and because it has a consecutive stretch of hydrophilic residues (e.g., KTSAVYN). These short stretches of hydrophilic residues (including Lys-296) are involved in the recognition of the aldehyde group of 11-cis-retinal in rhodopsin. For such cases, we use the local average of the hydrophobicity (from minimum to minimum around this peak) as the baseline for assigning the TM predictions. TM2ndS automatically applies this additional criteria when the peak length is <23, the peak area is <0.8, and the local average >0.5 less than the base_mod. For bovine rhodopsin only TM7 satisfies this criterion and the local average (0.011) is shown by the red line in Fig. 1. Thus, this local average is automatically applied for proteins where the residues are relatively hydrophilic but in which the helix might still be stable because of local environmental factors (mentioned above) that stabilize these residues.
Step 1c: Helix capping in TM2ndS
It is possible that the actual length of the helix would extend past the membrane surface. Thus, we carry out a step aimed at capping each helix at the top and bottom of the TM domain. This capping step is based on properties of known helix breaker residues, but we restrict the procedure so as not to extend the predicted TM helical region more than six residues. We consider the potential helix breakers (Donnelly et al., 1994
) as P and G; positively charged residues as R, H, and K; and negatively charged residues as E and D.
TM2ndS first searches up to four residues from the edge going inwards from the initial TM prediction obtained from the previous section for a helix breaker. If it finds one, then the TM helix edges are kept at the initial values. However, if no helix breaker is found, then the TM helical region is extended until a breaker is found, but with the restriction that the helix not be extended more than six residues on either side. The shortest helical assignment allowed is 21, corresponding to the shortest known helical TM region. This lower size limit prevents incorporation of narrow noise peaks into TM helical predictions.
We have used this TM2ndS algorithm for predicting the structure for
10 very different GPCR classes (Vaidehi et al., 2002
). In each case the predicted binding site and binding energy agrees well with available experimental data, providing some validation of the TM helical region prediction. However, only for bovine rhodopsin can we make precise comparisons to an experimental structure. Fig. 2 compares the predictions of TM helical regions for bovine rhodopsin to the TM helical regions as assigned in the crystal structure (Palczewski et al., 2000
). To determine which residues have an
-helical conformation, we analyzed the
-
angles using PROCHECK (Laskowski et al., 1993
) and considered the experimental structure to be in an
-helix if -37 <
< -77 and -27 <
< -67. This led to slightly shorter helices than quoted in the crystal structure article. Thus the lowercase letters in Fig. 2 indicate residues which are outside the above range but quoted as helices in the experimental article. The results are as follows.
|
,
) for this P is (not-applicable [N-terminus], -43.6) and for this H is (-54.3, -32.4), whereas the values obtained in the crystal structure are (-44.3, -24.9) and (-72.5, 69.5), respectively. Since P and possibly H might be expected to break the helix, we are considering modifying our procedure to exclude such terminal P or H in the helix.
For TM2 our prediction adds HG at the end. In our final structure the (
,
) for this H and G are (-73.6, -80.9) and (-55.0, 148.8), whereas the values obtained in the crystal structure are (-74.2, 0.5) and (66.1, 9.0), respectively. The crystal structure article considered the H as part of the helix. Since HG could be expected to break the helix, we are considering modifying our procedure to exclude the terminal HG in the helix. In fact, the HG angles in our final structure fall outside our criteria for
-helicity as a result of the MembStruk optimization of the structure.
For TM3 our predictions miss the RYVVV assigned in the crystal structure to the helix. Since the first and second V do not have (
,
) in the usual range for
-helices, we consider that the VVV should be excluded. However, the polar character of RY leads TM2ndS to miss assigning them as part of the helix. The crystallographic (
,
) values for R and Y residues are (-55.5, -63.8) and (-44.6, -56.3), whereas the values obtained in our final structure are (76.7, -51.4) and (-62.9, 119.2). It should be pointed out that the B-factors on the cytoplasmic end of the rhodopsin crystal structure are high in this region of the helix (PDB entry 1F88). This indicates that the helix is probably fluxional even when the receptor is not activated. Consequently caution should be used when comparing our predictions with the crystal structure at this end. Also, because the helices are translated to align hydrophobic centers in a later step of the procedure, this uncertainty in TM helical prediction may only lead to local errors in atomic structure.
For TM4 our prediction adds G at the end and misses N at the start. The crystallographic (
,
) for these N and G residues are (-43.5, -59.6) and (169.8, 5.4), whereas the values obtained in our final structure are (-93.9, 119.6) and (112.5, -118.4). Thus the predictions are fine even though the G and N were misassigned. We are considering modifying our procedure to exclude a terminal G.
Compared to the crystal structure assignment, our prediction for TM5 adds LVF at the end and misses N at the start. In addition the GQ at the end terminus in the crystal structure assignment have (
,
) outside the range for
-helices. Thus we consider that the terminal GQLVF in the TM2ndS predictions are in error, the largest error of any of the predictions. The crystallographic (
,
) for these N and LVF residues are (-69.3, -51.1), (-48.2, -36.7), (-39.6, -27.1), and (-58.0, -26.5), whereas the values obtained in our final structure are (-109.9, -162.4), (-55.1, -47.8), (-63.4, -59.0), and (-81.5, 59.3). The rhodopsin crystal structure has high B-factors for the intracellular end of TM5 (just as for helix 3), suggesting caution in making comparisons.
For TM6 our prediction adds H at the end and misses EVT at the start. The crystallographic (
,
) values for these EVT and H residues are (-57.6, -53.0), (-54.1, -55.7), (-56.3, -52.3), and (-81.3, 48.8), whereas the values obtained in our final structure are (-74.4, 72.3), (-73.1, 130.8), (-16.9, -53.0), and (7.1, 87.7). Thus the predictions are fine despite the misassignments. We are considering modifying our procedure to exclude a terminal H. In fact, the H angles in our final structure fall outside our criteria for
-helicity as a result of the MembStruk optimization of the structure.
For TM7 our prediction adds P at the start and misses Y at the end. The crystallographic (
,
) for the P and Y residues are (-30.2, -48.1) and (-46.0, -55.0), whereas the value for P obtained in our final structure is (-43.6, -23.2). Since the current MembStruk protocol does not model the structures of the C- and N-termini, we did include the Y in our structure. Thus the predictions are fine despite the misassignments. We are considering modifying our procedure to exclude a terminal P, but it is not obvious that a modified method would automatically include the Y. In fact, the P angles in our final structure fall outside our criteria for
-helicity as a result of the MembStruk optimization of the structure.
Overall, we consider that the predictions agree sufficiently well with the crystal structure to be useful in building them into the assembly. In addition, we can see several improvements in the capping procedure of TM2ndS that could have decreased the errors in predicting which residues near the ends are considered to be helix breakers for capping the TM helices. However, this article is meant to validate the procedure we have been applying to many systems and we did not want to change the procedure on the basis of our only independent validation.
Step 2: Assembly and optimization of the seven-helical TM bundle
Having predicted the seven TM helix domains using TM2ndS, we next build them into the seven-helical TM bundle. This involves two steps: 1), assembly and optimization of the relative translation and 2), rotation of the helices.
Step 2a: Assembly of the seven TM helices into a bundle
Canonical right-handed
-helices are built for each helix using extended side-chain conformations. Then the helical axes are oriented in space according to the 7.5 Å electron density map of frog rhodopsin (Schertler, 1998
). This 7.5 Å electron density map gives only the rough relative orientations of the helical axes, with no data on atomic positions. This serves as the starting point for optimization of the helices in the helical bundle. It should be emphasized here that no information as to helical translations or rotations was used. Since this electron density map showed no retinal present, it is not clear whether this form of rhodopsin is active or inactive. This same information has been used to build structures of
10 other GPCR classes (Vaidehi et al., 2002
). In each case the predictions of binding site and binding energy agrees well with available experimental data, providing some validation for this general approach of constructing the TM bundle of GPCRs. However, for bovine rhodopsin we can make much more precise comparisons to the experimental structures, as reported below.
Step 2b: Optimization of the relative translation of the helices in the bundle
The translational and rotational orientation of each helix in the TM bundle is critical to the nature and conformation of the binding site in the GPCR. We do not use homology methods to predict these quantities because many GPCRs have very remote sequence homology to rhodopsin (ranging down to 10%) making it quite risky to base a three-dimensional structure on homology modeling using the rhodopsin crystal structure as template. Also we do not use atomistic molecular dynamics and molecular mechanics methods to optimize the structure, because the large barriers between various favorable positions can trap the conformation in local minima making such approaches ineffective in repositioning the helices. Instead, we developed methods to optimize the initial packing by translating and rotating the helices over a grid of positions and by using various properties of the amino acids in the sequence to suggest initial starting points. This Monte Carlo-like systematic conformational search algorithm for rotational and translational orientation of the helices allows the system to surmount barriers in the conformational space.
Our general principle in repositioning the helices is that the outer surface of the TM bundle (at least the middle regions) should be hydrophobic to have stabilizing interactions with the hydrophobic chains of the lipid. We imagine a midpoint plane through the lipid bilayer corresponding to the contact of the hydrophobic chains, which we denote as the lipid midpoint plane (LMP). We then assume that the hydrophobic regions of the TM bundle will position themselves such that the middle of their maximum hydrophobicity lies in this plane. We tested this concept for the crystal structure of bovine rhodopsin as follows. We determined the hydrophobic center (HC) for each helix as the maximum of the peak of hydrophobicity from the profiles generated with various window sizes (since we go an integer number of residues in each direction, window size is always even). Our criterion for the best-fit to experiment is that these seven positions when applied to the crystal structure would all lie in a single plane that could be taken as the LMP.
As shown in Fig. 3, the deviation of the calculated hydrophobic centers from lying in a single plane in the rhodopsin crystal structure is a minimum for WS 20 and 22. Thus Get_Centers calculates the overall hydrophobic center of each TM helix based on the average of centers obtained for a range of window sizes near 20. Get_Centers determines this range of window sizes as follows. First, each hydrophobic center (HC) is calculated for WS = 20. Then, the HCs are calculated for WS 1230 (excluding WS = 20). For each helix Get_Centers determines the window sizes that yield HC less than five residues from the HC calculated at WS = 20. For example, consider helix 1 in Table 1. Here HC = 18 for WS = 20. For windows sizes 12, 14, 16, 18, 22, 24, 26, 28, and 30 we find HC = 15, 13, 20, 18, 17, 18, 15, 16, and 13. For WS 16, 18, 22, and 24 the HC are less than five from the value at WS = 20. Thus we consider that the hydrophobic center calculation is stable within this regime of window sizes. The HC calculated for WS 16, 18, 22, and 24 for the helices 27 are also less than five residues from the centers at WS = 20. Thus, Get_Centers averages the HC for window sizes 16, 18, 22, and 24 and then it averages these values with the HC at WS = 20 for each TM helix. Get_Centers takes these values (last column of Table 1) as the final TM helix centers. We find that for bovine rhodopsin, these seven HCs deviate by a root mean-square of 1.04 Å from a common plane.
|
|
vectors of the two adjacent helices. The calculation of the hydrophobic moment vector is restricted to this face angle. This allows the predicted hydrophobic moment to be insensitive to cases in which the interior of the helix is uncharacteristically hydrophilic (because of ligand or water interactions within the bundle). Currently we choose HMR to be the middle 15 residues of each helix straddling the predicted hydrophobic center and exhibiting large hydrophobicity. This hydrophobic moment is projected onto the common helical plane (LMP) and oriented exactly opposite to the direction toward the geometric center of the TM barrel (GCB). This criterion is most appropriate for the six helices (excluding TM3) having significant contacts with the lipid membrane. The LMP is the plane that most closely intersects the hydrophobic centers as described in Step 2b. The GCB is calculated as the center of mass of the positions of the
-carbons for each residue in the HMR for each helix summed over all seven. This procedure is called phobic orientation. For bovine rhodopsin, we used phobic orientation for placing the hydrophobic moments away from the GCB for all seven helices. Subsequently rotations were optimized using RotMin for helices 3 and 5 using small rotation angles of ±2.5°, ±5.0°, and ±8.0°. This optimizes the only salt bridge in the TM region (between residues His-211 and Glu-122). Coarse-grain rotation optimization combining both the energy optimization and hydrophobic moments is expected to provide better optimized TM helices than either one alone.
Step 3: Optimizing the individual helices
The optimization of the rotational and translational orientation of the helices described in the above steps is performed initially on canonical helices (we also apply them again to the helices after their optimizations described in Step 3). To obtain a valid description of the backbone conformation for each residue in the helix, including the opportunity of G, P, and charged residues to cause a break in a helix, the helices built from the Step 2 were optimized separately. In this procedure, we first use SCWRL for side-chain placement, then carry out molecular dynamics (MD) (either Cartesian or torsional MD called NEIMO; Jain et al., 1993
; Mathiowetz et al., 1994
; Vaidehi et al., 1996
) simulations at 300 K for 500 ps, then choose the structure with the lowest total potential energy in the last 250 ps and minimize it using conjugate gradients.
This optimization step is important to correctly predict the bends and distortions that occur in the helix due to helix breakers such as proline and the two glycines. The MD also carries out an initial optimization of the side-chain conformations, which is later further optimized within the bundle using Monte Carlo side-chain replacement methods. This procedure allows each helix to optimize in the field due to the other helices in the optimized TM bundle from Step 2.
Step 4: Addition of lipid bilayer and fine-grain reoptimization of the TM bundle
To the final structure from Step 3 MembStruk adds two layers of explicit lipid bilayers. This consists of 52 molecules of dilauroylphosphatidylcholine lipid around the TM bundle of seven helices. This was done by inserting the TM bundle into a layer of optimized bilayer molecules in which a hole was built for the helix assembly and eliminating lipids with bad contacts (atoms closer than 10 Å). Then we used the quaternion-based rigid-body molecular dynamics (RB-MD) in MPSim (Lim et al., 1997
) to carry out RB-MD for 50 ps (or until the potential and kinetic energies of the system stabilized). In this RB-MD step the helices and the lipid bilayer molecules were treated as rigid bodies and we used 1-fs time steps at 300 K. This RB-MD step is important to optimize the positions of the lipid molecules with respect to the TM bundle and to optimize the vertical helical translations, relative helical angles, and rotations of the individual helices in explicit lipid bilayers.
Step 5: loop building
Following the RB-MD, we added loops to the helices using the WHATIF software (Vriend, 1990
). After the addition of loops, we used SCWRL (Bower et al., 1997
) to add the side chains for all the residues. The loop conformations were optimized by conjugate gradient minimization of the loop conformations while keeping the TM helices fixed. This step also allows the general option of forming selected disulfide linkages (e.g., between the cysteines in the EC-II loop, which are conserved across many GPCRs, and the N-terminal edge of TM3 or EC3). In the case of bovine rhodopsin, the alignment of the 44 sequences from Step 1, Sequence Alignment, indicates only one pair of fully conserved cysteines on the same side of the membrane (extracellular side). The disulfide bond was formed and optimized with equilibrium distances lowered in decrements of 2 Å until the bond distance was itself 2 Å. Then the loop was optimized with the default equilibrium disulfide bond distance of 2.07 Å. Annealing MD was then used to optimize the EC-II loop at this stage. This involved 71 cycles, in each of which the loop atoms were heated from 50 K to 600 K and back to 50 K over a period of 4.6 ps. During this process the rest of the atoms were kept fixed for the first 330 ps and then the side chains within the cavity of the protein in the vicinity of the EC-II loop were allowed to move for 100 ps to allow accommodation of the loop. Subsequently a full-atom conjugate gradient minimization of the protein was performed in vacuum using MPSim (Lim et al., 1997
). This leads to the final MembStruk-predicted structure for bovine rhodopsin.
The crystal structure for the retinal/rhodopsin complex has a well-defined ß-sheet structure for EC-II, which we speculate to be involved as a mobile gate for entry of 11-cis-retinal on the extracellular side of rhodopsin. Such a gating mechanism is illustrated in Fig. 4, in which the helix 3 coupled to this loop by a cysteine bond is the gatekeeper which responds to signaling structural substates of rhodopsin as follows:
|
Thus we have built two structures for bovine rhodopsin (here, MS denotes that the structure was predicted using MembStruk): Apo/closed(MS) has the cysteine coupling observed in the crystal and is the structure we compare to experiment after binding the retinal; and Apo/open(MS) is built without a constraint, forming what we believe would be the configuration which binds initially to the ligand.
Function prediction for GPCRs
Since there are no experimental structures available for any human GPCR, the only validation available for the accuracy of predicted structures for human GPCRs is to predict the ligand binding sites and the ligand binding energies. The accuracy in the predicted binding site can then be judged from site-directed mutagenesis experiments on the residues predicted to control selectivity. An even tougher test is to compare binding affinity of ligands to each other and to mutated proteins. For many GPCRs of pharmaceutical interest there is ample experimental data on ligand binding constants as well as agonist and antagonist inhibition constants for many GPCRs (for a compilation of this literature, see http://www.gpcr.org).
To carry out such function validations for the predicted structures, it is essential to have reliable and efficient procedures for predicting binding site and binding affinities. Since the ligand binding site is completely unknown for most GPCRs, we must scan the entire protein to identify likely binding sites and conformation of each ligand, and then we must reliably rank the relative binding energies of the various ligands in these sites. To do this we employ the HierDock procedure, which has been tested and validated for predicting ligand binding sites and ligand binding energies for many globular and membrane-bound proteins (Vaidehi et al., 2002
; Kekenes-Huskey et al., 2003
; Floriano et al., 2004b
; Datta et al., 2003
, 2002
). These studies show that the multistep hierarchical procedure in HierDock ranging from coarse-grain docking to fine-grain MD optimization leads to efficient and accurate predictions for ligand binding in proteins.
The HierDock method was first described in Floriano et al. (2000)
, which we label as HierDock1.0. The method was improved in Vaidehi et al. (2002)
, which we label as HierDock2.0. In this article we present an improved version that we label as HierDock2.5. The various steps involved in this current procedure are as follows:
![]() | (1) |
|
| RESULTS AND DISCUSSION |
|---|
|
|
|---|
Ret(xtal)/closed(xtal) is obtained from the crystal structure by minimizing using the DREIDING FF. It deviates from the crystal structure by 0.29 Å CRMS. It has retinal bound as in the crystal structure and has the closed form of the EC-II loop. The retinal conformations differ by 0.22 Å CRMS. This further validates the FF. Since they differ so little, the retinal in the nonminimized crystal structure, Ret(xtal-noFF), is used as the reference structure for the HierDock validation step.
Apo/closed(xtal) is obtained from Ret(xtal)/closed(xtal) by removing the retinal and adding the proton to Lys-296. It was minimized without ligand. It deviates from the crystal structure by 0.74 Å CRMS. It is likely that this is a lower bound on the change in structure upon removal of the retinal. For a more complete optimization, we would use MD.
Ret(HD)/closed(xtal) is the predicted structure for 11-cis-retinal obtained by applying HierDock to Apo/closed(xtal) and then forming the Schiff base linkage to Lys-296 and minimizing. The Ret(HD) deviates from Ret(xtal) by 0.62 Å CRMS. To distinguish the error in ligand conformation due to the HierDock procedure from that due to MembStruk, the structure Ret(HD)/closed(xtal) will serve as the reference structure to compare to the predicted ligand conformations in the MembStruk structures.
Apo/closed(MS) is the MembStruk-predicted structure of the closed form, without the retinal. The TM bundle for this structure deviates by 2.84 Å CRMS main-chain atoms from Apo/closed(xtal) (4.04 Å CRMS for all TM atoms, excluding H).
Ret(HD)/closed(MS) is the predicted structure for 11-cis-retinal in the Apo/closed(MS) rhodopsin structure, obtained by applying HierDock to Apo/closed(MS) and then forming the Schiff base linkage to Lys-296 and minimizing the energy. The Ret(HD) deviates from Ret(HD)/closed(xtal) by 2.92 Å CRMS.
Apo/open(MS) is the MembStruk-predicted structure of bovine rhodopsin without the retinal. There are no experiments with which to compare. This structure differs in the TM region from Apo/closed(MS) by 0.11 Å.
Ret(HD)/open(MS) is the predicted structure for 11-cis-retinal in rhodopsin obtained by applying HierDock to Apo/open(MS) and then forming the Schiff base linkage to Lys-296 and minimizing. There are no experiments with which to compare. The retinal differs from that in Ret(HD)/closed(MS) by 1.74 Å.
Validation for function prediction HierDock protocol for 11-cis-retinal on bovine rhodopsin
Bovine rhodopsin (a member of the opsin family) is the only GPCR to be crystallized in its entirety at a high resolution (2.8 Å). Thus we used this system as a test to validate the HierDock protocol for predicting the binding sites of GPCRs.
To test HierDock, we used the Apo/closed(xtal) structure with the retinal removed and minimized. First we did a complete HierDock scan as outlined above to predict the binding of 11-cis-retinal to bovine rhodopsin. The crystal structure of rhodopsin has the 11-cis-retinal covalently bound to Lys-296 (between the aldehyde of 11-cis-retinal and the N of the Lys), but for docking we cannot have a covalent bond to the crystal. Thus we docked the full 11-cis-retinal ligand (containing a full aldehyde group) and considered the Lys-296 to be protonated.
We applied Steps 12 of the HierDock described above for all 13 overlapping regions for Step 2 shown in Fig. 5. The initial scan of the entire rhodopsin (Steps 1 and 2 in Function Prediction for GPCRs) gave two good binding regions shown as the red boxes in Fig. 5. The data for this scanning step are shown in Table 2. The final optimized best binding structure for the retinal/rhodopsin complex from Step 6 of HierDock deviates by 1.11 Å CRMS from the ligand in the crystal structure as seen in Fig. 6, A and B. The binding site (defined as the seven residues that contribute at least 1 kcal/mol to the bonding) of this ligand is shown in Fig. 9 B. Lys-296 has hydrophilic interactions whereas the other side chains have van der Waals interactions. This docked structure has the retinal O 2.72 Å from the N of Lys-296. In addition, the retinal O and the closest H of the protonated Lys-296 N are just 2.35 Å apart, close enough to form an H-bond (likely an intermediate step before Schiff base formation). We then coupled these two units to form the covalent CN bond to Lys-296 while eliminating the H2O. After minimizing the full ligand-protein structure, we find that the predicted structure for 11-cis-retinal bonded to the protein deviates from the crystal structure by only 0.62 Å CRMS as shown in Fig. 6, C and D. Most of this discrepancy results because the FF-minimized structure of the retinal has the ionone ring in a chair conformation which was retained in our docking procedure, whereas the crystal structure has the ionone ring in a half-chair conformation (which we calculate to be 2 kcal/mol higher in energy than the chair conformation within the minimized complex). This retinal/protein complex minimized with the DREIDING FF, denoted Ret(HD)/closed(xtal), serves as the reference structure for comparing the predicted structures in later sections. We consider that these results validate the HierDock protocol for a GPCR.
|
|
|
31 kcal/mol, a difference of 32 kcal/mol. This compares well with the experimental result that the retinal ligand/protein complex stores 34.7 ± 2.2 kcal/mol upon isomerization in the protein (Okada et al., 2001
Structure prediction of rhodopsin using MembStruk
We used MembStruk3.5 as detailed in The MembStruk Protocol for Predicting Structure of GPCRs to predict the structure of bovine rhodopsin using only the protein sequence. For the apo-rhodopsin we predicted two structures, one with the open EC-II loop and one with closed EC-II loop. These represent two different states of rhodopsin likely to play a role in activation of G-protein. The crystal structure of rhodopsin has a closed EC-II loop with the 11-cis-retinal bound to it. To validate this predicted structure, we should compare to the crystal structure for apo-rhodopsin (without a bound 11-cis-retinal). However, this crystal structure for the apo protein is not available. Thus instead we will compare the predicted structure to the minimized crystal structure of bovine rhodopsin after removing the 11-cis-retinal. In making these comparisons, we predicted two structures for apo-rhodopsin: 1), the open form, where no restrictions were made on the structure of EC-II loop, i.e., Apo/open(MS); and 2), the closed form, where we assumed that EC-II makes the same cysteine linkage as observed in the crystal structure, i.e., Apo/closed(MS).
The predicted TM domains are compared to the rhodopsin crystal structure in Fig. 2 and discussed in Step 1, Helix Capping in TM2ndS.
After optimization of the helices using MD (300 K for 500 ps), most helices yield the same bends as in the crystal. Thus helices 2 and 6 undergo significant bending (due to Pro-267 in helix 6 and due to Gly-89 and Gly-90 in helix 2), which is consistent with spin-labeling electron paramagnetic resonance experiments (Farrens et al., 1996
). In addition, we find that helix 7 bends near the two prolines, which has also been shown by spin-labeling experiments (Altenbach et al., 2001a
,b
). We find that helix 1 undergoes significant bending due to a Gly/Pro combination, but this has not yet been studied experimentally. Such bending at hinge sites may be important for expanding the molecular surface needed at the cytoplasmic side to allow G-protein binding. We find similar hinge-bending with MD when the trans-isomer is bound to the helix assembly.
After assembling the optimized helices again into a bundle, we carried out RotMin on helices 3 and 5, the only helix pair with a potential salt bridge. The resulting seven-helix bundle was then inserted into a lipid bilayer, and optimized using rigid-body molecular dynamics as described in Step 4 of The MembStruk Protocol for Predicting Structure of GPCRs. This step leads to optimization of the vertical helical positions, relative helical angles, and rotations of the individual helices within a lipid environment. The CRMS difference before and after this rigid body MD is 1.10 Å for all atoms and 0.98 Å for main-chain atoms. This is consistent with the changes during this optimization step for other GPCRs (Vaidehi et al., 2002
).
After adding the intracellular and extracellular loops, optimizing the side chains, and then optimizing the structure in vacuum with the TM helical region fixed (to eliminate bad contacts in the loop region), we then optimized the entire structure allowing all bonds and angles to change. These ab initio predictions of the structure were carried out for both the open and closed forms of the EC-II loop in apo-rhodopsin leading to the Apo/open(MS) and Apo/closed(MS) structures, where MS denotes a MembStruk-derived structure, and open or closed denotes the open or closed form of the EC-II loop. Although the crystal structure has the 11-cis-retinal bound, we will compare the predicted apo-rhodopsin structures to the minimized apo-protein of the crystal structure, Apo/closed(xray).
Comparing Apo/closed(MS) to Apo/closed(xray) we find a CRMS difference of 2.85 Å in the main-chain atoms and 4.04 Å for all the atoms in the TM helical region. These structures are compared graphically in Fig. 7 A. Comparing all residues including loops (but ignoring the residues not present or complete in the x-ray structure), the predicted structure differs from the crystal structure by 6.80 Å in the main chain and 7.80 Å CRMS for all atoms. The major contribution to this CRMS is the low-resolution loop region, which is likely to be quite fluxional and may be very different between crystal and solution. Specifically, the predicted topology and
-
angles of the EC-II loop are consistent with that of a ß-sheet. However, the specific twist of this ß-sheet in the x-ray structure was not predicted well. Although this may be partly due to packing effects in the crystal structure, we consider that our prediction of the general topology of the EC-II loop to act as a "plug" to restrict retinal binding is adequate but that specific interactions with retinal may not be predicted well. In the function prediction results discussed below in the subsection called Apo/Closed(MS), we find that there are no specific favorable interactions between the ligand and the EC-II loop before Schiff base bond formation in the crystal structure (Fig. 9 B). Thus the EC-II may function initially primarily as an unspecific "plug" to disfavor certain ligand conformations. After Schiff base bond formation, the ligand is then stabilized by Glu-181 in the EC-II loop (Fig. 10 A). Thus accurate prediction of the atomic structure of the EC-II loop remains an important challenge.
|
|