
Biophys J, September 2002, p. 1268-1280, Vol. 83, No. 3

Exploring the Propensities of Helices in PrPC to Form β Sheet Using NMR Structures and Sequence Alignments

R. I. Dima and D. Thirumalai

Institute for Physical Science and Technology, and Department of Chemistry and Biochemistry, University of Maryland, College Park, Maryland 20742 USA


    ABSTRACT

Neurodegenerative diseases induced by transmissible spongiform encephalopathies are associated with prions. The most spectacular event in the formation of the infectious scrapie form, referred to as PrPSc, is the conformational change from the predominantly α-helical conformation of PrPC to the PrPSc state that is rich in β-sheet content. Using sequence alignments and structural analysis of the available nuclear magnetic resonance structures of PrPC, we explore the propensities of helices in PrPC to be in a β-strand conformation. Comparison of a number of structural characteristics (such as solvent accessible area, distribution of (φ, ψ) angles, mismatches in hydrogen bonds, nature of residues in local and nonlocal contacts, distribution of regular densities of amino acids, clustering of hydrophobic and hydrophilic residues in helices) between PrPC structures and a databank of "normal" proteins shows that the most unusual features are found in helix 2 (H2) (residues 172-194) followed by helix 1 (H1) (residues 144-153). In particular, the C-terminal residues in H2 are frustrated in their helical state. The databank of normal proteins consists of 58 helical proteins, 36 α+β proteins, and 31 β-sheet proteins. Our conclusions are also substantiated by gapless threading calculations that show that the normalized Z-scores of prion proteins are similar to those of other α+β proteins with low helical content. Application of the recently introduced notion of discordance, namely, incompatibility of the predicted and observed secondary structures, also points to the frustration of H2 not only in the wild type but also in mutants of human PrPC. This suggests that the instability of PrPC proteins may play a role in their being susceptible to the profound conformational change.
Our analysis shows that, in addition to the previously proposed role for the segment (90-120) and possibly H1, the C-terminus of H2 and possibly N-terminus may play a role in the α→β transition. An implication of our results is that the ease of polymerization depends on the unfolding rate of the monomer. Sequence alignments show that helices in avian prion proteins (chicken, duck, crane) are better accommodated in a helical state, which might explain the absence of PrPSc formation over finite time scales in these species. From this analysis, we predict that correlated mutations that reduce the frustration in the second half of helix 2 in mammalian prion proteins could inhibit the formation of PrPSc.


    INTRODUCTION

Prions are infectious particles that are an abnormal isoform, PrPSc, of the normal host-encoded cellular prion protein PrPC (Prusiner, 1997; Prusiner, 1998; Cohen, 1999). They are believed to be associated with neurodegenerative diseases in humans and many mammals, which are caused by transmissible spongiform encephalopathies (TSE). Examples of TSEs are familial Creutzfeldt-Jakob disease (CJD) in humans, scrapie of sheep, and bovine spongiform encephalopathy (BSE). Because no nucleic acid is implicated in the transformation from PrPC to PrPSc, the "protein-only" hypothesis was proposed (Prusiner, 1997; Prusiner, 1998; Cohen, 1999), which argues that the same amino-acid sequence adopts two distinct structures. A wealth of experimental data on mammalian prions supports this hypothesis (Cohen and Prusiner, 1998). The protein-only hypothesis has also been demonstrated in recent studies on Saccharomyces cerevisiae by Weissman and coworkers (Sparrer et al., 2000). They showed that introduction of preconverted Sup35 leads to the formation of self-propagating [PSI+], which apparently has prion-like characteristics.

The protein-only hypothesis implies that the conformational change leading to the PrPSc formation from the normal cellular form PrPC may be spontaneous or might involve interactions with unidentified protein X (Telling et al., 1995). Although the mechanism of conversion of PrPC into PrPSc and the tertiary structure of PrPSc are not known, numerous studies (Pan et al., 1993; Pergami et al., 1996) have shown that the two isoforms have identical amino-acid sequences. However, the structures of PrPC and PrPSc are completely different. Nuclear magnetic resonance (NMR) structures of several mammalian prion proteins show that PrPC(121-231) is ~45% helical with a very low (3-8%) β-sheet content (Riek et al., 1996, 1997; James et al., 1997; Donne et al., 1997; Zahn et al., 2000). Using Fourier transform infrared spectroscopy, it has been inferred that the secondary structure of PrPSc (formed from PrPC(90-231)) has ~50% β-sheet content (Caughey et al., 1991; Gasset et al., 1993). An extraordinary conformational change takes place in the transition from the normal state to the infectious form. Determination of the NMR structures of PrPC for a number of species (Syrian hamster, mouse, bovine, and human) is significant because it potentially offers hope of understanding, at the molecular level, the mechanism of PrPC→PrPSc conversion.

Prion proteins, encoded by a single gene, consist of about 250 residues of which the first 22 form a signal sequence. This is followed by unstructured, but likely helical, Cu2+ binding octarepeats rich in glycine (Prusiner, 1998). The NMR structure of the remaining protein PrP(90-231) shows that the N-terminus is disordered (Donne et al., 1997). The most ordered portion of the structure consists of ~103 or 104 residues, PrPC(121-231). The structures from this region from three species (mouse, hamster, and human) show that PrPC consists of three helices and two short β-strands (Riek et al., 1996, 1997; James et al., 1997; Donne et al., 1997; Zahn et al., 2000). Using the mouse prion protein as the reference (Fig. 1), the three helices H1, H2, and H3 span residues 144-153, 172-194, and 200-224, respectively. The NMR structure (Fig. 1) shows that there are two small anti-parallel β-strands (residues 128-131 and 161-164). The contact map for the mPrP(121-231) structure (Fig. 2) shows that PrPC has the characteristics of an α+β protein.



FIGURE 1   Ribbon diagram of mPrP(121-231) (produced with the program MolScript). The sequence and the secondary structure assignments (given below) are taken from the header of the PDB file. The sequence numbering begins with Gly at 124 and ends with Tyr at 226. The amino acids are given in the 1-letter format. The two anti-parallel β-strands are between residues 128-131 and 161-164.



FIGURE 2   Contact map for the mPrP structure shown in Fig. 1. Two residues are assumed to be in contact if any of their heavy atoms are within 5.2 Å. In the legends, h stands for helices, s for strands, and l for loops. The contact map shows that H1 does not interact with H2. From the sequence (Fig. 1) and the contact map, it is clear that the few interactions between H1 and H3 are relatively destabilizing. In contrast, H2 and H3 make many stabilizing contacts with each other.

We selected (from the Protein Data Bank (PDB)) four prion proteins with known tertiary structures: 1ag2 (mouse prion protein, 103 residues), 1b10 (Syrian hamster prion protein, 104 residues), 1qlz (human prion protein, 104 residues, the first of 20 NMR models), and 1qlx (human prion protein, 104 residues, representative). According to Structural Classification of Proteins (SCOP) (Murzin et al., 1995), they belong to the α+β class of proteins, and, according to Homology Derived Secondary Structure of Proteins (Sander and Schneider, 1991), they are very similar, with a 90% sequence identity between mPrP and h1PrP, an 86% sequence identity between shPrP and h1PrP, and a 94% sequence identity between mPrP and shPrP. There is some variability in the precise assignments of residues in the helices and the strands, which depends on whether NMR structures correspond to (90-231) or (121-231). The location of secondary structural elements also differs depending on the refinements used. In this study, we use the classification provided in the header of the PDB (Berman et al., 2000) file.

The purpose of our study is to identify those features of PrP that might play an important role in its conversion to PrPSc using sequence alignments and NMR structures. To pinpoint the plausible regions of the PrPC structures that are susceptible to conformational fluctuations, we compare a number of characteristics of the structural motifs in PrPC with those of "normal" proteins. We determined, from a databank of proteins from the PDB, the degree of solvent accessibility and the typical (φ, ψ) angles for the 20 types of amino acids in α-helices and in β-sheets. Comparison between the typical values and those for the amino acids of the sequences of PrPC revealed that some positions in the helices of prion proteins exhibit characteristics that are distinct from what is normally observed. Distribution of contacts in PrPC shows that they differ from normal proteins especially in regard to the short-range contacts. Most of these differences are localized in helix 2 (H2). Our results suggest that the C-terminus of H2 is frustrated in the experimentally found α-helical state. Because the bulk of the stability of PrPC comes from the core, our analysis suggests that nearly the whole protein might be involved in the transition to PrPC* (a species that can nucleate and polymerize). This implies that there is a large barrier that separates PrPC* from PrPC.


    MATERIALS AND METHODS

Choice of databanks

The four NMR structures of PrPC analyzed are the mouse mPrP (103 residues, 1ag2), Syrian hamster shPrP (104 residues, 1b10), human h1PrP (104 residues, 1qlz), and human h2PrP (104 residues, 1qlx). The PDB entries follow the lengths of the proteins. In the case of the entry h1PrP, we used the first of 20 NMR structures, and a representative structure was chosen for the entry h2PrP.

Because we are interested in comparing the prion proteins with normal proteins (for which no alternative conformations leading to fibril formation have been detected), we assembled databanks of mainly α, mainly β, and α+β proteins (based on the SCOP classification) from the PDB. From the resulting sets, we kept only those proteins that satisfied the following criteria: 1) the experimental method for determining the three-dimensional (3D) structure of the protein was x-ray crystallography; 2) the protein is single-chain; 3) no more than two residues are missing from the protein, and few atoms are missing from its amino acids; and 4) the experimental information about the secondary structure of the protein appears in the PDB header.

With these criteria, we were left with 58 mainly alpha  proteins whose PDB entries are 1a32(OB), 1a43(OB), 1aa2(OB), 1aep, 1ail(OB), 1aum(OB), 1axn, 1ayi(OB), 1bbc, 1bd8, 1bea(OB), 1beo(OB), 1bfa(OB), 1bgc, 1bgd, 1bgf(OB), 1cei(OB), 1cem, 1csh, 1eca(OB), 1ecd(OB), 1fps(OB), 1gcb, 1hbg(OB), 1hik, 1huw, 1hyp(OB), 1hyt, 1ilk, 1ith, 1lh1(OB), 1lis, 1lki, 1lpe, 1lrv, 1mba(OB), 1mzl(OB), 1nfn, 1pbv, 1poa, 1r69(OB), 1rcb, 1utg(OB), 1vlk, 1vls, 2abk, 2asr, 2bct, 2cro(OB), 2end(OB), 2fha, 2int, 2lhb(OB), 2mhr, 451c(OB), 4cpv(OB), 4icb(OB), 5cyt(OB). The OB indicates that the architecture of the protein is orthogonal bundle according to Orengo et al. (1997).

The 36 normal alpha +beta proteins in our comparison set have the PDB entries 119l, 1ah6, 1ako, 1cby, 1cof, 1ctf, 1fkb, 1gcb, 1han, 1kuh, 1lba, 1lbu, 1mrj, 1npk, 1opd, 1pgx, 1pkp, 1ptf, 1sap, 1snc, 1tif, 1uae, 1ubi, 1vcc, 1vhh, 2aak, 2hpr, 2phy, 2sn3, 3cla, 3lzm, 3ssi, 4bp2, 5pti, 7rsa, 9rnt.

The PDB entries of the 31 mainly β proteins in our databank are 1aac, 1amm, 1aol, 1bfg, 1eur, 1fna, 1gpc, 1gpr, 1hoe, 1idk, 1jpc, 1knb, 1kum, 1lcl, 1mai, 1nif, 1pdr, 1pmi, 1srl, 1tul, 1wba, 1whi, 1who, 1xnb, 2ayh, 2cba, 2cna, 2cpl, 2mcm, 2prd, 2sns.

Structural analysis

Using these three databanks, we determined some of the attributes, namely, the degree of solvent exposure and the (φ, ψ) angle combination, of the typical amino acid (for each of the 20 types of amino acids) in helices and in strands. Because the N-cap and C-cap of helices present special features compared to all other positions in a helix (see, for example, the study of Aurora and Rose, 1998, and references therein) and we were interested in obtaining the information about a typical residue in a helix, we treated these two positions as belonging to the loop category. We also treated helices shorter than six residues as loops.
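The relabelling rule described above can be sketched in a few lines. This is only an illustration under stated assumptions: the per-residue string encoding ('h', 's', 'l') is borrowed from the Fig. 2 legend, the function name `relabel_helices` is hypothetical, the N-cap and C-cap are approximated by the first and last residues of each annotated helix, and the length test is applied before the caps are trimmed. None of these choices is specified in the text.

```python
def relabel_helices(ss: str, min_len: int = 6) -> str:
    """Relabel helix caps and short helices as loop.

    ss is a per-residue string with 'h' (helix), 's' (strand), 'l' (loop).
    Assumption: the caps are the first and last residues of each helix run.
    """
    out = list(ss)
    i = 0
    while i < len(ss):
        if ss[i] != "h":
            i += 1
            continue
        j = i
        while j < len(ss) and ss[j] == "h":
            j += 1                      # j ends one past the helix run
        if j - i < min_len:
            out[i:j] = "l" * (j - i)    # helix shorter than min_len -> loop
        else:
            out[i] = "l"                # first position, standing in for the N-cap
            out[j - 1] = "l"            # last position, standing in for the C-cap
        i = j
    return "".join(out)


print(relabel_helices("llhhhhhhhhllhhhll"))
```

Strand positions are left untouched, since the text describes the cap treatment for helices only.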

Regular density

To decipher residue-residue interactions, Baud and Karlin (1999) introduced several density measures that give an indication of the environments. Following these authors, we calculated the regular density for the nine structural classes formed from the three secondary structures (α-helix, β-strands, and loops) and the degree of solvent exposure (buried (b), partially buried (pb), and exposed (e)). The regular density for an amino acid i (i = 1, 2, ... , 20) in an environmental class λ (λ = 1, 2, ... , 9) is (Baud and Karlin, 1999)
\[ \bar{\rho}_{i}^{\lambda} = \frac{1}{N_T} \sum_{\alpha=1}^{N_T} \sum_{j} \Theta\left(T - d_{m,ij}^{\alpha}\right), \qquad (1) \]
where $d_{m,ij}^{\alpha}$ is the minimum distance between two heavy atoms in a residue pair i and j, α is the structure, T is the threshold value (= 5 Å), $\Theta(x)$ is the Heaviside function, and $N_T$ is the number of structures in the dataset. We calculated the distribution of $\bar{\rho}_{i}^{\lambda}$ and the associated dispersion $\delta\bar{\rho}_{i}^{\lambda}$. A mismatch for an amino acid in a class (hydrophobic (H), polar (P), positively charged (+), negatively charged (-)) is measured in terms of the variable
\[ \Delta\rho_{i}^{\lambda,\alpha} = \frac{\rho_{i}^{\lambda,\alpha} - \bar{\rho}_{i}^{\lambda}}{\delta\bar{\rho}_{i}^{\lambda}}, \qquad (2) \]
where $\rho_{i}^{\lambda,\alpha}$ is the regular density in the structure α. If $\Delta\rho_{i}^{\lambda,\alpha}$ lies outside the interval $-1 \le \Delta\rho_{i}^{\lambda,\alpha} \le 1$, then there is significant deviation from normal behavior. If a sequence harbors many mismatches, then we consider it to be frustrated in that structural element. We used the following classification of the amino acids:
1) hydrophobic:   Cys, Phe, Leu, Trp, Val, Ile, Met, Tyr, Ala
2) polar:  Gly, Pro, Asn, Thr, Ser, Gln, His
3) charged:  Arg, Lys, Asp, Glu
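A minimal sketch of the neighbour count and the normalized deviation of Eqs. 1 and 2 follows. It is not the authors' code. The function names are hypothetical, the dispersion is taken to be the population standard deviation of the reference densities, the Heaviside function is treated as zero at the threshold itself, and a mismatch is read as a deviation of more than one dispersion unit in either direction. The input distances are made-up numbers for illustration.

```python
from statistics import mean, pstdev

T = 5.0  # distance threshold in angstroms, as quoted in the text


def regular_density(min_dists, threshold=T):
    """Count neighbours j whose minimum heavy-atom distance to residue i
    is below the threshold (the inner sum of Eq. 1 for one residue)."""
    return sum(1 for d in min_dists if d < threshold)


def normalized_deviation(rho, reference_rhos):
    """Eq. 2: deviation of one density from the reference mean, in units
    of the reference dispersion (assumed here: population std. dev.)."""
    rho_bar = mean(reference_rhos)
    delta = pstdev(reference_rhos)
    return (rho - rho_bar) / delta


def is_mismatch(rho, reference_rhos):
    """Assumed criterion: the deviation lies outside [-1, 1]."""
    return abs(normalized_deviation(rho, reference_rhos)) > 1.0


# Hypothetical numbers: densities of one residue type in one class
# across reference structures, and one residue from a query structure.
reference = [4, 6, 8]
query = regular_density([3.2, 4.9, 5.0, 7.1, 4.1, 2.8, 4.4, 3.9, 4.7, 3.3, 4.8])
print(query, is_mismatch(query, reference))
```

The reference list must contain at least two distinct values, otherwise the dispersion is zero and the deviation is undefined.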


    RESULTS

Characteristics of normal proteins

To ascertain whether PrPC proteins have any unusual structural characteristics, we first computed several characteristics of proteins in our database. These serve as a basis to illustrate the unusual features, if any, of the prion proteins.

Sequence composition

An obvious property to consider is the overall amino-acid composition. This is relevant because a proteomic sequence analysis of 31 organisms (Michelitsch and Weissman, 2000) revealed a large content of glutamine/asparagine residues in yeast prions. The unusually high content of Gln and Asn suggests that the regions containing these residues may trigger the initial nucleation in the transition to the β-structures that leads to self-propagation in yeast prions (Michelitsch and Weissman, 2000). The composition of PrPC is closer to that of all β proteins or to those α+β proteins that have a higher strand content than helices. In many α+β proteins, Ala is a typical helix former (Jiang et al., 1998). However, in PrPC only H3 contains a single Ala, and none of the helices is amphipathic. In mPrP, H1 is made up of three hydrophobic, one polar and six charged residues and has 3.66 residues/turn, and H2 is made up of six hydrophobic, two charged and 15 polar residues and has 3.92 residues/turn. Helix 3 is made up of seven hydrophobic, eight charged and ten polar residues and has 5.06 residues/turn. Although prion proteins have slightly unusual composition compared to a typical α+β protein, the differences are not dramatic.

Solvent exposure ratio for the amino acids in a helix or strand in normal proteins

To characterize the degree of solvent exposure of an amino acid in a protein, we used R = AS/AR where AS is the solvent-accessible area calculated using the Lee and Richards (1971) algorithm, and AR is the area of the same type (X) of amino acid in a Gly-X-Gly extended conformation. The values of AR were taken from the literature (Creighton, 1993). Each position, excluding the ends of the chain, in a protein in its folded conformation has R between 0 and 1. We computed R for the 20 amino acids from the helices of the 58 mainly α proteins, and of the 36 α+β proteins. These distributions obey the known percentages of buried positions (defined by R being less than 0.05) for each type of amino acid. For example, Val is buried in 55% of cases, whereas Lys is buried in only 3% of cases in the mainly α proteins. The distribution of R values for the hydrophobic residues is peaked at small values of R, with negligibly small peaks at larger values. For polar and charged amino acids, the distribution is much broader with a peak at larger values than the ones for the hydrophobic amino acids.

For hydrophobic amino acids, the typical value of R is identified with the median value, whereas, for polar and charged amino acids, the average of the distribution gives the typical ratio (the actual values are in the caption for Fig. 3). For Ala, Gly, and Met, neither of these two classifications led to a reliable typical value.
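The exposure ratio and the choice of typical value can be illustrated with a short sketch. The function names are hypothetical and the input areas and ratios are invented numbers; only the cutoff of 0.05, the median/mean rule, the hydrophobic residue list, and the exclusion of Ala, Gly, and Met come from the text.

```python
from statistics import mean, median

BURIED_CUTOFF = 0.05  # R below this value counts as buried (from the text)
HYDROPHOBIC = {"Cys", "Phe", "Leu", "Trp", "Val", "Ile", "Met", "Tyr", "Ala"}
NO_TYPICAL_VALUE = {"Ala", "Gly", "Met"}  # no reliable typical value reported


def exposure_ratio(area_folded, area_reference):
    """R = AS / AR: accessible area in the fold over the Gly-X-Gly area."""
    return area_folded / area_reference


def buried_fraction(ratios):
    """Fraction of positions whose R falls below the buried cutoff."""
    return sum(r < BURIED_CUTOFF for r in ratios) / len(ratios)


def typical_ratio(residue, ratios):
    """Median for hydrophobic residues, mean for polar and charged ones.
    Returns None for the residues the text reports as unreliable."""
    if residue in NO_TYPICAL_VALUE:
        return None
    return median(ratios) if residue in HYDROPHOBIC else mean(ratios)


# Invented R values for illustration only.
print(buried_fraction([0.01, 0.04, 0.2, 0.6]))
print(typical_ratio("Val", [0.0, 0.02, 0.5]))
print(typical_ratio("Lys", [0.25, 0.75]))
```

The median is the natural choice for a distribution that is sharply peaked near zero with a long tail, which is how the text describes the hydrophobic residues.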



FIGURE 3   (A) Distribution of R, the ratio of accessibility of surface area of a given amino acid to the area of the same amino acid in a Gly-X-Gly extended conformation, for Val. (B) Same distribution as in (A) for Arg. The mean (median) values R and the corresponding dispersion for the amino acids are: Cys (R = 0.016, tol = 0.100), Phe (R = 0.042 tol = 0.100), Leu (R = 0.019 tol = 0.100), Trp (R = 0.095 tol = 0.100), Val (R = 0.046 tol = 0.100), Ile (R = 0.036 tol = 0.100), His (R = 0.272 tol = 0.200), Tyr (R = 0.086 tol = 0.100), Pro (R = 0.404 tol = 0.253), Asn (R = 0.348 tol = 0.195), Thr (R = 0.262 tol = 0.232), Ser (R = 0.317 tol = 0.231), Arg (R = 0.394 tol = 0.203), Gln (R = 0.347 tol = 0.198), Asp (R = 0.371 tol = 0.232), Lys (R = 0.492 tol = 0.182), Glu (R = 0.405 tol = 0.211).

Distribution of the (φ, ψ) angles for the typical amino acid in helix or strand

We computed the distribution of the (φ, ψ) angles for each type of amino acid in the helices and strands from the three datasets described in Methods. The limits of the (φ, ψ) pair for an amino acid (other than Gly and Pro) in an α-helix are -80 ≤ φ ≤ -48, -59 ≤ ψ ≤ -27. The range for a β-sheet is -150 ≤ φ ≤ -90, 90 ≤ ψ ≤ 150 (Srinivasan and Rose, 1995).

We divided the range for each of the two angles in a helix into intervals of 10°, which led to nine different classes of (φ, ψ) angles. We assigned to each amino acid a number between 1 and 9, based on its (φ, ψ) angle. If the angles were outside the specified range of all classes, an arbitrarily large number (50) was assigned. We obtained the distribution of these numbers for the 20 amino acids in the helices of the 58 mainly α proteins, and of the 36 α+β proteins. The median value, computed using the distribution, was chosen to be the typical angle for that amino acid. This method of computing the typical angles is more meaningful than obtaining averages over the distribution of (φ, ψ) angles. The typical (φ, ψ) angles obtained from the 58 all α proteins using this procedure are: for Cys, Phe, Leu, Trp, Val, Ile, Met, His, Tyr, Asn, Thr, Arg, Gln, Asp, and Glu, -70 < φ ≤ -59, -49 < ψ ≤ -38; and for Ala, Ser, and Lys, -70 < φ ≤ -59, -38 < ψ ≤ -27. The corresponding values from the α+β databank are very similar.
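The helix binning can be sketched as below. The bin edges are inferred from the helical limits and from the two typical classes quoted in the text, so the outermost intervals and the numbering of the nine classes (row-major, 1 to 9) are assumptions, as are the function names. The value 50 for out-of-range angles is taken from the text. The sketch uses `median_low` so that the typical value is always an actual class number.

```python
from statistics import median_low

# Assumed interval edges in degrees; each interval is (lo, hi].
PHI_EDGES = (-80.0, -70.0, -59.0, -48.0)
PSI_EDGES = (-59.0, -49.0, -38.0, -27.0)
OUT_OF_RANGE = 50  # arbitrarily large class number, as in the text


def _bin(value, edges):
    """Index of the interval containing value, or None if outside.
    The lowest edge is included so that the stated limits are closed."""
    if value == edges[0]:
        return 0
    for k in range(len(edges) - 1):
        if edges[k] < value <= edges[k + 1]:
            return k
    return None


def helix_class(phi, psi):
    """Class number 1..9 for a helical (phi, psi) pair, else 50."""
    a = _bin(phi, PHI_EDGES)
    b = _bin(psi, PSI_EDGES)
    if a is None or b is None:
        return OUT_OF_RANGE
    return 3 * a + b + 1  # assumed numbering: phi-major, psi-minor


def typical_class(angles):
    """Median class number over all observations of one residue type."""
    return median_low([helix_class(phi, psi) for phi, psi in angles])


# Invented observations for one residue type.
obs = [(-65.0, -45.0), (-65.0, -45.0), (-65.0, -30.0), (-120.0, 130.0)]
print([helix_class(p, s) for p, s in obs], typical_class(obs))
```

Because outliers all map to the single value 50, a handful of non-helical conformations shifts the median far less than it would shift an arithmetic mean of the angles, which is consistent with the stated preference for the median.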

For each of the two angles in a strand, we divided the above limits into intervals of 10°, which led to 25 different classes of (φ, ψ) angles. We attributed to each amino acid a number between 1 and 25, based on which class corresponded to its (φ, ψ) angles. If the angles fell outside the range, a score of 50 was given. From the distribution of these numbers for the 20 types of amino acids in the strands of the 31 mainly β proteins and of the 36 α+β proteins, the median value was calculated. The typical angle for a given amino acid in these structures corresponds to the median value.

Structural mismatches in helices of PrPC

The distribution of R values and the angles adopted by amino acids in the helical and strand structures give a calibration of what is typically expected in the database of normal structures. A comparison of these values with those adopted by amino acids in the helices and strands of prion proteins gives hints of plausible structural anomalies.

If R (a measure of solvent accessibility) for an amino acid in PrPC differs from the typical value Rα by more than the tolerance δRα (the typical values and the tolerances are given in the caption of Fig. 3), then we consider it a mismatch. Similarly, if the (φ, ψ) angle for an amino acid falls outside the typical values, it is a mismatch. Using these criteria, we find that between 13 and 16 amino acids are mismatches in the four prion proteins (16 out of 52 positions in helices in mPrP, 16 out of 56 positions in helices in h1PrP, 13 out of 55 positions in helices in shPrP, and 15 out of 56 positions in helices in h2PrP). The differences between R for each position and the typical value of R for the amino acid at that position in the helices of the four prion proteins are shown in Fig. 4. We repeated this analysis for all the helices in the 58 proteins from the mainly α databank and in the 36 proteins from the α+β databank. The resulting histogram of all the percent mismatches in angles (that is, the number of mismatches Nmism divided by the length of the helix, Lhelix) shows that the prion proteins belong to the tail of the distribution (Fig. 5). A histogram for the mismatches in R also shows that the prion proteins belong to the tail of the distribution corresponding to values larger than the average of 20% (data not shown). An analysis of the amino acids in the β-strands did not reveal any anomalies in prion proteins compared to the typical α+β protein.
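As a worked illustration, the mismatch counts quoted above can be converted to percentages with the Nmism/Lhelix ratio. The counts and helix lengths are the ones given in the text; pooling all helices of a protein into a single ratio is a simplification made here, and the names in the snippet are hypothetical.

```python
# (number of mismatches, number of helix positions) per protein, from the text
HELIX_MISMATCHES = {
    "mPrP": (16, 52),
    "h1PrP": (16, 56),
    "shPrP": (13, 55),
    "h2PrP": (15, 56),
}


def percent_mismatch(n_mism, l_helix):
    """Mismatches as a percentage of helix positions (100 * Nmism / Lhelix)."""
    return 100.0 * n_mism / l_helix


percentages = {
    name: round(percent_mismatch(n, length), 1)
    for name, (n, length) in HELIX_MISMATCHES.items()
}
print(percentages)
```

With these inputs the pooled values range from roughly 24% to 31%.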



FIGURE 4   Mismatch values $\Delta_i = R_i - \bar{R}_i$, where $\bar{R}_i$ is the typical (mean or median) value (Fig. 3) and i is the position of the amino acid along the sequence. For all species, there are considerable deviations from the typical values even in the core of PrPC.



FIGURE 5   Distribution of the number of mismatches for amino acids in helices (normalized by helix length and expressed as percentages) for the (φ, ψ) angle. The value of a (φ, ψ) angle at a position is considered a mismatch if it is different from the typical value. The top panel shows the distribution for the amino acids in the helices of the α+β proteins, the middle one corresponds to the proteins from the all α databases, and the bottom panel corresponds to the four prion proteins. Prion proteins contain more mismatches than regular proteins that do not aggregate. Helices in PrPC have many more positions with (φ, ψ) angles lying outside the range of a right-handed α-helix. This shows that there are local instabilities in the secondary structures of prion proteins. The corresponding distribution for R values shows no such dramatic differences between the prion proteins and other proteins.

Environment-dependent structural characteristics reveal unusual number of mismatches in PrPC

Baud and Karlin (1999) introduced nine structural categories to quantify the environmental propensities of amino acids in folded proteins. They are based on the three standard secondary structure states (helix, strand, and loop), and on three side-chain solvent-accessibility levels: As ≤ 10% for the buried state (b), 10% < As ≤ 40% for the partly buried state (pb), and As > 40% for the exposed state (e).
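The nine categories follow directly from the two three-way splits, as the sketch below shows. The thresholds are the ones quoted above; the function names and the tuple representation of a class are illustrative choices, not part of the original method description.

```python
from itertools import product

SECONDARY = ("helix", "strand", "loop")
LEVELS = ("b", "pb", "e")  # buried, partly buried, exposed


def exposure_level(accessibility_percent):
    """Map side-chain solvent accessibility (in percent) to b, pb, or e."""
    if accessibility_percent <= 10:
        return "b"
    if accessibility_percent <= 40:
        return "pb"
    return "e"


def structural_class(secondary, accessibility_percent):
    """One of the nine (secondary structure, exposure level) categories."""
    if secondary not in SECONDARY:
        raise ValueError(f"unknown secondary structure: {secondary!r}")
    return (secondary, exposure_level(accessibility_percent))


ALL_CLASSES = list(product(SECONDARY, LEVELS))
print(len(ALL_CLASSES), structural_class("helix", 55.0))
```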

There are clear differences among the various types of amino acids in these nine structural classes in terms of number of neighbors and their identities (aromatic, hydrophobic, polar, positively, and negatively charged). Using this approach, we calculated the typical number of neighbors (within 5 Å) and the corresponding standard deviations for the amino acids in each of the nine structural classes from the database of proteins. We compared these reference results with the structural characteristics of amino acids in PrPC in terms of mismatches. The resulting histogram of such mismatches shows that there is a similarity between this analysis and the one in Fig. 5 (data not shown). There are significant deviations from the expected behavior of the regular density (defined in Methods) for amino acids exposed in the helices of PrPC. This is particularly dramatic for exposed hydrophobic residues (Fig. 6 A), buried negatively charged residues (data not shown) and exposed negatively charged residues (Fig. 6 B). These two figures suggest that, in PrPC, there are more mismatches between the structural preferences of amino acids and the actual structural environment that is found in normal proteins. Thus, both secondary-structure analysis and environmental analysis that reflects tertiary structure preferences suggest anomalies in PrPC.



FIGURE 6   (A) Distribution of the normalized deviations of the regular density (Eq. 2 in Methods) for solvent-exposed hydrophobic amino acids in PrPC that are in helices. For comparison, the results from helices in our dataset (i.e., from the all α and the α+β proteins) are also shown. The numbers in parentheses correspond to the total number of residues in the given environmental class. For example, there are 18 exposed hydrophobic amino acids in the helices of the four prion proteins. (B) Same as in (A) except that it is for the solvent-exposed negatively charged residues.

Hydrogen bonds

The present analysis shows that amino acids at some sites in PrPC have unusual properties compared to the average amino acid of the same type in normal proteins. This is exemplified in the degree of solvent accessibility, i.e., some polar amino acids (e.g., Glu-146 and Asp-147 in the first helix of mPrP), which are typically exposed, are buried in PrPC. It follows that the α-isoform of PrPC should have many unsatisfied buried hydrogen-bond donors/acceptors. To assess this, we examined the hydrogen-bond characteristics of normal proteins. The analysis of the all α protein 1aa2 (108 residues) with the WHAT CHECK program (Hooft et al., 1996) reveals seven unsatisfied buried hydrogen-bond donors/acceptors, whereas, for the α+β protein 1fkb (107 residues), the number is eight, and for the all β protein 1aac (105 residues), the corresponding number is eight. McDonald and Thornton (1994) showed that, in normal proteins, only a low percentage (~6%) of the total number of residues have unsatisfied buried hydrogen-bond donors/acceptors. The WHAT CHECK analysis for prion proteins shows 15 (14%) unsatisfied buried hydrogen-bond donors/acceptors in mPrP (1ag2), also 15 (14%) in shPrP (1b10), and 6 (5.8%) and 9 (8.7%) in the human prion protein (h2PrP and h1PrP, respectively). We find that mPrP and shPrP have more than twice the usual amount of unsatisfied buried hydrogen-bond donors/acceptors, whereas the human prion protein behaves more like an average protein in this respect. There is good agreement between the identity of the sites revealed by this analysis and the problem sites identified by mismatches in R and the distribution of (φ, ψ) angles. As an example, the WHAT CHECK analysis for mPrP reveals unsatisfied hydrogen bonds at positions 130, 139, 141, 142, 143, 145, 151, 155, 161, 166, 170, 174, 183, 187, and 219 (Fig. 4), and we found 130, 145, 151, 174, 183, 187, and 219 to have mismatches in terms of R and the (φ, ψ) angles.
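The agreement between the two site lists for mPrP can be checked with a set intersection. The positions are copied from the text, and the helix boundaries are the mouse values given in the Introduction (H1 144-153, H2 172-194, H3 200-224); the helper `helix_of` is a hypothetical name used only for this illustration.

```python
# mPrP positions with unsatisfied buried hydrogen bonds (WHAT CHECK), from the text
UNSATISFIED = {130, 139, 141, 142, 143, 145, 151, 155, 161, 166, 170, 174, 183, 187, 219}
# mPrP positions with mismatches in R and the (phi, psi) angles, from the text
MISMATCHED = {130, 145, 151, 174, 183, 187, 219}

# Helix boundaries for mouse PrP as stated in the Introduction (inclusive).
HELICES = {"H1": range(144, 154), "H2": range(172, 195), "H3": range(200, 225)}


def helix_of(position):
    """Name of the helix containing the position, or None if outside all three."""
    for name, span in HELICES.items():
        if position in span:
            return name
    return None


shared = sorted(UNSATISFIED & MISMATCHED)
by_helix = {pos: helix_of(pos) for pos in shared}
print(shared)
print(by_helix)
```

Every site in the mismatch list also appears in the hydrogen-bond list, and position 130, which lies in the first β-strand (residues 128-131), is the only shared site outside the helices.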

Local versus nonlocal contacts

We consider two residues to be in contact if at least two heavy atoms from their side chains are within 5.2 Å. Contacts between residues whose separation along the sequence is less than (greater than) 20% of the length of the protein are classified as local (nonlocal). The tendency of PrPC to undergo the α→β transition suggests that these structures may only be marginally stable (Cohen et al., 1994), and hence the distribution of contacts, which points to the intrinsic stability of proteins, is useful.
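The contact definition and the local/nonlocal split can be sketched as follows. Two readings are assumed here and are not confirmed by the text: a contact requires at least one pair of side-chain heavy atoms (one from each residue) within the cutoff, and a separation of exactly 20% of the chain length is counted as nonlocal. The function names and coordinates are hypothetical.

```python
from math import dist

CONTACT_CUTOFF = 5.2   # angstroms, from the text
LOCAL_FRACTION = 0.2   # 20% of the chain length, from the text


def in_contact(side_chain_a, side_chain_b, cutoff=CONTACT_CUTOFF):
    """True if any heavy atom of one side chain lies within the cutoff of
    any heavy atom of the other. Each argument is a list of (x, y, z)."""
    return any(dist(p, q) <= cutoff for p in side_chain_a for q in side_chain_b)


def contact_range(i, j, chain_length, fraction=LOCAL_FRACTION):
    """Classify a contact between sequence positions i and j."""
    return "local" if abs(i - j) < fraction * chain_length else "nonlocal"


# For a 103-residue chain such as mPrP the boundary is 20.6 residues.
print(contact_range(145, 165, 103), contact_range(145, 166, 103))
print(in_contact([(0.0, 0.0, 0.0)], [(3.0, 4.0, 0.0)]))
```

`math.dist` requires Python 3.8 or later.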

We calculated the number and type of the short- and long-range order contacts in prion proteins and in the proteins from the three databases. The results, shown in Table 1 and Fig. 7, show that, for a typical protein (whether all α, α+β, or all β), the number of short-range (or local) contacts is about twice as large as the number of long-range (or nonlocal) contacts, whereas, in PrPC, the number of short-range contacts is approximately equal to that of the number of long-range contacts. More importantly, the nature of residues that are involved in local contacts in PrPC is strikingly different from the ones in normal proteins. In normal proteins the most probable local contacts are exclusively made up of hydrophobic residues, which is not the case in prion proteins (Table 1). The numbers of HP (hydrophobic (H) and polar (P)) and HH contacts are approximately equal in prion proteins, whereas, in normal proteins, there are more HH contacts than HP contacts. The percentages of local contacts of types +-, H-, and PP are much higher in PrPC than in normal proteins.


                              
TABLE 1   Statistics of local and nonlocal contacts



FIGURE 7   Percentage of each of the possible types of local and nonlocal contacts in PrPC and in all the proteins from our database. Local contacts are defined as those that are within 20% of the length of the protein along the sequence. All other contacts are considered nonlocal. The 20 types of amino acids are divided into four classes: hydrophobic (H) (as above), polar (P) (as above), positively charged (+) (Arg and Lys), and negatively charged (-) (Asp and Glu). This leads to 10 types of possible interactions, which are indicated in the ordinate: 1 (H, H), 2 (H, P), 3 (H, +), 4 (H, -), 5 (P, P), 6 (P, +), 7 (P, -), 8 (+, +), 9 (+, -), and 10 (-,-).
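The ten interaction types listed in the caption follow from taking unordered pairs of the four residue classes. The sketch below enumerates them in the same order as the ordinate labels and tallies percentages for a list of contacts. It takes class labels as input because the H and P class memberships are defined earlier in the paper; all names are hypothetical.

```python
from collections import Counter
from itertools import combinations_with_replacement

CLASSES = "HP+-"  # hydrophobic, polar, positively charged, negatively charged

# Unordered pairs of classes, in the order used on the ordinate of Fig. 7:
# HH, HP, H+, H-, PP, P+, P-, ++, +-, --
PAIR_TYPES = ["".join(p) for p in combinations_with_replacement(CLASSES, 2)]

_RANK = {c: k for k, c in enumerate(CLASSES)}


def pair_type(class_a, class_b):
    """Canonical label for a contact, independent of the order of the two residues."""
    return "".join(sorted((class_a, class_b), key=_RANK.__getitem__))


def type_percentages(contacts):
    """Percentage of each of the ten pair types among `contacts`.

    `contacts` is an iterable of (class_a, class_b) label pairs.
    """
    counts = Counter(pair_type(a, b) for a, b in contacts)
    total = sum(counts.values())
    return {t: (100.0 * counts[t] / total if total else 0.0) for t in PAIR_TYPES}
```

Applying `type_percentages` separately to the local and nonlocal contact lists gives the two sets of bars that a figure of this kind would display.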

To put the above striking observations about PrPC in perspective, it is necessary to assess whether similar characteristics can be found in normal proteins. We searched the dataset for instances of local and nonlocal contacts involving amino acids that are not typically associated with each other. A few proteins have characteristics similar to those of PrPC: 1axn, 1mzl, 2asr, and 5cyt from the all-α database; 1fkb, 1vhh, 2cpl, and 9rnt from the α+β database; and 1aac and 1hoe from the all-β database. Most of these proteins have, at short range, comparable numbers of HP and HH contacts and many PP contacts, just like PrPC. The all-α proteins also have many local H+ and P+ contacts, but none of them has as many +- and H- contacts as PrPC. The proteins that appear most similar to PrPC are 2cpl and 1hoe, but neither has as large a percentage of +- contacts as the prion proteins. Note that all of the above-mentioned proteins from the α+β class have a much lower helical content than PrPC: 1fkb has one helix, the rest being sheet; 1vhh is 19% helical and 16% sheet; 2cpl is 12% helical and 27% sheet; and 9rnt has just one helix, the rest being sheet.

This analysis reveals some important characteristics of PrPC proteins: they have an unusually large number of +- local contacts; they have far from typical percentages of both local and nonlocal HP, H-, PP, and P- contacts; and their percentages of local H- and P- contacts are similar to those seen in proteins that have a much higher sheet content than prion proteins.

Nature of contacts in H1, H2, and H3 in PrPC

The unusual local and nonlocal contacts (+-, HP, H-, PP, P-) in PrPC are localized in the three helices. Because we are interested in regions of instability, we examined the nature of local and nonlocal contacts in H1, H2, and H3. The idea is to compare, for each helix of PrPC, the pair formed by its most probable type of short-range contact and its most probable type of long-range contact with the same pair in the helices of all other proteins. We define as short-range any contact between an amino acid of a helix and any other amino acid belonging to the same helix (including the N- and C-caps of the helix), and as long-range any contact with amino acids outside the helix. We extracted, from the prion proteins and from all the proteins in the all-α and α+β datasets, the pair of most probable types of short- and long-range contacts for every helix. We then counted how often each of the pairs seen in PrPC appears in the other proteins. The combined results, presented in Tables 2 and 3, suggest that H2 is the most different from average, followed by H1 and then by H3.
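The counting procedure described above can be sketched in a few lines. This is an illustrative reading of the text, not the authors' implementation: each helix is represented by two non-empty lists of contact-type labels (short-range and long-range), ties for the most probable type are broken by first occurrence, and the function names are hypothetical.

```python
from collections import Counter


def most_probable_pair(short_range_types, long_range_types):
    """Return (most common short-range type, most common long-range type) for one helix."""
    short_mode = Counter(short_range_types).most_common(1)[0][0]
    long_mode = Counter(long_range_types).most_common(1)[0][0]
    return short_mode, long_mode


def pair_frequency(target_pair, reference_helices):
    """Fraction of reference helices whose most probable pair equals `target_pair`.

    `reference_helices` is a list of (short_range_types, long_range_types) tuples.
    """
    pairs = [most_probable_pair(s, l) for s, l in reference_helices]
    return sum(p == target_pair for p in pairs) / len(pairs)
```

A low `pair_frequency` for the pair observed in a prion helix would indicate that the combination is rare among the reference helices.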


                              
TABLE 2   Nature of the most probable local and nonlocal contacts


                              
TABLE 3   Joint probability of occurrence of various types of contacts

Contacts among H1, H2, and H3

The α→β structural transition in PrPC(90-231) leads to ~50% β-sheet content in PrPSc, which suggests that a relatively large number of the residues in the α-helices must rearrange into a β-strand conformation. Previous studies, based in part on the observation that H1 is unusually hydrophilic (Billeter et al., 1995; Morrisey and Shakhnovich, 1999), have already proposed that H1 is likely to play a role in the α→β transition to PrPSc. Here, we find that the unusual nature of short- and long-range contacts in H2 makes it frustrated in the helical state. Therefore, it is interesting to analyze the contacts among the three helices to estimate the effects that a structural transition (α→β) in one helix might have on the others. The arrangement of the helices in PrPC gives them an orthogonal bundle (OB) architecture (Orengo et al., 1997). We therefore selected 47 proteins with this type of architecture from the CATH database (Orengo et al., 1997). Of these, 27 are among the 58 all-α proteins from our dataset. We calculated the number of interhelical contacts in each of these 47 proteins and in the four prion proteins. A plot of the number of contacts formed by a helix versus its length, together with the percentage of stabilizing (that is, HH or +-) contacts among these (data not shown), shows that all three helices from PrPC are similar in this respect to helices from other proteins.

The analysis of the interhelical contacts reveals that there are many more H- interhelical contacts in prion proteins than in other proteins (most of which are due to contacts between H1 and H3), and that the percentage of HH contacts in PrPC is somewhat increased (due to contacts between H2 and H3). The large majority of stabilizing (HH and +-) contacts occur between the amino acids of the first half of H2 (172-183) and those of the second half of H3 (213-224), around the disulfide bond between Cys-179 and Cys-214. Contacts between H2 and H3 are very similar, in number and type, to the interhelical contacts seen in other proteins having the same architecture as PrPC, suggesting that any structural transformation involving H2 (especially its first half, i.e., residues 172-183) is likely to affect H3 as well (especially its second half, i.e., residues 213-224).

Clustering of hydrophilic and hydrophobic residues suggests that H2 is frustrated

Istrail et al. (1999) have noted that one of the main features of an aggregation-prone sequence is that the hydrophobic and hydrophilic residues are clustered into a few large groups. To assess the clustering of hydrophobic residues, we examined how homogeneously they are distributed between the two halves of each of the three helices in PrPC, in comparison to the other proteins in the dataset. Because there is a rather broad distribution of helix lengths in proteins, we describe the distribution of hydrophobic amino acids in each helix using the quantity
$$\lambda_{H} = \left| \frac{1}{2} - \frac{S_{\max}}{S_{H}} \right|, \tag{3}$$
where Smax is the larger of the numbers of hydrophobic residues found in the two halves of the helix, and SH is the total number of hydrophobic residues in the helix. The histogram of λH is presented in Fig. 8. In the helices of normal proteins, the hydrophobic residues are uniformly distributed between the two halves. Two of the helices in prion proteins (H1 and H3) obey this rule very well. The clear difference is in H2, which has nearly all its hydrophobic amino acids clustered in one half. To check the generality of this observation, we calculated λH for all the helices with a minimum length of 23 amino acids in the PDB. Using the Database of Structural Motifs in Proteins (DSMP), we found 7854 such helices among the 12,904 proteins in the PDB. The corresponding histogram of λH (Fig. 8) is very similar to the one obtained using only the helices in our dataset of proteins.
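A minimal sketch of the Eq. 3 quantity is given below. Several choices are assumptions rather than statements from the text: the delimiters in Eq. 3 are read as an absolute value, an odd-length helix is split with the extra residue in the second half, and the hydrophobic set in the usage example is a placeholder rather than the paper's own definition.

```python
def lambda_h(helix, hydrophobic):
    """Clustering measure of Eq. 3: |1/2 - S_max / S_H| for one helix sequence.

    `helix` is a string of one-letter residue codes; `hydrophobic` is the set of
    codes counted as hydrophobic (ASSUMPTION: supplied by the caller).
    """
    mid = len(helix) // 2  # ASSUMPTION: an odd-length helix gets the longer second half
    first = sum(res in hydrophobic for res in helix[:mid])
    second = sum(res in hydrophobic for res in helix[mid:])
    s_h = first + second
    if s_h == 0:
        raise ValueError("helix contains no hydrophobic residues")
    return abs(0.5 - max(first, second) / s_h)


# Hypothetical hydrophobic set, for illustration only.
EXAMPLE_HYDROPHOBIC = set("AVLIMFWC")

evenly_spread = lambda_h("AKAKAKAK", EXAMPLE_HYDROPHOBIC)  # two per half
fully_clustered = lambda_h("AAAAKKKK", EXAMPLE_HYDROPHOBIC)  # all four in one half
```

With this reading, an even split of hydrophobic residues gives 0 and complete clustering in one half gives 0.5, so larger values indicate stronger clustering.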



FIGURE 8   Histogram of the λH (Eq. 3) values for the sequences of helices of PrPC, the 57 all-α proteins, and the 37 α+β proteins (544 in total). The middle panel shows P(λH) for all 7854 helices with at least 23 residues. These were extracted from the PDB using the DSMP database.

Evaluating the structural properties of PrPC using threading

Because of our reliance on environmental classes, a threading study that is based on profiles rather than on specific interactions between proteins would supplement our conclusions. To this end, we use the standard 3D-1D scores introduced by Eisenberg and co-workers (Bowie et al., 1991).

To use the profile method for threading, we first determined for each protein from our dataset the environment in each of its positions. We performed gapless threading of all the prion sequences on all possible fragments of structures from the dataset. For scoring, we used the 3D-1D scores, which encode the likelihood of finding the twenty amino acids in the 18 possible environments (Bowie et al., 1991). For comparison purposes, we also did the threading for all the proteins in our dataset.

According to the scores from the threading analysis, the prion proteins are very similar to α+β proteins of similar length that have little helical content compared to their strand content. The R3D-1D scores for all-α and α+β proteins having a high helical content (and of similar length to the prion proteins) are well above 30, which is large compared to the scores of about 20 obtained for prion proteins. An estimator of the stability of proteins in their native-state conformations is the Z score (Bowie et al., 1991),
$$Z = \frac{N - \langle N \rangle}{\sigma}, \tag{4}$$
where N refers to the score of the sequence in its native-state conformation, ⟨N⟩ is the average score of the sequence over all conformations other than the native one, and σ is the corresponding standard deviation. A stable protein is characterized by a large and positive Z.
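The gapless threading and Z score described above can be sketched as follows. This is a toy illustration under stated assumptions: the score table is a nested dictionary indexed as `score[environment][amino_acid]` (the real 3D-1D table of Bowie et al. has 18 environment classes and is not reproduced here), environments are given as a sequence of labels, and σ is taken as the population standard deviation of the non-native scores. All names are hypothetical.

```python
from statistics import mean, pstdev


def gapless_scores(sequence, environments, score):
    """Score `sequence` on every contiguous fragment of `environments`, without gaps.

    Returns one total score per possible starting position of the fragment.
    """
    n = len(sequence)
    return [
        sum(score[env][aa] for aa, env in zip(sequence, environments[start:start + n]))
        for start in range(len(environments) - n + 1)
    ]


def z_score(native_score, decoy_scores):
    """Eq. 4: (N - <N>) / sigma, with the average and sigma taken over non-native scores.

    ASSUMPTION: sigma is the population standard deviation of the decoy scores.
    """
    return (native_score - mean(decoy_scores)) / pstdev(decoy_scores)
```

In use, the decoy scores would be collected by calling `gapless_scores` for the sequence on every structure in the dataset other than its own, and the native score would come from the sequence aligned to its own environment string.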

The normalized Z scores for the majority of α+β proteins with helical/strand content similar to that of PrPC are considerably higher than for prion proteins (data not shown). This suggests that PrPC proteins, which are rich in helices, are not stable in their native-state conformations. The normalized Z of 0.7 for PrPC resembles the normalized Z score of other α+β proteins of similar length and with very low helical content.

The goodness of the fit between a sequence and a structure can be assessed using 3D profile scores, which, for correct models, increase with the length of a protein (Luthy et al., 1992). The R3D-1D scores for the stable proteins in the PDB are proportional to the sequence length (L). A plot of the R3D-1D scores for the proteins in our dataset versus L can be fit using
$$R_{\mathrm{3D-1D}} = 0.36\,(\pm 0.01)\,L - 6.92\,(\pm 2.25). \tag{5}$$
If this is applied to prion proteins, with L = 104 residues, we expect R3D-1D to be between 27.2 and 33.8. However, the computed values of R3D-1D are 21.9, 19.9, and 23.6 for mPrP, shPrP, and h2PrP, respectively. This also shows that PrPC proteins are not optimal in their native α-helical state.
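The expected range quoted above follows from Eq. 5 if the uncertainties on the slope and intercept are both taken at their extremes. The short check below makes that arithmetic explicit; combining the two uncertainties in this way is an assumption about how the interval was obtained, and the function name is hypothetical.

```python
def expected_score_range(length, slope=0.36, slope_err=0.01,
                         intercept=-6.92, intercept_err=2.25):
    """Lower and upper bounds on R3D-1D from the linear fit of Eq. 5."""
    low = (slope - slope_err) * length + (intercept - intercept_err)
    high = (slope + slope_err) * length + (intercept + intercept_err)
    return low, high


low, high = expected_score_range(104)
print(f"expected R3D-1D for L = 104: {low:.1f} to {high:.1f}")

# Computed scores quoted in the text.
observed = {"mPrP": 21.9, "shPrP": 19.9, "h2PrP": 23.6}
for name, value in observed.items():
    print(f"{name}: {value} ({low - value:.1f} below the lower bound)")
```

All three computed prion scores fall below the lower bound obtained this way.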

Helix 2 is frustrated in PrPC mutants: Analysis using PHD

It is known that inherited human TSEs (familial CJD, GSS, and FFI) are associated with mutations in the PRNP gene. According to SWISS-PROT, seven of the point mutations (D178N, V180I, T183A, H187R, T188R, T188K, T188A) are found in H2 (172-194). A naive application of the helical propensities of Chou and Fasman (1978) would suggest that, except for D178N, all these point mutations should lead to better helix formation. However, reliable secondary-structure prediction requires the use of context-dependent propensities based on multiple sequence alignments, as used in PHD (Profile network from Heidelberg) (Rost and Sander, 1993). Kallberg et al. (2001) have also argued that H2 is "frustrated" in the helical conformation, whereas H1 and H3 are not. This conclusion was reached by searching 1324 protein structures for segments of ≥7 residues that are predicted by PHD to be in β-strands but are experimentally determined to be in α-helices. This α/β discordance (or mismatch) was proposed to be associated with amyloid fibril formation (Kallberg et al., 2001). We measure the degree of α/β discordance or frustration using
$$S_{\alpha/\beta} = \frac{1}{L} \sum_{i=1}^{L} (R_{i} - 5), \tag{6}$$
where Ri is the reliability score predicted by PHD for a particular position along the sequence, 5 is the average score, and L is the length of the sequence. If Sα/β = 4 and the sequence is experimentally determined to be in a helical conformation, then the sequence is maximally discordant (frustrated) with respect to the predicted secondary structure; Sα/β = 0 is marginal. Negative values of Sα/β imply that PHD is unreliable in this prediction.
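Eq. 6 is simply the mean deviation of the per-residue reliability scores from the average value of 5. A minimal sketch follows; the function name is hypothetical, and the input is assumed to be the list of PHD reliability scores for the residues of the segment of interest. The maximum of 4 quoted in the text corresponds to every position having a reliability of 9.

```python
def discordance_score(reliabilities, average=5):
    """Eq. 6: mean of (R_i - average) over the L positions of a segment."""
    if not reliabilities:
        raise ValueError("need at least one residue")
    return sum(r - average for r in reliabilities) / len(reliabilities)


maximal = discordance_score([9] * 7)    # every position predicted with reliability 9
marginal = discordance_score([5] * 7)   # every position at the average reliability
```

Segments scored this way can be compared directly with the values reported for the wild type and the H2 point mutants.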

It has already been noted by Kallberg et al. that wild-type H2 is discordant in mouse and Syrian hamster. We calculated the Sα/β scores for the PHD secondary-structure predictions for point mutations in H2 (Fig. 9). The Sα/β scores are 1.83, 1.94, 1.80, 1.30, 1.80, 1.54, 1.94, and 1.94 for the WT, D178N, V180I, T183A, H187R, T188K, T188R, and T188A, respectively. Helix 2 for all these mutations is, just as in the WT, discordant or frustrated (Fig. 1). The differences in the Sα/β scores suggest that the biophysical characteristics of PrPC mutants can differ greatly (Liemann and Glockshuber, 1999). However, in all cases, H2 is frustrated.



FIGURE 9   Assessment of the extent of frustration (or discordance) in H2 from hPrPC and some disease-causing point mutations. The predicted structures using PHD are designated as strands (E and confidence scores in gold), whereas NMR shows that they are helical (blue). The frustration score, computed using Eq. 6, is given in the text. Examples of other frustrated helices are given in Kallberg et al. (2001).


    DISCUSSION

Experimental evidence

A key experimentally testable prediction of our analysis is that, in addition to the previously identified residues, even segments of the relatively rigid core of PrPC, especially the C-terminal residues of H2, might play a role in the transition to an assembly-competent state, i.e., one that can nucleate and polymerize. This result and the observation that the secondary structure of PrPSc has a high percentage of β-sheet content imply that the whole protein molecule unfolds substantially in going from PrPC to an aggregation-prone state. Although direct and unequivocal experimental validation is currently not available, many distinct lines of evidence lend support to our theoretical analysis. These include backbone hydrogen/deuterium (H/D) exchange dynamics, NMR relaxation measurements, and CD spectroscopy of the states in the PrPC→PrPSc transition.

Backbone H/D exchange dynamics

It has been postulated that the formation of fibrils in amyloids and prion proteins occurs from (at least partially) populated intermediates, i.e., PrPC→[I]→PrPSc (Hornemann and Glockshuber, 1998; Morillas et al., 2001). To test this proposal, Clarke and coworkers (Hosszu et al., 1999) measured the backbone H/D exchange rates in human prion protein with the disulfide bond between Cys-179 and Cys-214 intact. The structural mobility of the protein fragments can be inferred from such measurements. Under the conditions of the experiment (pH = 5.5 and T = 303 K), PrPC unfolds in a single step upon addition of GuHCl with an equilibrium constant KUN ≃ 1.5 × 10³ (Hosszu et al., 1999). The exchange process can be used to obtain the residue-dependent protection factor P, which gives an indication of the structural mobility. Efficient H/D exchange occurs from the solvent-exposed region. Thus, if P ≃ KUN, then exchange occurs only from the unfolded PrPC rather than from an equilibrium or off-pathway intermediate. Remarkably, it was found that only about 10 residues (Hosszu et al., 1999), clustered around the disulfide bond in the core of PrPC, have P values greater than KUN. Of these, only Cys-179 and Ile-182 from H2 have P > KUN. The rest of the residues are in H3 (see Fig. 2 a of Hosszu et al., 1999). Using these observations, Clarke and coworkers ruled out the possibility that PrPSc forms from a native-like intermediate. Their experiments and the observation that PrPSc has predominantly β-sheet architecture led them to conclude that "complete or near-complete unfolding must precede rearrangement of the amyloidogenic intermediate."

Biophysical studies

NMR structures show that the N-terminal residues (90-120) in PrPC are disordered (Riek et al., 1996, 1997; James et al., 1997; Donne et al., 1997; Zahn et al., 2000). It is also known that this region is resistant to protease digestion, which gives credence to the notion that it is structured in PrPSc. Indirect evidence for this finding comes from the work of Peretz et al. (1997), who identified PrPC- and PrPSc-specific epitopes in the region (90-120) using binding of recombinant Fab fragments. This study suggests that the smallest region that undergoes a structural transition upon prion formation is the disordered segment (90-120) in PrPC. To test whether (90-120) is the only region with altered conformation in PrPSc, Hornemann and Glockshuber (1998) performed biophysical studies of the structured C-terminal domain (121-231) of PrPC. Under acidic conditions (pH < 5), urea-induced unfolding of PrPC exhibits a three-state transition with a well-defined "equilibrium" intermediate. The far-UV CD spectrum of the acid intermediate revealed that it has the characteristics of a β-sheet structure (Hornemann and Glockshuber, 1998). These measurements suggest that, regardless of whether the intermediate is at equilibrium, certain regions besides the previously implicated (90-120) fragment (Peretz et al., 1997) must also be involved in the conformational transition. This is consistent with our analysis.

Unfolding of PrPC

Because the core of PrPC might play a role in the conformational change, it follows that substantial unfolding must precede the formation of a species, β-PrP (Baskakov et al., 2001), that can subsequently nucleate and polymerize. This result, when combined with the stability of PrPC [(6-10) kcal/mol, depending on conditions and fragment length], suggests that a large barrier separates the α-helical PrPC and β-PrP. The time scale for β-PrP, the