Biophys J, May 1999, p. 2329-2345, Vol. 76, No. 5
Centre National de la Recherche Scientifique URA D1284 Neurobiologie Moléculaire, Institut Pasteur, 75015 Paris, France
ABSTRACT
A refined prediction of the nicotinic acetylcholine
receptor (nAChR) subunits' secondary structure was computed with third-generation algorithms. The four selected programs, PHD, Predator,
DSC, and NNSSP, based on different prediction approaches, were applied
to each sequence of an alignment of nAChR and 5-HT3 receptor subunits, as well as a larger alignment with related subunit
sequences from glycine and GABA receptors. A consensus prediction was
computed for the nAChR subunits through a "winner takes all"
method. By integrating the probabilities obtained with PHD, DSC, and
NNSSP, this prediction was filtered in order to eliminate the
singletons and to more precisely establish the structure limits (only
4% of the residues were modified). The final consensus secondary
structure includes nine α-helices (24.2% of the residues, with an
average length of 13.9 residues) and 17 β-strands (22.5% of the
residues, with an average length of 6.6 residues). The large
extracellular domain is predicted to be mainly composed of β-strands,
with only two helices at the amino-terminal end. The transmembrane
segments are predicted to be in a mixed α/β topology (with a
predominance of α-helices), with no known equivalent in the current
protein database. The cytoplasmic domain is predicted to consist of two
well-conserved amphipathic helices joined together by an unfolded
stretch of variable length and sequence. In general, the segments
predicted to occur in a periodic structure correspond to the more
conserved regions, as defined by an analysis of sequence conservation
per position performed on 152 superfamily members. The solvent
accessibility of each residue was predicted from the multiple
alignments with PHDacc. Each segment with more than three exposed
residues was assumed to be external to the core protein. Overall, these
data constitute an envelope of structural constraints. In a subsequent
step, experimental data relative to the extracellular portion of the
complete receptor were incorporated into the model. This led to a
proposed two-dimensional representation of the secondary structure in
which the peptide chain of the extracellular domain winds alternately
between the two interfaces of the subunit. Although this representation
is not a tertiary structure and does not lead to predictions of
specific β-β interactions, it should provide a basic framework for
further mutagenesis investigations and for fold recognition (threading) searches.
INTRODUCTION
The nicotinic acetylcholine receptors (nAChRs)
belong to the superfamily of ligand-gated ion channels (LGIC)
(Cockcroft et al., 1992; Galzi and Changeux, 1994), which are allosteric
transmembrane proteins responsible for fast ionic responses to
neurotransmitters. These receptors are homo- or hetero-pentamers made
from a set of 16 related subunits in vertebrates (9 α, 4 β, γ, δ,
and ε) (reviewed in Le Novère and Changeux, 1995). Other
receptors formed of polypeptides homologous to the nAChR subunits
include 5-HT3, GABAA, GABAC, and glycine
receptors of vertebrates, as well as their invertebrate counterparts.
Despite their rather different pharmacological properties (Ortells and
Lunt, 1995), these receptors likely possess a common quaternary
structure (Eisele et al., 1993; Langosch et al., 1988). The vertebrate
GABAA receptors are formed from a set of 16 related subunits (6 α,
4 β, 4 γ, 1 δ, and 1 ε) (MacDonald and Olsen, 1994). The GABAC
receptors are homo-pentamers of ρ1-3
subunits (Bormann and Feigenspan, 1995). The vertebrate glycine
receptors are made from a set of five related subunits (4 α and 1 β)
(Bechade and Triller, 1994).
Every mature subunit of the nAChR family is assumed to follow the same
transmembrane topology (Hucho et al., 1996
). A large amino-terminal
portion carrying the components of the acetylcholine (ACh) binding site
faces the extracellular environment. The three subsequent segments
cross the membrane, followed by a large intracellular domain and a
fourth segment that again crosses the membrane. The relatively
short carboxy-terminal domain is extracellular.
The sequence conservation varies along the subunits. The amino-terminal
signal peptide and the middle of the cytoplasmic portion are highly
variable, whereas the amino-terminal moiety, as well as the membrane
flanking portions of the cytoplasmic part, are well conserved. The
transmembrane segments are highly conserved. In humans, the size of the
subunits varies from 457 aa (α1) to 627 aa (α4).
The overall low-resolution structure of the nAChRs was initially
determined by electron microscopy on single molecules (Cartaud et al.,
1973) or two-dimensional crystals (Kistler and Stroud, 1981) of
Torpedo electric organ receptor. The nAChR molecules result
from the assembly of five subunits arranged around an axis of symmetry
perpendicular to the membrane. The length of the molecule is ~120 Å,
with an extracellular, funnel-shaped portion of 60 Å and a
transmembrane portion of 30 Å. The extracellular entry
of the pore is ~25 Å wide, while the intracellular one is slightly
smaller (Toyoshima and Unwin, 1988; Unwin, 1993a). A similar shape was
proposed for GABAA receptors (Nayeem et al., 1994).
Affinity labeling and site-directed mutagenesis have shown that the
ligand binding sites are located at the interface of two subunits,
formed by residues belonging to two components (Galzi and Changeux,
1994
; Table 1). The principal component
[on the subunit surface that would be reached first if following the
clockwise path when the structure is viewed from the extracellular
surface (Machold et al., 1995)] is carried by the α subunits and
comprises at least three segments or loops (Dennis et al., 1988; Galzi
et al., 1990). Facing it, the complementary component includes three (or possibly four) different segments or loops (Corringer et al., 1995;
Czajkowski et al., 1993; Prince and Sine, 1996; Sine et al., 1995;
reviewed in Hucho et al., 1996; Tsigelny et al., 1997). Such composite
ligand binding sites appear to be conserved throughout the superfamily
of LGIC. Indeed, it has been shown that the binding sites for ACh,
GABA, glycine, and benzodiazepines are homologous (Schmieden et al.,
1992
; Vandenberg et al., 1992
; reviewed in Galzi and Changeux, 1994
).
The ionotropic glutamate receptors constitute a separate superfamily in
which agonist sites probably do not occur at subunit interfaces (Paas,
1998
).
Cryoelectron microscopy of the Torpedo electric organ
receptor has provided three-dimensional (3D) images of the nAChR at a
resolution of 9 Å (Unwin, 1993a
). Such a resolution is not sufficient to resolve the spatial position and the secondary structure assignment of any particular amino acid. Although the extracellular domain has
been successfully produced in a soluble form (Wells et al., 1998
), the
quantities obtained are still too low to permit the production of
crystals suitable for x-ray diffraction. The NMR approach has been
limited to small fragments (Basus et al., 1993
). Some attempts were
made with other methods, such as atomic force microscopy (Lal and Yu,
1993
), though with a resolution lower than that of electron microscopy
on two-dimensional (2D) crystals.
It is therefore of interest to obtain information on the receptor
protein organization from the data currently available, i.e., the
sequences of the subunits. Accordingly, in parallel with the
experimental approaches, efforts have been made to predict the
structure of the individual subunit with computational techniques. Two
approaches have been used. The comparative modeling techniques sought
to give a structural description of a protein provided that a plausible
structural model can be identified. The problem resides in the
identification of a suitable template from sequence information only.
However, the lack of sufficient sequence similarity between an nAChR
subunit and a protein of known structure requires fold recognition
methods (Gready et al., 1997; Tsigelny et al., 1997) which, as known
from test cases, are only partially successful in recognizing similar
folds in the absence of sequence similarity (Rost and Sander, 1996).
This approach also suffers from the fact that a plausible 3D model may
not exist in the currently available protein structure database
(Marchler-Bauer et al., 1997
). The two models proposed so far are
indeed different (Gready et al., 1997
; Tsigelny et al., 1997
). In
parallel, ab initio secondary structure predictions were performed with
first-generation algorithms (single amino acid-based, 50-60%
accuracy) by Finer-Moore and Stroud (1984) and Ortells (1997).
Here we present a secondary structure prediction of the nAChR subunit
based on third-generation algorithms (which use multiple alignments and
can achieve >70% accuracy), in order to take into
account the information derived from the wealth of cloned homologous
subunit sequences. The combination of several independent first- and
second-generation algorithms has been shown to increase the accuracy of
secondary structure predictions (Biou et al., 1988
; Nishikawa and Ooi,
1986
; Zhang et al., 1992
). We describe a program that integrates
results from several prediction algorithms and multiple homologous
proteins. We applied this program to the different members of the nAChR
family and LGIC superfamily to increase the signal/noise ratio. In
addition, the program furnished the consensus of predicted solvent
accessibility and topology. By using these data in combination with
information obtained from experimental sources, we integrated the
results into a 2D representation of a typical nAChR subunit.
MATERIALS AND METHODS
Alignment
All sequences used in this study can be found in the
ligand-gated ion channel (LGIC) subunit database at the URL
(http://www.pasteur.fr/units/neubiomol/LGIC.html). For the secondary
structure predictions, two multialignments were achieved with ClustalX
software (Thompson et al., 1997
; available at ftp-igbmc.u-strasbg.fr)
(pairwise gap opening, 10; pairwise gap extension, 0.1; multiple gap
opening, 5; multiple gap extension, 0.05; Blosum matrix series). One
alignment was carried out with 18 subunit sequences of cationic
channels (AL1). AL1 contains 5-HT3 from Mus, nicotinic α1
of Torpedo, α2-6, α9, and β2-4 of Rattus, α7-8 of Gallus,
β1, γ, δ, and ε of Mus
(one example of each paralogous gene), and DEG3 of
Caenorhabditis (for which no vertebrate ortholog has yet been
found). Another alignment was constructed with 38 LGIC (cationic and
anionic LGIC) sequences (AL2). AL2 contains the AL1 subunit sequences plus
GABA α1-6, β1-3, γ1-3, ρ1-3, and δ, and glycine α1-3 and β,
from Rattus. The aim was to determine whether the incorporation of information from more distantly related sequences would improve the predictions. We did not use more than one sequence per orthology group because of the high similarity between orthologs (and hence the lack of additional information brought by the use of
multiple orthologs). The ASSP software (Russel and Barton, 1993;
available at ftp://geoff.biop.ox.ac.uk/programs/assp/) indicated that
the Q3 accuracy (i.e., the percentage of three-state identity) attainable by a perfect prediction lies in the interval
[83.45-100%] for AL1 and [82.74-100%] for AL2. To study the
conservation of sequence at each position along the sequence, a third
multiple alignment was constructed from 152 different LGIC subunit
sequences. All these sequences correspond to subunits shown to be
integrated in functional receptors (thus eliminating the putative
members originating from large-scale genome projects).
Secondary structure prediction by consensus average
A computer program was written in C to integrate secondary structure predictions based on different algorithms. SSPCA (for secondary structure prediction by consensus average) was designed to combine three-state predictions and probabilities from several prediction programs and several sequences (Fig. 1). The SSPCA program is also designed to treat other types of prediction such as solvent accessibility and topological arrangements for membrane proteins. The individual predictions were not weighted by sequence similarity.
As input, SSPCA takes an alignment of amino acid sequences (in a
Clustal format) and a file containing the predictions. The prediction
file contains for each sequence and for each method (if available) the
probability for helix, β-strand, and coil [0-9], the resulting secondary
structure prediction [H(elix) or E(xtended) or C(oil)], the
probability of accessibility to solvent [0-9], the resulting
accessibility to solvent (e(xposed) or b(uried)), and the topological
state (o(utside), i(nside), T(ransmembrane)). The output of SSPCA is
composed of (points 1-5 concern only the secondary structure
prediction):
1. The M×S predictions P(mi,sx), where M is the number of methods, S the number of sequences, mi the ith method, and sx the xth sequence, projected on the alignment (gaps are inserted in the predictions where present in the alignment). Each P(mi,sx) is then a character string with the length of the alignment, each character belonging to {H, E, C, `-'}.
2. The M×S(M×S − 1)/2 pairwise comparisons C[P(mi,sx), P(mj,sy)] of the predictions P(mi,sx) and P(mj,sy). If Ω is the set of positions of the alignment where neither P(mi,sx) nor P(mj,sy) contains a gap, that is, where both predictions are defined,
$$C[P(m_i,s_x), P(m_j,s_y)] = \frac{100}{\mathrm{card}(\Omega)} \sum_{k \in \Omega} \delta\big(P_k(m_i,s_x), P_k(m_j,s_y)\big) \qquad (1)$$
where P_k is the kth character of a prediction, δ(a,b) equals 1 if a = b and 0 otherwise, and card(Ω) is the cardinal (the number of elements) of Ω.
3. The congruence between methods µi,j: this
parameter represents the percent identity between the consensus of
predictions for all the sequences of two methods.
$$\mu_{i,j} = C[P(m_i), P(m_j)] \qquad (2)$$
where P(mi) and P(mj) are the consensus predictions, over all the sequences, of methods i and j.
4. The congruence between sequences σx,y. This parameter
represents the percent identity between the consensus of predictions for two sequences by all the methods.
$$\sigma_{x,y} = C[P(s_x), P(s_y)] \qquad (3)$$
where P(sx) and P(sy) are the consensus predictions, over all the methods, for sequences x and y. This parameter permits the comparison of the predictions
for different homologous proteins.
5. The consensus predictions and the sum of probabilities: by sequences, by methods, and in toto (and the percent helix and strand for each consensus prediction). For each position, the consensus is computed as the major state. In case of identical cardinals, the arbitrary priority order is E > H > C > `-'. The percent helix and strand is given for the total nongapped consensus length.
6. The global consensus solvent accessibility. In case of identical cardinals, the arbitrary priority order is b > e > `-'.
7. The global consensus topology. In case of identical cardinals, the arbitrary priority order is T > i > o > `-'.
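The pairwise comparison and the consensus rule described in points 2 and 5-7 can be illustrated with a short Python sketch. This is not the SSPCA source (which was written in C); the function names `pairwise_identity` and `consensus` are hypothetical, and returning 0% when two predictions share no defined position is an assumption made here.

```python
from collections import Counter

# Tie-break order quoted in the text for secondary structure states.
SS_PRIORITY = "EHC-"


def pairwise_identity(p, q):
    """Percent identity over alignment positions where neither string has a gap."""
    if len(p) != len(q):
        raise ValueError("both predictions must have the alignment length")
    defined = [(a, b) for a, b in zip(p, q) if a != "-" and b != "-"]
    if not defined:
        return 0.0  # assumption: no shared position is reported as 0%
    return 100.0 * sum(a == b for a, b in defined) / len(defined)


def consensus(predictions, priority=SS_PRIORITY):
    """Majority state per column; ties are broken by position in `priority`."""
    if len({len(p) for p in predictions}) != 1:
        raise ValueError("predictions must be non-empty and of equal length")
    result = []
    for column in zip(*predictions):
        counts = Counter(column)
        best = max(counts[s] for s in priority)
        # First state, in priority order, that reaches the highest count.
        result.append(next(s for s in priority if counts[s] == best))
    return "".join(result)


if __name__ == "__main__":
    toy = ["HHEC-", "HEEC-", "EHCCH"]  # hypothetical predictions
    print(consensus(toy))
    print(pairwise_identity(toy[0], "HEECH"))
```

Under the same assumptions, the solvent accessibility and topology consensus would use `priority="be-"` and `priority="Tio-"`, respectively.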
Secondary structure prediction programs
PHDsec (Rost and Sander, 1993a, b; 1994a) is composed of several
cascading neural networks (previously trained on proteins of known
structures). A first network takes as input a set of vectors
representing the amino acid composition at positions of the multiple
alignment in a window sliding along it. Its output is composed of a
vector representing the probabilities for each of the three states of
the central residue of the window. Since the secondary structure of a
residue is not independent of the structure of neighboring residues, a
second step takes into account these local interactions. A neural
network takes as input the vectors present in a window sliding along
the previous output. Its own output is a refined three-state
probabilities vector. Another step consists of averaging (for each
state) the outputs from independently trained networks. Finally, a
"winner takes all" decision assigns the secondary structure state.
No explicit rules are included in the algorithm. PHD may generate its
own alignment with the submitted sequence [with the MaxHom algorithm (Sander and Schneider, 1991
)]. Therefore, for every sequence of AL1
and AL2, a different alignment was generated and used for the
prediction. PHDsec is accessible at the URL
(http://www.embl-heidelberg.de/predictprotein/predictprotein.html).
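The last two steps described for PHDsec, averaging the outputs of independently trained networks and then applying a "winner takes all" decision, can be sketched as follows. The sketch does not reproduce the neural networks themselves; the per-residue probability vectors are hypothetical inputs, and resolving exact ties in the order H, E, C is an assumption made here.

```python
STATES = "HEC"


def average_outputs(network_outputs):
    """Average per-residue (H, E, C) vectors over several networks.

    network_outputs: list (one entry per network) of lists (one entry per
    residue) of 3-tuples giving the scores for H, E and C.
    """
    n_networks = len(network_outputs)
    n_residues = len(network_outputs[0])
    return [
        tuple(
            sum(net[pos][k] for net in network_outputs) / n_networks
            for k in range(3)
        )
        for pos in range(n_residues)
    ]


def winner_takes_all(probabilities):
    """Assign to each residue the state with the highest averaged score."""
    return "".join(
        STATES[max(range(3), key=lambda k: p[k])] for p in probabilities
    )


if __name__ == "__main__":
    # Two hypothetical networks disagreeing on the second residue.
    net1 = [(0.8, 0.1, 0.1), (0.2, 0.5, 0.3)]
    net2 = [(0.6, 0.2, 0.2), (0.1, 0.3, 0.6)]
    print(winner_takes_all(average_outputs([net1, net2])))
```

In this toy input the first network alone would call the second residue E and the second network C; the averaged vector decides in favor of C.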
PREDATOR (Frishman and Argos, 1996, 1997) is based on the
calculated propensities of each of the 400 amino acid pairs to interact inside an
α-helix or one of three types of β-bridges. It then incorporates nonlocal interaction statistics. PREDATOR also uses propensities for
α-helix, β-strand, and coil derived from a
nearest-neighbor approach (see below). To use information obtained from
homologous proteins, PREDATOR relies on local pairwise alignments.
PREDATOR is able to use Clustal alignment as input. The program was
employed with the option `-a,' which furnishes a prediction for every
sequence of the input set. The source code is kindly distributed by the authors. PREDATOR is also accessible at the URL
(http://www.embl-heidelberg.de/cgi/predator_serv.pl).
DSC (King and Sternberg, 1996) combines several explicit parameters in
order to produce a "meaningful" prediction. It runs the GORIII
algorithm [Gibrat et al. (1987), based on information theory applied
to local interactions] on every sequence to provide mean potentials
for the three states. In addition, DSC uses the presence of
insertions/deletions, the distance from the end of the chain, the
moment of conservation, and the moment of hydrophobicity (the last two
parameters computed assuming an α-helical structure and a β-strand structure).
A linear combination of these different attributes gives an output that
is subsequently filtered. The program was used with the following
options: `-a' (to turn off removal of poorly aligned sections),
`-i' (to stop removal of isolated predictions), `-f1' (to apply the
filtering rules once), and `-w' (Clustalw alignment). The source code
is kindly distributed by the authors. DSC is also accessible at the URL
(http://bonsai.lif.icnet.uk/bmm/dsc/dsc_read_align.html).
NNSSP (Salamov and Solovyev, 1995
) is based on the nearest-neighbor
algorithm [sometimes improperly called the "homologue" method
(Levin et al., 1986
; Nishikawa and Ooi, 1986
)]. The basic idea of the
nearest-neighbor approach is the prediction of the secondary structure
state of the central residue of a test segment, based on the secondary
structure of similar segments from proteins with known 3D structure.
The information provided by the different templates is scored according
to their similarity (according to the sequence or other properties)
with the test segment. NNSSP is an enhancement of the algorithm
designed by Yi and Lander (1993), which selects the neighbors by
means of an environmental score (Bowie et al., 1991) and combines, by
means of a neural network, predictions made with different parameters
(environmental scores, length of nearest-neighbors, ...). Unlike
the latter program, NNSSP also incorporates information from
multiple aligned sequences (by averaging their scores for the weighting
of each nearest-neighbor). An executable program was kindly provided by
the authors. NNSSP is also accessible at the URL
(http://dot.imgen.bcm.tmc.edu:9331/pssprediction/pssp.html).
A program, clu2nnssp, was written in C to convert a Clustal alignment into NNSSP alignments. This program is available at the URL (http://www.pasteur.fr/units/neubiomol/softwares.html) or upon request.
Accessibility to solvent and topology program
PHDacc (Rost and Sander, 1994b
) is able to compute the probable
accessibility to solvent. It was used to refine the secondary structure predictions.
PHDhtm (Rost et al., 1995, 1996) was used to provide a more accurate
prediction of the positions of the transmembrane segments than the
original one, which was established with only a few subunits and only from
hydropathy plots (Popot and Changeux, 1984).
Conservation index
A computer program, ConsIndex, was written in C to compute the
sequence conservation between homologous sequences at each position of
a multiple alignment. The program takes as input an alignment of a
Clustal-like format and a similarity matrix. It computes first the
N(N − 1)/2 global similarities Sij (identities if the identity matrix is input)
of the N sequences. Then for each position of the alignment,
a conservation index CI is computed as follows:
|
(4) |
1.2,1.5] to [0-100]. The gap was added as
an independent amino acid, with every matrix element involving it
considered as null. ConsIndex is available at the URL
(http://www.pasteur.fr/units/neubiomol/softwares.html) or upon request.
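The first step of ConsIndex, the computation of the N(N − 1)/2 global similarities, can be sketched for the identity-matrix case. This is not the ConsIndex source: the names `global_identity` and `pairwise_identities` are hypothetical, and normalizing over the columns where both sequences carry a residue is an assumption (the text states only that matrix elements involving a gap are null). The per-position conservation index itself is not reproduced here.

```python
from itertools import combinations


def global_identity(a, b):
    """Percent identity of two aligned sequences (identity matrix case).

    Columns where either sequence has a gap contribute nothing and are
    left out of the normalization (assumption).
    """
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def pairwise_identities(alignment):
    """Map each of the N(N-1)/2 unordered sequence pairs to its identity.

    alignment: dict mapping a sequence name to its aligned sequence.
    """
    return {
        (m, n): global_identity(alignment[m], alignment[n])
        for m, n in combinations(list(alignment), 2)
    }


if __name__ == "__main__":
    toy = {"a": "ACD-F", "b": "ACE-F", "c": "A-DGF"}  # hypothetical alignment
    for pair, identity in pairwise_identities(toy).items():
        print(pair, round(identity, 2))
```

For the 38 sequences of AL2, N(N − 1)/2 gives 703 pairs; for the 152-sequence alignment it gives 11,476.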
RESULTS
Strategy
Previous work has shown that combining several independent first- and
second-generation algorithms increases the accuracy of secondary structure
predictions (Biou et al., 1988; Nishikawa and Ooi, 1986; Zhang et al., 1992). Here we combined the predictions of
several third-generation algorithms, using the information given by a
set of aligned homologous sequences to compute the secondary structure
of nAChR subunits.
The algorithms used in this study were chosen according to three
criteria: 1) they analyze multiple alignments instead of single protein
sequences; 2) they yield a better than 70% accuracy for three-state
(H, E, C) prediction when tested on a set of proteins of known
structure with sequence identities lower than 25% (Rost and Sander,
1994a
) or during blind predictive situations (King, 1996
; Rost, 1997
);
and 3) each of these algorithms is based on a different predictive
approach. Each program was applied successively on every sequence of
the alignments to increase the signal/noise ratio.
Two different sets of sequences were used to make the secondary structure predictions. The first (AL1) represented the entire group of cationic LGIC subunits in the acetylcholine receptor superfamily (5-HT3 and nicotinic receptors). A second set of sequences (AL2) contained the first set and, in addition, sequences covering the whole group of anionic LGIC subunits (GABA and glycine receptors).
Consistency of the predictions between methods and sequences
The congruencies between methods µi,j for every pair
of methods are listed in Table 2. All
four methods gave µi,j values >67%. The use of the
larger set of sequences caused a decrease of µ, which nevertheless
remained >57%. The congruence between sequence consensus predictions
σx,y was also examined for every pair of sequences. The
predictions for the cationic LGIC subunits were found to be consistent,
the congruencies varying from
deg3,
1 = 80.8%
to 
3,
6 = 95.3%. In the larger set, the
lowest σ occurred just above 64%, a value much larger than random
(which is 33% for a nonbiased three-state comparison and 38% if the
present bias of PDB is taken into consideration). The good congruency of the different predictions for the various members of the nAChR family is illustrated in Fig. 2
(top), where the peaks are sharp and 17 of 25 final
structural elements are predicted in >90% of the cases. The sequence
consensus predictions were very similar. The positions of the secondary
structure were almost identical, with little variation of the
assignments. The method consensus predictions were more variable,
though similar. The assignment of the structures varied somewhat, as
well as (but only very rarely) their occurrence.
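The random baseline quoted above can be checked with a short calculation: if two predictions assign states independently with the same state frequencies, the expected percent identity is 100 times the sum of the squared frequencies. The sketch below is illustrative only; the function name `chance_agreement` is hypothetical, and the biased composition used in the example (30% H, 20% E, 50% C) is an assumed toy composition, not the PDB frequencies used by the authors, which are not given in the text.

```python
def chance_agreement(frequencies):
    """Expected percent identity between two independent predictions.

    frequencies: dict mapping each state to its (unnormalized) frequency.
    """
    total = sum(frequencies.values())
    return 100.0 * sum((f / total) ** 2 for f in frequencies.values())


if __name__ == "__main__":
    # Unbiased three-state comparison: 3 x (1/3)^2 = 1/3.
    print(round(chance_agreement({"H": 1, "E": 1, "C": 1}), 1))
    # Hypothetical biased composition, chosen for illustration only.
    print(round(chance_agreement({"H": 0.3, "E": 0.2, "C": 0.5}), 1))
```

Any departure from equal frequencies raises the chance level above one-third, which is why a biased database composition gives a higher baseline than 33%.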
The similarity of protein 3D structures (measured by rmsd) correlates with
their sequence identity (Chothia and Lesk, 1986; Flores et al., 1991).
The incorporation of distant sequence information is expected to
increase the reliability of predicted structures, although decreasing
the consistency of the overall prediction (Russel and Barton, 1993;
Sternberg, 1996). The σ values of AL2 were plotted against the global
amino acid similarities determined by the conservation index program. A
correlation unambiguously occurred between the sequence similarities
and the structure prediction similarities (Fig. 3) (n = 703, r = 0.882, p < 0.001). Two main components emerged from the comparisons: a lower similarity population representing the anionic/cationic comparisons (e.g.,
GABAA vs. nAChR), and a higher similarity population
representing the comparisons of anionic/anionic or cationic/cationic
LGIC subunits. Together, these data show that the variations between
the secondary structure predictions were not random, as would be
expected from algorithm imperfections. On the contrary, they depended on
the variation of sequence. This reflects the fact that although the core
structures are conserved between the different superfamily members, as
supported by a large body of experimental evidence (Galzi and Changeux, 1994), the structural assignment at the level of individual residues may vary (for instance at the extremities of the structures). This variation was indeed found to increase as the sequence relationship decreases. Another conclusion can be derived from Fig. 3.
The more distant from the target protein the homologs used to infer
secondary structure are, the less reliable the information obtained is. A
trade-off therefore exists between the information obtained from multiple
alignments (reliability of secondary structure position and assignment)
and the mispredictions at the level of individual residues due to
sequence divergence (Russel and Barton, 1993). No published method is
currently known to establish the best compromise.
The final results obtained with the two alignments AL1 and AL2 were very similar, with only a few residues predicted to be in a different state. Every structure except one was equally predicted with both sets, and in all these cases the secondary structure assignment remained the same. Therefore, except when otherwise stated, we present the results obtained with AL1 (see Fig. 4).
Raw secondary structure prediction
The proportions of the three-state populations in the entire set of predictions for each alignment position are presented in Fig. 2 (top). Fig. 4 shows the raw consensus prediction, in plain text just below the alignments, with the designation of the structure above. In Fig. 2 (bottom), the conservation index determined on the full superfamily of LGIC (152 subunits) is plotted, along with the predicted secondary structures (black squares below the graph). In all instances but three (E9, HF, and HG) the predicted structures were located in regions of high (>50%) conservation. The region of E9 is in fact highly conserved except for the nematode subunit unc38. The regions of HF and HG are highly conserved in cationic channel subunits but less so in anionic channels. In summary, within the cationic channel subunit family, all predicted structures were located in regions of high conservation. This is important because high sequence variation between family members is likely to occur in less well-structured regions. A structure predicted in a conserved region is therefore more likely to be accurate.
Refined secondary structure predictions
PHD, DSC, and NNSSP provide prediction probabilities for the three
states in addition to the predicted state. Combining these probabilities permits the correction of the threshold decisions at the
level of single predictions, which may lead to false assignments, and
offers the possibility of resolving some single-residue problems, such
as singletons (isolated structured residues) or amino acids located at
the boundaries of the secondary structure motifs. The changes made in
this way affect only 29 residues. The resulting refined prediction
contains (without the signal peptide) nine α-helices (mean length
13.9 amino acids) designated HA to HH, and 17 β-strands, designated E1 to E17 (mean length
6.6 amino acids). Their positions and lengths are summarized in Table
3. Except for two large helices
surrounding a large β-strand at the amino-terminal extremity, the
extracellular portion of the subunits was predicted to occur as an
all-β structure, formed by successive short strands.
The structure of the carboxy-terminal portion of HA is consistent with solvent accessibility patterns (described in the following by strings of `e' for exposed and `b' for buried) i.e., "bbeebbee," its amino-terminal portion being completely exposed.
The structure at the center of E1 is also consistent with
solvent accessibility patterns "bebebe"; its two extremities being predicted as completely buried. Its carboxy-terminal portion is less
consistent, since it is predicted in an α-helical state in every AL1
sequence consensus (see Fig. 2, top). An α-helical
structure for the last four residues could be envisioned. Indeed, for
AL1 these α-helical residues were predicted in every sequence
consensus and in three of four methods' consensus (only PHD predicted
all the residues in the β-strand state). However, for AL2, only the PREDATOR consensus, nicotinic α8 consensus, and nicotinic
consensus presented some residues predicted as
α-helical. This unique α-helical turn could be a specific feature of the cationic channel
subunits, since it does not appear in the AL2 sequence consensus, where the extended structure is consistently predicted.
The main immunogenic region (MIR) is located from the end of
HB to the beginning of E3 (Tzartos et al.,
1990
). This segment was already known to be exposed to the solvent
since it is directly involved in numerous forms of the autoimmune
disease myasthenia gravis (Tzartos et al., 1990
). Accordingly, its
central portion is predicted as totally accessible to the solvent.
The assignment of HB appeared consistent with all
predictions except those of DSC for AL1 and AL2 as well as PHD on AL2
(only some residues predicted in the β-strand state). The solvent
accessibility pattern is more consistent with a β-strand in the
carboxy-terminal portion. However, the glycosylation at
AL1102 (as well as at AL1198 and
AL1242) implies that these residues are exposed to the
exterior of the receptor. Since the residue AL1104 is
labeled by tubocurarine and presumably faces the binding site, a
β-strand might cause steric hindrance between the ligand and the
sugar, whereas an α-helix would place the side chains of the two
residues in opposite directions.
The structure of E2 (three residues long) is not predicted
by the analysis of AL2. It is the only structural element that differs
between the two analyses. However, the assignment of E3 is
contradicted by a cross-linking experiment (Watty et al., 1998) showing
that its first two residues should expose their side chains in the same direction.
E4 is predicted to be completely buried. E5 and E8 are consistent with solvent accessibility "ebebebeb."
The predictions of E12-15 and HC-E are probably less accurate than those of the extramembranous portions. Indeed, the secondary-structure prediction programs were not designed for or tested with membrane proteins (see Discussion). The length of the predicted secondary structures varied considerably according to the set of sequences used. With AL2, HD is shorter (in MII), HE is longer, and E15 is shorter (in MIII). Finally, HF and HG are fully consistent with the solvent accessibility predictions "bbebbbebbbebbbebb" and "eebeebbebbebbbeeb," implying one face exposed, the other buried.
2D representation of the amino-terminal domain
Data may be added to the 1D structural assignments given by SSPCA.
This defines an envelope of structural constraints (Fig. 5), which permits the proposal of a 2D
folding of the peptide chain. No data concerning the tertiary folding
are included, since no β-β interactions are known.
First, on the basis of electron microscopy images, we may locate the
MIR at the distal end of the receptor relative to the membrane
(Beroukhim and Unwin, 1995). As a consequence, E2 and
E3 are also placed at the top of the fold. E11
is likely to be close to the membrane since it is adjacent to MI (see
below for the position of the transmembrane domains). Then, we may
assume that each stretch of at least four residues predicted to be
exposed to the solvent forms a loop at the surface of the subunit. This constraint implies bending of the 1D structure between E5
and E6, E8 and E9, E9
and E10, and E10 and E11. The
beginnings of E7 and E8 are linked by a
disulfide bond, and are thus in close proximity. This disulfide bridge
forces a new bend between E7 and E8. This so-called "Cys-loop" is the most conserved part of the LGIC subunit amino-terminal domains. Although half of it is not predicted to be
folded into a periodic structure, we may reasonably hypothesize that
the entire region adopts a strongly constrained conformation. Finally,
a bend is introduced between E3 and E5 to
respect the observed size of the subunit, which protrudes 60 Å from
the membrane, with a diameter of ~40 Å.
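The loop rule stated above (each stretch of at least four residues predicted to be exposed forms a surface loop) is easy to apply mechanically to an accessibility string. A minimal sketch follows; the function name `exposed_loops` and the example string are hypothetical, and the e/b encoding is the one used for the accessibility patterns in this article.

```python
import re


def exposed_loops(accessibility, min_len=4):
    """Return (start, end) spans of runs of 'e' at least `min_len` long.

    Spans are 0-based and half-open, as in Python slicing.
    """
    pattern = re.compile("e{%d,}" % min_len)
    return [(m.start(), m.end()) for m in pattern.finditer(accessibility)]


if __name__ == "__main__":
    # Invented accessibility string, for illustration only.
    print(exposed_loops("bbeeeebbeebeeeeeb"))
```

Runs shorter than the threshold, such as the two-residue exposed stretch in the example, are ignored, so only longer exposed stretches are proposed as bends of the 1D structure.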
Each subunit can then be artificially subdivided into two domains. One
is formed by HA, E1, and HB, the
other by E2-11. On the basis of the cryoelectron
microscopy images of Unwin (1993b), HA and HB
have been placed perpendicular to the membrane.
This representation is fully compatible with the body of experimental
data concerning the nicotinic binding site. Indeed, affinity labeling
and site-directed mutagenesis led to the identification of amino acids
(see Table 1) that are distributed at the interface of the subunits on
six different elements, referred to as A (α7W85 and α7Y92), B
(α7W148 and α7Y150), and C (α7Y187, α7C189, α7C190, and α7Y194) for
the "principal" component; and D (α7W54), E (α7L108, α7N110, α7Q116,
and α7L118), and F (α7D163 and α7E172) for the "complementary"
component (note that all residues are quoted according to the mature
chick α7 subunit; please see the alignment for conversion. It does not
mean that these residues have been identified only in or also in this
subunit). Another residue has recently been identified on the
complementary component (α7T34).
Since it is located in E1, its position does not add
further constraints on the 2D representation, though it possibly
constrains the tertiary folding.
Affinity labeling experiments with toxin derivatives assigned the
principal and complementary binding components to be carried by the
clockwise and counterclockwise sides of the subunits, respectively, when the receptor is seen from the extracellular compartment (Machold et al., 1995).
The HAE1HB part must be folded onto
the E2-11 sheet in order to form a compact structure,
contained in a 40 × 60 Å² surface, and to
account for the possible contribution of residues homologous to mouse
K34 to the active site. Yet, only a few data constrain the
folding of the HAE1HB domain.
The transmembrane and cytoplasmic domains
We used PHDhtm (Rost et al., 1995, 1996) to investigate
the organization of membrane-spanning segments. PHDhtm is the only program that did not predict the signal peptide as a transmembrane domain, probably because of its lack of conservation. In addition, it
predicted the four transmembrane domains for each LGIC member. SSPCA
provided the consensus of output from PHDhtm applied to all the
sequences of AL1. The results, compiled in Table 5,
yielded four transmembrane segments. The lengths of the consensus
segments are 18 for MI, 17 for MII, 19 for MIII, and 17 for MIV. For
comparison, four other programs were also used on α1 and α7. All of
them predicted the four transmembrane segments of AL1 sequences,
although in some cases other parts of the subunit were incorrectly
predicted as crossing the membrane.
The consensus transmembrane segments are shorter than
depicted in the usual proposals (Popot and Changeux, 1984). However,
this could be an artifact due to the conservative predictions of PHDhtm. For
comparison, four other programs were also used on α1 and α7. The
results vary according to the method but also according to the
sequences used. This fact supports the importance of using a consensus of
multiple analyses.
SSPCA predicts each of the transmembrane segments to fold in a mixed
α-helix/β-strand fashion, with almost no coiled structures. Fig. 5
shows an attempt to represent the transmembrane portion in 2D. Yet,
since the present study gives no information about the precise
orientation of the structures in the membrane, the represented angles
are arbitrary, except in the case of the helix present in the MII
segment, shown to be oriented roughly perpendicular to the membrane. In
addition, the length of the predicted structures in the membrane is
not accurately determined.
Except for HF and HG, the cytoplasmic domain is predicted as totally accessible to solvent and in a nonperiodic structure. The solvent accessibility pattern of the two helices suggests that they possess one face buried and another exposed to the solvent (see Discussion).
| |
DISCUSSION |
|---|
|
|
|---|
Previous works have shown that the accuracy of secondary structure
predictions increases with the combination of several independent algorithms (Biou et al., 1988; Nishikawa and Ooi, 1986; Zhang et al.,
1992). As reported here, in order to determine the best available
prediction of the nAChR subunits' secondary structure, we integrated
the results of several third-generation programs, using the information
from a set of aligned homologous sequences. These programs were
selected on the basis of their recognized efficiency on test sets of
proteins with known secondary structure (Rost and Sander, 1994a) or
during blind predictive situations (King, 1996; Rost, 1997). Moreover,
each program was applied to every sequence of the alignments in order
to increase the signal/noise ratio.
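The idea of integrating several programs can be illustrated with a per-position vote of the "winner takes all" kind mentioned in the abstract. This is a minimal sketch only: the function name `consensus`, the H/E/C state letters, the toy prediction strings, and the rule that ties fall back to coil are all assumptions, and the probability-based filtering described in the abstract is not reproduced.

```python
from collections import Counter


def consensus(predictions, tie_state="C"):
    """Per-position majority vote over equal-length prediction strings.

    A position where the two most frequent states tie is assigned
    `tie_state` (an assumption of this sketch, not a documented rule).
    """
    if len({len(p) for p in predictions}) != 1:
        raise ValueError("predictions must be non-empty and of equal length")
    result = []
    for column in zip(*predictions):
        ranked = Counter(column).most_common()
        if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
            result.append(tie_state)
        else:
            result.append(ranked[0][0])
    return "".join(result)


if __name__ == "__main__":
    # Four invented outputs standing in for four prediction programs.
    print(consensus(["HHHEC", "HHEEC", "HCEEE", "CCEEC"]))
```

The same vote can be taken across programs for one sequence or across sequences of an alignment for one program; which order is used changes the result and is left open here.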
Two main ab initio predictions have been reported for nAChRs in the
past two decades. Finer-Moore and Stroud (1984) used the algorithm of
Garnier et al. (1978) for the extramembranous regions and an analysis
(by Fourier transformations) of hydrophobicity periodicity for the
putative transmembrane regions. Recently, Ortells (1997) presented a
secondary structure prediction based on a Chou and Fasman-like
algorithm (Chou and Fasman, 1978). The main difference between the
initial method and the one used by Ortells resides in the definition of
the secondary structure initiators. Instead of being predicted solely
by the sequence (via statistical tables) as in the Chou and Fasman
algorithm, these initiators were determined as follows: an initiator
was defined as a residue that is constantly predicted in the same
state, across different sets of LGIC subunit sequences, analyzed by
first- and second-generation algorithms. Another difference resides in
the fact that the propagation from the initiators was unidirectional
(in Ortells, 1997), from the amino terminus to the carboxy terminus,
while it is bidirectional in Chou and Fasman (1978). The expected
prediction accuracy has already been discussed elsewhere (see Kabsch
and Sander, 1983, and Nishikawa, 1983, for initial assessment, and Rost
and Sander, 1994a, 1996, for recent reviews), but the difference of
expected accuracy between these pioneering works and our own may reach 20%. On an identical test set, Chou and Fasman reached 49% in Q3, whereas PHD2 reached 72.5% (Rost and Sander, 1994a).
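Q3, the score quoted above, is the percentage of residues whose three-state assignment (helix, strand, or other) agrees with the observed one. A minimal sketch, with a hypothetical function name `q3` and invented toy strings:

```python
def q3(predicted, observed):
    """Percentage of positions where two three-state strings agree."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("strings must be non-empty and of equal length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


if __name__ == "__main__":
    # Toy example: 8 of 9 positions agree.
    print(round(q3("HHHEEECCC", "HHHEECCCC"), 1))
```

The same per-residue identity count can be used to compare two predictions with each other rather than a prediction with an observed structure.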
α-Helix and β-strand contents
The helix and strand content of the entire nAChR was measured by
several groups using different spectroscopic methods
(Butler and McNamee, 1993; Méthot et al., 1994; Moore et al.,
1974; Yager et al., 1984). The results showed a high variability, which
cannot be due solely to the differences of receptor environment. Indeed, inferred helix content varied from 18.7% (Butler and McNamee, 1993) to 48% (West et al., 1997), inferred strand content (without
β-turn) varied from 26% (West et al., 1997) to 42% (Butler and McNamee, 1993), and the calculated helix/strand ratio varied from 0.45 (Butler and McNamee, 1993) to 1.85 (West et al., 1997), with Méthot et al. (1994)
and Yager et al. (1984) finding intermediate values of 1.11 and 1.18 (see Table 4). The corrected SSPCA consensus yielded slightly lower values of helix and strand contents, although the ratio is consistent with the mean of experimentally determined ratios. In the amino-terminal portion (as defined by West et al., 1997,
and not by our transmembrane segment determination), our consensus
gives an equivalent predicted helix content (13.7% vs. 12%) and less
strand (31.7% vs. 51%) compared to the experimentally observed one in
the only study of this portion, that of West et al. (1997).
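The ratios quoted above follow directly from the reported percentages, and the arithmetic can be checked in a few lines. In this sketch the experimental figures are those given in the text; the consensus contents (24.2% helix, 22.5% strand) are taken from the abstract, on the assumption that they correspond to the corrected SSPCA consensus discussed here.

```python
def helix_strand_ratio(helix_percent, strand_percent):
    """Ratio of helix content to strand content."""
    return helix_percent / strand_percent


# Extremes reported in the text: 18.7/42 is about 0.45, 48/26 about 1.85.
butler_mcnamee = helix_strand_ratio(18.7, 42.0)
west = helix_strand_ratio(48.0, 26.0)

# The four experimental ratios quoted in the text, and their mean.
experimental_ratios = [0.45, 1.85, 1.11, 1.18]
mean_ratio = sum(experimental_ratios) / len(experimental_ratios)

# Consensus contents from the abstract (assumed to be the corrected consensus).
consensus_ratio = helix_strand_ratio(24.2, 22.5)

if __name__ == "__main__":
    print(round(butler_mcnamee, 2), round(west, 2))
    print(mean_ratio, consensus_ratio)
```

Under that assumption the consensus ratio lies close to the mean of the four experimental ratios, while the individual helix and strand percentages sit below most of the experimental values.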
Comparison with other predictions of the amino-terminal domain
At the level of the extracellular amino-terminal domain, all
approaches predicted a structure mainly folded in
β-strand (Fig. 6 A). However, the position
of the structures, as well as their number, differ considerably among
the different studies. The high β content is also consistent with the
cryoelectron microscopy images (although three helices were proposed in
these latter investigations; Unwin, 1993b, 1996). The structures
predicted by Ortells (1997) are longer than those presented in the
present work and longer than the value observed in the PDB. Notably,
the two large α-helices predicted in the amino-terminal half of the
extracellular portion are 20 aa long, whereas we predict 12 and 14 aa
and the PDB average is 9. Also, the mean length of β-strand is 7.2 in
Ortells (1997), 5.8 in the present work, and 5.1 in the PDB. These
discrepancies are likely due to the method Ortells used to propagate
the structural elements. When initiated, each element is extended
forward until a different initiator or a proline or a glycine is
reached.
Fig. 5 A also provides a comparison with the
secondary structures derived from threading methods (Gready et al.,
1997; Tsigelny et al., 1997). In this case, not only the lengths of the
motifs, but also their positions in the sequence are very different.
Comparison with other predictions of the membrane-spanning segments
The location of the four putative transmembrane segments was
originally determined by hydropathy plot analysis. This method, though
of great interest and easy to use, does not apply satisfactorily in the
case of membrane channels. Indeed, the residues lining the pore in the
open state are not anticipated to be hydrophobic. Moreover, in a
protein with multiple membrane crossings, such as the nAChR, the
internal transmembrane segments may be isolated from the lipid
environment. In addition, some hydrophobic stretches can be external to
the membrane (in close proximity to, or embedded in, the core protein).
As a consequence, some transmembrane segments were not correctly
predicted. For instance, for the rat glycine α1 subunit, the program
SOSUI (Hirokawa et al., 1998), based on amino acid physical properties,
did not predict the MII and MIV segments as transmembrane units, nor
did the program TMpred (Hofman and Stoffel, 1993), based on the
comparison with a database of known transmembrane segments.
The original predictions vary from one author to another (Fig. 7). The membrane-spanning position is set by PHDhtm with 95% accuracy. Such a precision is superior to the original variations.
Some structural predictions have already been made for the transmembrane
domain based on analogy arguments. Unwin (1993b) suggested, on the
basis of his images from electron microscopy, that the transmembrane
region of the nAChR could have a folding similar to that of some
pentameric enterotoxin domains. Ortells and Lunt (1996) further
exploited this idea to model part of the LGIC transmembrane region
based on the crystallographic structure of the Escherichia coli heat-labile enterotoxin domain B (Sixma et al., 1993). The resulting model presents a mixed
α/β secondary structure, where MII is all-α, MI is all-β, and MIII is
α/β (Fig. 6 B), the MIII α-helical region being a posteriori added to
the template.
Several remarks can be made about this study, apart from the fact that the template has never been found, up to now, by any threading algorithm. First of all, the enterotoxin is not an integral membrane protein, and
thus may not be an adequate template for the nAChR transmembrane domain. Ortells and Lunt (1996) removed the first strand that interacts
with the fifth. The resulting template might then be less stable, one
sheet being composed of only two antiparallel strands. As stated by the
authors, the further addition of MIII and MIV, modeled as helices
(partially for MIII) may result in a segregation of the
enterotoxin-modeled moiety from the lipids. The secondary structure
predictions presented here do not agree with those proposed by Ortells
and Lunt (1996; Fig. 6 B). A three-state comparison between
this study and the present prediction gives only 33% of identical residues.
The nicotinic ligand-binding site
The 2D representation accounts for the basic information concerning the ligand binding site for ACh and competitive antagonists. Secondary structure predictions suggest that whereas binding elements B, F, and C are carried by segments without regular structural patterns, binding elements A, E, and D are at least partially carried by structured segments.
At the level of the complementary component, the affinity-labeled
W57 (AL1104) is located near the center of helix B. Mutations at position AL1106 have been shown to modulate
agonist and antagonist pharmacology (Chiara and Cohen, 1997; Corringer
et al., 1995; Harvey and Luetje, 1996). The side chains of residues
AL1102 and AL1104, however, point outward from
opposite faces of the helix, implying that AL1104 mediates
its effects indirectly, possibly through local alteration of the
structure. At the level of element E, two successive β-strands are
predicted, E5 carrying the identified mouse S111
(AL1161) (Sine et al., 1995), and E6 carrying
mouse Y117 (AL1175). One possibility could be that these
β-strands interact in an antiparallel β-sheet, which would direct
the side chains of these binding residues in the same direction and in close proximity. Finally, the segment carrying element F has been shown
to contain the calcium binding site involved in agonist potentiation.
The predicted arrangement of this region without a regular structure is
consistent with the notion that the segment AL1234-AL1245 folds into a specific pocket
that constitutes a calcium binding site, as observed for the
corresponding synthetic peptide (Galzi et al., 1996).
At the level of the principal component, mutagenesis experiments have
shown that several mutations, located in the vicinity of labeled
residues from elements B and C, profoundly altered the pharmacological
properties (regions AL1210-AL1214 and
AL1256-AL1259) (Corringer et al., 1998). Since
the entire corresponding regions are predicted to lack a regular
secondary structure, they may fold into loops, such that the mutations
could possibly alter agonist binding indirectly, through structural
reorganization of these putatively flexible segments.
Transmembrane segments as an α/β structure
Each transmembrane segment of the receptor is predicted to fold in
a mixed α/β structure. This prediction should be taken with extreme
caution, since, as noted above, the programs used were not designed to
work on membrane proteins. Prediction methods based on analyses of
globular proteins could incorrectly predict strands in helical
transmembrane regions.
Direct helix-to-strand transitions are seen at the end of MI, MIII, and MIV. Such
transitions are impossible following a helix of more than four
residues. Due to the low reliability of the predictions in these
regions, a small hinge could in fact link the α-helices and the
following β-strands.
Also, affinity labeling experiments with a radioactive hydrophobic
probe support an organization of the MIII and MIV transmembrane segments in
α-helix (Blanton and Cohen, 1994). MIII was predicted to
be α-helical until AL1362 (α7F283), while
HE was predicted to reach only AL1355
(α7S276), and MIV was predicted to be α-helical until
AL1668 (α7I463), whereas HH is predicted to
reach only AL1657 (α7F452).
At the level of the MII segment, known to face the lumen of the ion
channel, our predictions could lead to a reconsideration of the
currently accepted architecture of the ion pathway. MII is predicted
here to start at the level of amino acid AL1323, four
residues after the standard model. In addition, the MII helix is
predicted to be slightly shorter. Much of the data coming from affinity
labeling and site-directed mutagenesis experiments are readily
represented by a helical structure (Akabas et al., 1994; Revah et al.,
1990). However, recent results (Wilson and Karlin, 1998) support an
elongated strand for the short segment spanning residues
AL1310 (α7S335) to AL1319 (α7S240).
Moreover, it is thought that MI and MII are in close proximity (Akabas
and Karlin, 1995). Consequently, the cytoplasmic portion linking MI and
MII is predicted to be longer, and could fold into a β-hairpin
(E13-E14), the length of the loop linking the
strands being variable according to the subunit. Recent mutagenesis
experiments from this laboratory point to a major contribution of the
center of this cytoplasmic portion to the selectivity filter of the ion
channel. Furthermore, it was found that its conformation, rather than
its precise amino acid sequence, had a critical effect on the
selectivity properties of the ion channel (Corringer et al., 1999).
This large cytoplasmic region could thus fold in such a way that some
carbonyls of the peptide backbone would be exposed in the correct
geometry for dehydration of specific ions, as observed in the case of a
bacterial potassium channel (Doyle et al., 1998).
The cytoplasmic portion and the oligomerization
HF and HG are predicted to be amphipathic,
with one face exposed to the solvent and the other buried. The maximum
hydrophobic moment (as determined with the program MOMENT of the
Wisconsin Package (Devereux et al., 1984) with a window of eight
residues) is 0.19 for HF (low) and 0.57 for HG
(high). In addition, both helices present a leucine-zipper signature
(on 79 sequences: at AL1393, 61L; AL1400, 62L,
14M; AL1611, 30L, 6M; AL1618, only 2L, but 21I
and a conserved hydrophobic position in AL1615;
AL1625, 17L, 30M, and a conserved hydrophobic position in
AL1622). These two cytoplasmic helices could interact in a
coiled-coil arrangement, within the subunit or even between subunits.
This motif could be critical for the oligomerization process. Indeed,
Yu and Hall (1994) described two deletions of amino acids
belonging to HF and HG that imply the involvement of these helices
in the formation of the pentamer.
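The hydrophobic moment mentioned above can be sketched as a sliding-window calculation. This is an illustration under stated assumptions, not a reimplementation of the MOMENT program: it uses the Kyte-Doolittle hydropathy scale, an ideal helical step of 100° per residue, and per-residue normalization, so its numbers are not comparable with the 0.19 and 0.57 reported here. The function names and the test peptides are hypothetical.

```python
import math

# Kyte-Doolittle hydropathy values (an assumed choice of scale).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_moment(segment, scale=KYTE_DOOLITTLE, angle_deg=100.0):
    """Per-residue hydrophobic moment of a segment on a helical wheel."""
    step = math.radians(angle_deg)
    s = sum(scale[aa] * math.sin(i * step) for i, aa in enumerate(segment))
    c = sum(scale[aa] * math.cos(i * step) for i, aa in enumerate(segment))
    return math.hypot(s, c) / len(segment)


def max_window_moment(sequence, window=8, scale=KYTE_DOOLITTLE):
    """Largest moment over all windows of `window` residues."""
    if len(sequence) < window:
        raise ValueError("sequence shorter than window")
    return max(hydrophobic_moment(sequence[i:i + window], scale)
               for i in range(len(sequence) - window + 1))


if __name__ == "__main__":
    # Invented peptides: one amphipathic, one uniformly hydrophobic.
    print(hydrophobic_moment("LKKLLKKL") > hydrophobic_moment("LLLLLLLL"))
```

A segment whose hydrophobic residues recur every three to four positions gives a large moment, whereas a uniformly hydrophobic segment gives a small one, which is why the moment is used as an indicator of amphipathic helices.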