Biophys J, August 1999, p. 775-788, Vol. 77, No. 2
*Laboratory of Experimental and Computational Biology, Division of Basic Sciences, National Cancer Institute, National Institutes of Health, Bethesda, Maryland 20892-5677 USA; #Laboratory of Membrane Biochemistry, Faculty of Pharmaceutical Sciences, Chiba University, Inage-ku, Chiba 263, Japan; and §Abteilung Mikrobiologie, Universität Osnabrück, D-49069 Osnabrück, Germany
ABSTRACT
The hypothesis is presented that at least four families of putative K+ symporter proteins, Trk and KtrAB from prokaryotes, Trk1,2 from fungi, and HKT1 from wheat, evolved from bacterial K+ channel proteins. Details of this hypothesis are organized around the recently determined crystal structure of a bacterial K+ channel: i.e., KcsA from Streptomyces lividans. Each of the four identical subunits of this channel has two fully transmembrane helices (designated M1 and M2), plus an intervening hairpin segment that determines the ion selectivity (designated P). The symporter sequences appear to contain four sequential M1-P-M2 motifs (MPM), which are likely to have arisen from gene duplication and fusion of the single MPM motif of a bacterial K+ channel subunit. The homology of MPM motifs is supported by a statistical comparison of the numerical profiles derived from multiple sequence alignments formed for each protein family. Furthermore, these quantitative results indicate that the KtrAB family of symporters has remained closest to the single-MPM ancestor protein. Strong sequence evidence is also found for homology between the cytoplasmic C-terminus of numerous bacterial K+ channels and the cytoplasm-resident TrkA and KtrA subunits of the Trk and KtrAB symporters, which in turn are homologous to known dinucleotide-binding domains of other proteins. The case for homology between bacterial K+ channels and the four families of K+ symporters is further supported by the accompanying manuscript, in which the patterns of residue conservation are demonstrated to be similar to each other and consistent with the known 3D structure of the KcsA K+ channel.
INTRODUCTION
Regulation of ion gradients across the plasma membrane is a requirement of all living cells. Much of this is accomplished by membrane channel proteins that allow ions to diffuse passively down their electrochemical gradients, and by membrane transport proteins that use energy to transport ions actively against their electrochemical gradients. There have been numerous suggestions that in some active ion transporters, ions may diffuse most of the way across the membrane through a "pore" (Jardetzky, 1966; Lauger, 1979; Su et al., 1996). This hypothesis has gained support from findings that several proteins that are homologous to transporters act as channels: the cystic fibrosis transmembrane conductance regulator (CFTR) is a Cl- ion channel, even though its primary sequence is homologous to the ABC superfamily of transporters (Anderson et al., 1991); the glutamate transporters (Larsson et al., 1996) and norepinephrine transporters (Galli et al., 1998) apparently act as channels under some conditions; the Kef family of bacterial K+ channels (Booth et al., 1996) is homologous to the NapA Na+/H+ family of antiporters (Reizer et al., 1992); and the Kir inward rectifying K+ channel associates with a Sur protein that is homologous to the ABC transporters (Ashcroft and Gribble, 1998). Symporters are transporters in which the transport of one ion or molecule against its electrochemical gradient is "powered" by the movement of another ion or molecule down its electrochemical gradient in the same direction through the membrane. Plausible mechanisms for symport, in which even the actively transported ion diffuses most of the way through the transmembrane protein, are discussed in the accompanying manuscript (Durell and Guy, 1999).
This report provides indirect evidence, from analysis of the sequences, for homology and common structural features between the superfamily of K+ channel proteins and four K+ symporter protein families. The four symporter families are 1) the K+-translocating TrkH subunit from the Trk systems of both bacteria and archaea (Schlösser et al., 1991, 1995; Stumpe et al., 1996), 2) the KtrB subunit from a recently described KtrAB system in eubacteria (Nakamura et al., 1998b) (previously identified as NtpJ by Takase et al., 1994, and Clayton et al., 1997), 3) the Trk1,2 proteins from yeasts and Neurospora (Gaber et al., 1988; Ko and Gaber, 1991; Lichtenberg-Fraté et al., 1996; Haro et al., 1999), and 4) the HKT1 protein from wheat (Schachtman and Schroeder, 1994; Wang et al., 1998) and a homologue from Arabidopsis (Washington University Genome Sequencing Center, 1998 [The A. thaliana Genome Sequencing Project, http://genome.wustl.edu/gsc/arab/arabidopsis.html]; Bevan et al., 1999 [EU Arabidopsis sequence project, unpublished; accession no. CAB39784]). For the purpose of this analysis, the fungal and plant symporters are grouped into a single eukaryotic family called Trk-euk. The current supposition is that functional Trk-euk proteins are formed from a single type of subunit, although the structural similarities outlined below may force some reconsideration. HKT1 symport in wheat is dependent on Na+ (Rubio et al., 1995; Diatloff et al., 1998), and TKHp symport in the fission yeast Schizosaccharomyces pombe (which is closely related to the budding yeast Trk1,2 system) is dependent on H+ (Lichtenberg-Fraté et al., 1996), although a possible role for Na+ has not been excluded. In comparison, the functional forms of the bacterial Trk and KtrAB systems are clearly more structurally complex; both comprise multiple subunit types (Stumpe et al., 1996; Nakamura et al., 1998). Trk cotransports H+ with K+ (Stumpe et al., 1996), whereas KtrAB is Na+ linked (Tholema et al., 1999).
Additional evidence of the homology between these symporter and channel proteins comes from the development of 3D atomic-scale models of the transmembrane regions, which is presented in the accompanying paper. Specifically, it is found that the pattern of amino acid residue conservation within each symporter family is consistent with the structural fold and ion-selective mechanism employed by the superfamily of K+ channel proteins.
The ability to compare the symporter and channel proteins is now greatly enhanced by the recently determined crystal structure of the transmembrane component of the KcsA K+ channel from Streptomyces lividans (Doyle et al., 1998), which establishes the basic structural and functional roles of the different channel segments. Perhaps most importantly, this has confirmed the role of the P segment in forming the outer portion of the pore and the ion selectivity filter, which was previously predicted by indirect theoretical and experimental methods (see accompanying paper for details). Specifically, the four P segments (one from each of the four channel subunits) are arranged with fourfold symmetry around the axis of the pore, with each in the same hairpin conformation and dipping into the outer portion of the transmembrane region from the extracellular side. The first arm of the hairpin (P1) is an α-helix that slants toward the center of the channel, and the second arm (P2) is an extended β-structure (the backbone alternates between right- and left-handed α-helix conformations; Guy and Durell, 1995) that rises out of the channel along the axis. Collectively, the four P2 segments form the narrowest portion of the pore, which consequently acts as the selectivity filter. The K+ binding sites are formed by the backbone carbonyl oxygen atoms of conserved "signature sequence" residues of the four P2 segments. The full P-segment hairpin (P1 + P2) is located between two hydrophobic transmembrane helices (M1 and M2) that together form the MPM (or 2TM) motif. This contrasts with the 6TM motif in many other types of K+ channels (e.g., the voltage-gated Shaker channel protein), in which the MPM structure is preceded by four additional hydrophobic transmembrane segments (Uozumi et al., 1998; Shih and Goldin, 1997).
The first hint of homology between symporter and channel proteins came from the sequence analysis work of Jan and Jan (1994), who postulated that TrkH has two P-like segments similar to those of K+ channels. This led Stumpe et al. (1996) to propose a transmembrane topology for TrkH that contained an MPM motif at both the N- and C-terminal ends of the transmembrane region of the sequence. While searching the databases for possible bacterial K+ channels, the group of Guy found that some specific MPM channel sequences were actually more similar to portions of some K+ symporters than to other K+ channel proteins (see Fig. 1). Surprisingly, the matching portion in these symporter sequences was not at the P regions identified by the Jan and Jan group, but rather at an intermediate location. As described below, further sequence analyses led the Guy and Nakamura-Bakker groups independently to the notion that these symporters actually comprise four sequential MPM motifs (designated MPMA, MPMB, MPMC, and MPMD).
This arrangement of primary structure suggests the process of gene duplication, similar to the evolutionary schemes deduced for related Na+, Ca+2, and some K+ channel proteins. For example, K+ channels of the TWIK (or 2 × 2TM) type have two MPM motifs within each of two identical subunits (Lesage et al., 1996), the yeast TOK (or DUK1) channel subunit has a 6TM motif followed by an MPM motif (Ketchum et al., 1995; Reid et al., 1996), and both Na+ and Ca+2 channels have four consecutive 6TM motifs within their primary pore-forming subunits (Noda et al., 1984). Finally, the hypothesis of homology between the channel and symporter proteins is also supported by sequence similarity between the cytoplasmic domain of many of the bacterial K+ channels and the 120-residue NAD-binding domains in cytoplasmic subunits of the Trk and KtrAB symporter complexes, e.g., TrkA and KtrA (Schlösser et al., 1993; Nakamura et al., 1998).
METHODS
Sequence acquisition and alignments
The four families of homologous bacterial K+ channel and symporter sequences were obtained by a combination of motif and keyword searches of the NCBI's GenBank and microbial databases (see NCBI BLAST: Unfinished Microbial Genomes (http://www.ncbi.nlm.nih.gov/BLAST/ unfinishedgenome.html) and NCBI PSI-BLAST (http://www.ncbi.nlm.nih.gov/cgi-bin/BLAST/nph-psi blast)). The motif searches were carried out using the ion-selective P segment of K+ channels as the seed for gapped BLAST and PSI-BLAST procedures (Altschul et al., 1997). The resultant multiple sequence alignments of MPM motifs were then manually adjusted to emphasize the common features among all proteins. This involved matching features within each protein family locally and between the four families globally. Because of variability within the loop regions, our primary concern was alignment of the three main segments (i.e., M1, P, and M2) with as few gaps as possible. In sum, the data consisted of 13 multiple sequence alignments of MPM motifs (one from the K+ channels and four from each of the three symporter families), each containing three subalignments corresponding to the M1, P, and M2 segments.
Quantification of homology
To quantify the similarity among motifs, each multiple sequence alignment was converted into a numerical profile matrix. This was carried out according to the methods described by Henikoff and Henikoff (1996) for creating a log-odds position-specific scoring matrix (PSSM). These procedures estimate the residue frequencies at each position for the entire population of related proteins in nature from the limited and nonrandomly sampled set of known sequences used in the alignments. Briefly, the steps were 1) weighting the observed counts of each residue by the calculated redundancy of the parent sequence in the multiple sequence alignment (Henikoff and Henikoff, 1994), 2) adding "imaginary" pseudo-counts to the sequence-weighted counts according to the residue diversity at each location and empirically determined residue substitution probabilities (BLOSUM; Henikoff and Henikoff, 1992), 3) normalizing these composite counts by the expected frequency of occurrence of the specific residue (estimated from the amino acid composition of the Swiss-Prot sequence database; Bairoch and Apweiler, 1998), and 4) taking the logarithm of these normalized counts to obtain the PSSM score for each of the 20 residues at each location. Throughout this analysis, effort was directed toward determining the sensitivity of the results to the multiplication factor used for the total number of pseudo-counts, which sets their proportion relative to the weighted sequence counts of the alignments. Because the effect on the final results was minimal, the recommended value of 5 was used (Henikoff and Henikoff, 1996).
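As a rough illustration of steps 1 to 4, the following Python sketch builds a log-odds profile from a small alignment. It is not the code used in this study. The function names are hypothetical, a uniform background stands in for the Swiss-Prot composition, and the pseudo-counts are spread according to the background frequencies rather than the BLOSUM substitution probabilities. The total pseudo-count per column is assumed to be the multiplication factor (5) times the number of residue types observed in that column, which is our reading of the scheme of Henikoff and Henikoff (1996).

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def position_based_weights(alignment):
    """Position-based sequence weights (after Henikoff and Henikoff, 1994).

    Each column contributes 1 / (k * n) to a sequence, where k is the number
    of distinct symbols in the column and n is how many sequences share this
    sequence's symbol. Gap characters are simply treated as one more symbol
    here (a simplification). Weights are normalized to sum to 1.
    """
    weights = [0.0] * len(alignment)
    for column in zip(*alignment):
        counts = Counter(column)
        for i, residue in enumerate(column):
            weights[i] += 1.0 / (len(counts) * counts[residue])
    total = sum(weights)
    return [w / total for w in weights]


def log_odds_pssm(alignment, background=None, pseudo_factor=5.0):
    """Return one {residue: log2-odds score} dict per alignment column."""
    if background is None:
        # Assumption: uniform background instead of database composition.
        background = {a: 1.0 / len(AMINO_ACIDS) for a in AMINO_ACIDS}
    n_seqs = len(alignment)
    weights = position_based_weights(alignment)
    pssm = []
    for column in zip(*alignment):
        # Step 1: sequence-weighted counts, rescaled to the number of sequences.
        counts = {a: 0.0 for a in AMINO_ACIDS}
        for w, residue in zip(weights, column):
            if residue in counts:  # skip gaps and unknown symbols
                counts[residue] += w * n_seqs
        n_obs = sum(counts.values())
        # Step 2: total pseudo-count grows with the residue diversity.
        diversity = max(1, sum(1 for a in AMINO_ACIDS if counts[a] > 0))
        n_pseudo = pseudo_factor * diversity
        scores = {}
        for a in AMINO_ACIDS:
            # Simplification: pseudo-counts follow the background frequencies.
            freq = (counts[a] + n_pseudo * background[a]) / (n_obs + n_pseudo)
            # Steps 3 and 4: normalize by the expected frequency, take the log.
            scores[a] = math.log2(freq / background[a])
        pssm.append(scores)
    return pssm
```

Calling `log_odds_pssm(["GYG", "GYG", "GFG", "GWG"])` returns three columns of 20 scores each, with the invariant glycine columns scoring high for G and the variable middle column giving smaller positive scores to Y, F, and W.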
Quantification of the similarity of each pair of PSSMs of the same segment type was performed according to the methods of Pietrokovski (1996). This entailed calculating the Pearson correlation coefficient for each pair of aligned profile columns and then adding the coefficients to obtain the total raw score. For the purpose of comparison, each raw score was converted into a Z score, which is the number of standard deviations by which the raw score differs from the mean of a distribution of best raw scores obtained by chance for unrelated protein families. Because there is a dependence of the raw score on the length of the segment, it is necessary to have a series of best chance score distributions corresponding to each possible segment length. Such distributions were calculated by full enumeration of every possible pair of the 3670 PSSMs of multiply aligned sequences in the Blocks 10.1 database (Henikoff and Henikoff, 1991), which resulted in over 6.7 million chance scores. The database was previously modified by the removal of compositionally redundant blocks, and the sequence columns of one of each pair of PSSMs were randomly shuffled to eliminate bias in the results (Pietrokovski, 1996). Multiple collections of chance distributions were calculated for each set of trial PSSM creation parameters used to examine the sensitivity of the results (described above). Finally, assuming the distributions to be normal, the probability that a particular score would occur by chance was determined from the definite integral of the Gaussian probability distribution (Bevington, 1969). For example, the probability of obtaining Z scores of 2, 3, 4, 5, and 6 for unrelated protein families would be approximately 5, 3 × 10^-1, 6 × 10^-3, 6 × 10^-5, and 4 × 10^-9%, respectively.
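The scoring chain described above (column correlations, raw score, Z score, chance probability) can be sketched in a few lines of Python. This is an illustrative sketch rather than the original implementation: the function names are hypothetical, the chance mean and standard deviation must be supplied by the caller for the relevant segment length, and the probability is assumed to be the two-sided Gaussian tail, which gives values close to those quoted for Z = 2 through 5.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equally long score vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0  # a constant column carries no correlation signal
    return sxy / math.sqrt(sxx * syy)


def raw_score(profile_a, profile_b):
    """Sum of column-by-column correlations for two aligned profiles.

    Each profile is a list of columns; each column is a list of 20 scores.
    """
    return sum(pearson(ca, cb) for ca, cb in zip(profile_a, profile_b))


def z_score(raw, chance_mean, chance_sd):
    """Standard deviations between a raw score and the chance mean."""
    return (raw - chance_mean) / chance_sd


def chance_probability_percent(z):
    """Two-sided Gaussian tail probability of |Z| >= z, in percent."""
    return 100.0 * math.erfc(abs(z) / math.sqrt(2.0))
```

For instance, `chance_probability_percent(2)` evaluates to about 4.6 and `chance_probability_percent(3)` to about 0.27, in line with the first two values listed in the text.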
To provide a reference in the context of membrane proteins, comparisons of the bacterial 2TM channels and symporters were also made with the transmembrane segment blocks of 19 bacteriorhodopsin homologues and other ion channel proteins. Whereas the bacteriorhodopsin sequences were taken to be evolutionarily unrelated, the channel proteins, which included the TWIK family from C. elegans, the IRK family from eukaryotes, and the Na+ channel family, were expected to have various degrees of homology. In the calculations, full enumeration was used to find the contiguous segment of at least four residues with the highest Z score. The only exception was for comparison of the bacterial 2TM channel and symporter families themselves, for which the relative overall alignments of the blocks were kept the same as shown in Fig. 2, A and B. Within this restriction, enumeration was again used to find the highest Z-scoring subsegment of at least four residues.
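The enumeration just described can be pictured with the following Python sketch, offered under several assumptions: the function names are hypothetical, `chance_stats` is an assumed lookup from segment length to the (mean, standard deviation) of the chance distribution for that length, and the `fixed_alignment` flag mimics the restriction used for the channel versus symporter comparisons by allowing only subsegments that start at the same position in both profiles.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equally long score vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def best_subsegment(profile_a, profile_b, chance_stats,
                    min_len=4, fixed_alignment=False):
    """Find the contiguous ungapped subsegment pair with the highest Z score.

    Returns (z, start_a, start_b, length), or None if no subsegment of an
    allowed length exists. `chance_stats[length]` must give (mean, sd).
    """
    # Correlate every column of A with every column of B once.
    corr = [[pearson(ca, cb) for cb in profile_b] for ca in profile_a]
    best = None
    for i in range(len(profile_a)):
        for j in range(len(profile_b)):
            if fixed_alignment and i != j:
                continue  # keep the predefined relative alignment
            raw = 0.0
            max_len = min(len(profile_a) - i, len(profile_b) - j)
            for length in range(1, max_len + 1):
                # Extend the segment by one column pair along the diagonal.
                raw += corr[i + length - 1][j + length - 1]
                if length < min_len or length not in chance_stats:
                    continue
                mean, sd = chance_stats[length]
                z = (raw - mean) / sd
                if best is None or z > best[0]:
                    best = (z, i, j, length)
    return best
```

Because the chance mean and standard deviation depend on segment length, the Z score rather than the raw score is compared across candidate subsegments, so a longer segment wins only if its raw score rises faster than the length-matched chance expectation.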
RESULTS AND DISCUSSION
Sequence alignment
Fig. 2, A and B, displays the global alignment of all of the MPM motifs used to study the evolutionary relationships within and between the four families of K+ channels and symporters. Although only consensus sequences are used in the figure for clarity, all analyses were conducted on the full set of multiply aligned sequences (see the Appendix for the list). The 13 consensus sequences correspond to the single MPM motif in the channels and the four MPM motifs from each of the three symporter families. In all parts of the figure, the spectrum from red to blue/black represents the range of residues from conserved to variable. Fig. 2 A presents the global pattern of conservation among the 13 consensus sequences, and Fig. 2 B presents the local pattern of conservation among the sequences used to generate each of the consensus sequences. As seen in Fig. 2 A, the alignments of the P and M2 segments were keyed to the highly conserved residues shown in red and orange. In contrast, alignment of the M1 segments was more difficult because of the lack of global conservation. Interestingly, the variable and highly hydrophobic nature of this segment in both the channel and symporter sequences is consistent with its position in the KcsA crystal structure, i.e., the four M1 helices are on the periphery of the protein, they are largely lipid exposed, and they do not directly form the pore structure. Consequently, these segments were initially aligned by simply matching the hydrophobic regions with as few insertions or deletions as possible. Then, as seen in Fig. 2 B, finer adjustments were made to align the conserved residues within each family.
Subsequently, special emphasis was given to the latter portion of the M1 segment, because this region in the KcsA structure packs closely to the crucial P segments in the crystal. For the 2TM bacterial K+ channels, the two most highly conserved residues in the C-terminal half of M1 are a glycine located nine residues before the end and a glutamate located right at the end (see Fig. 2 B). Thus most M1 segments of the symporters were aligned so that a small residue, usually glycine, coincides with the channel glycine. In MPMC and MPMD of KtrB and Trk-euk, a glutamate or aspartate at the end of M1 aligned with the highly conserved channel glutamate. In the few cases where these criteria were insufficient, the more highly conserved and/or more hydrophilic symporter residues in M1 were aligned with the more highly conserved channel residues that are oriented toward the protein and away from the lipid in the KcsA structure.
As indicated by the single red column in Fig. 2, A and B, the most highly conserved residue among the channel and symporter sequences is a glycine in the P region (the only exception being a serine substitution in the MPMA of the Arabidopsis protein). This provides an important evolutionary link between the protein families, in that this residue is known to play a major role in determining the ion selectivity for many classes of K+ channel proteins. For example, mutagenesis studies have found that this is the only residue in the Shaker K+ channel P segment that cannot be mutated to Cys in even one of the four subunits without loss of function (Lü and Miller, 1995). This can be explained by the findings in the KcsA structure that the backbone conformation of this glycine is energetically unfavorable for other types of residues and that the four backbone carbonyl oxygen atoms of the glycine residue (one from each of the four subunits) form an ion-binding site at the narrowest portion of the pore. Indeed, the functional significance of this glycine is further emphasized by the fact that this is the only residue that is identical among the set of 27 putative 2TM bacterial K+ channels (Fig. 2 B).
Next, the reddish orange columns in Fig. 2 A denote single residues, of the symporter P1 and M2 segments, that are identical among the consensus sequences in all but two locations. In P1 the phenylalanine aligns with the tyrosine (similarly aromatic) of the K+ channel consensus sequence. In the KcsA structure, that residue is a tryptophan, which combines with an adjacent P1 tryptophan and a P2 tyrosine in each of the four subunits to form an aromatic cuff around the selectivity filter (Doyle et al., 1998). For the M2 segment, the highly conserved residue is a glycine that is also well conserved among the channels. Its structural importance is indicated by the fact that in the KcsA structure it packs next to the innermost part of the P segment. This site may be important for channel gating, as well as selectivity, because the inner portion of M2 in the KcsA structure moves closer to the pore as the channel closes (Perozo et al., 1998, 1999).
Other residues that are conserved moderately well are indicated by letters in black type. These include 1) the threonine-rich region of P1 preceding the fully conserved glycine, which is strongly conserved among the K+ channels and, to a lesser degree, among the symporters; 2) a DAL sequence conserved in many of the symporters, lying just before the highly conserved aromatic P1 residue; 3) the ILLML consensus sequence preceding the conserved M2-glycine, which is strongly conserved among the channels and partly conserved among the symporters; and finally, 4) numerous leucines that appear to be well conserved in both the M1 and M2 segments. However, such leucine matches cannot be securely interpreted as indicative of homology, because leucine is the most frequently occurring residue in the hydrophobic regions of transmembrane helices (Hofmann and Stoffel, 1993). Note in Fig. 2 B, for example, that many of the M1 leucines are not well conserved within each family.
Conservation pattern
Related to the conservation of specific residue types, homologous relationships are also indicated by the similar patterns of residue conservation for each of the protein families. This is demonstrated in Fig. 2 B, in which the consensus sequences are color coded according to the degree of conservation within each family (i.e., among the sequences used to develop the consensus sequences), or, in the case of the red and orange colors used for the symporters, by the similarity of the consensus sequences among the three families of symporters (see legend for code).
Red to orange denotes residues that are conserved in two or more families.
Yellow to green indicates residues that are well conserved within the family, but not conserved between the families.
Blue to black represents residues that are poorly conserved within the family.
As seen, the same general pattern of sequence conservation is repeated within each MPM core of the bacterial K+ channels and the three symporter families. More specifically, the poorly conserved M1 segments are followed by highly conserved P segments, which are followed by the center-region-conserved M2 segments. Furthermore, all linkers between segments are poorly conserved, with numerous insertions and deletions. Close inspection reveals that most of the globally conserved residues identified in Fig. 2 A are located within the regions of local conservation.
A separate comparison is also made in Fig. 2 B for the two plant symporters (wheat and Arabidopsis), which are colored according to the conservation between them and in relation to the fungal sequences (see legend for code).
Nucleotide binding domains and subunits
Another line of evidence supporting homology among these proteins involves the separate subunits of the KtrAB and Trk symporters that contain dinucleotide-binding domains: i.e., KtrA and TrkA. These are peripheral membrane proteins (Bossemeyer et al., 1989; Nakamura et al., 1998), which are probably located at the cytoplasmic side of the membrane. The KtrA subunit is homologous to the dinucleotide-binding site sequences of many other proteins and combines with the transmembrane KtrB protein to form a functional symporter. The TrkA subunit, however, is more complicated. Except for three archaeal TrkA species that also contain only one dinucleotide-binding domain, all other TrkAs have two dinucleotide-binding sites contained in each of two similar subdomains (Stumpe et al., 1996; Nakamura et al., 1998a; Kawarabayasi et al., 1998). Moreover, TrkA interacts with multiple protein subunits besides TrkH to form the functional symporter (Dosch et al., 1991; Parra-Lopez et al., 1994; Stumpe et al., 1996; Nakamura et al., 1998a). It must be noted, however, that only the E. coli TrkA protein has actually been demonstrated to bind NAD+ and/or NADH in vitro (Schlösser et al., 1993). In addition, it is not yet known whether dinucleotides influence the transport activities of the proteins in vivo.
Database searches indicated that the closest sequences to the KtrA and TrkA subunits are C-terminal portions of some 2TM bacterial K+ channels and of some members of the Kef family of K+ channels (Munro et al., 1991; Stumpe et al., 1996). The close homology between these sequences is evident in the alignment of representative samples shown in Fig. 2 C, in which there is 44% identity over a stretch of ~120 residues. This is the longest segment for which the sequences can be aligned unambiguously. It contains only one complete dinucleotide-binding domain, which appears to be a chimera between the N-terminal segment of the NAD+-binding domain of malate/lactate dehydrogenase-like proteins and the C-terminal segment of the NAD+-binding domain from the glyceraldehyde-3-phosphate dehydrogenase-like proteins (Schlösser et al., 1993; Stumpe et al., 1996). Such findings suggest that the small KtrA and TrkA subunits may have derived from the cleavage of a covalently attached C-terminal region of an ancestral 2TM K+ channel.
Although most eukaryotic and some bacterial K+ channels (including the KcsA protein) lack an intrinsic dinucleotide-binding domain, various other K+ channels are found to have more distantly homologous sequences at the C-termini. These include some putative bacterial channels of the 6TM type (e.g., Kch from E. coli; Parra-Lopez et al., 1994), the high-conductance Slo-type channel from animal cells (Parra-Lopez et al., 1994; Stumpe et al., 1996), and the newly identified channel-like sequence from Aquifex aeolicus (Deckert et al., 1998). Moreover, proper β-subunits from a variety of plant and animal K+ channels have redox function; and some, such as the Shaker K+ channel β-subunit, align nicely with the eight-stranded β-barrel structure of NAD(P)H-dependent oxidoreductases (McCormack and McCormack, 1994; Jan and Jan, 1997).
Statistical analysis
The statistical analysis was intended to determine the following: the degree of homology among 1) the three symporter families, 2) the bacterial 2TM channels and the symporters, and 3) the four MPM motifs of each symporter family. As described in the Methods, the results are given as Z scores, which are the number of standard deviations the raw score is from the mean of best chance alignments for segments of the same length. The greater the Z score, the more similar the sequence profiles are, and the less likely the alignment is to occur by chance. As also described, the alignment of the bacterial 2TM channel and symporter motif blocks was the same as represented by the consensus sequences in Fig. 2, and the reported score is the highest of all possible subsegments of at least four contiguous residues. Furthermore, the linkers in each motif have been excluded because of their extreme variability, leaving the three primary segments (i.e., M1, P, and M2) to be treated individually.
For interpretation of these results it is important to consider that membrane proteins share some basic properties independent of their evolutionary relationships. For example, our experience with this methodology suggests that comparison of any two transmembrane segments in which nonpolar residues predominate will result in a positive similarity score. Thus, to determine a baseline control of this effect, each segment block of the bacterial 2TM channels and symporters was compared to each of the seven transmembrane segments of an alignment of 19 bacteriorhodopsin homologs (Horn et al., 1998). This latter family was judged an ideal membrane protein control for the following reasons: 1) they are bacterial proteins, 2) the structure of one member of the family is known (i.e., bacteriorhodopsin), 3) they lack P segments and are unrelated to K+ channels, 4) they have multiple transmembrane segments, 5) there are numerous homologs, and 6) the transmembrane segments of the homologs can be aligned with little ambiguity.
The comparisons of the three symporter families, i.e., KtrB, TrkH, and Trk-euk, are shown in Table 1. Only motifs at the same positions in the sequences were compared, rather than considering all possible cross-terms of motifs from different gene duplications. A general measure of the similarity for each of the three family comparisons is obtained by simply taking the average of the 12 scores of each group. This results in the similar average Z scores of 7.6 and 7.9 for the KtrB versus TrkH and Trk-euk comparisons, respectively, and the relatively low value of 5.0 for the TrkH versus Trk-euk comparison. Considering that the average for all of the comparisons with bacteriorhodopsin is 3.1, these results indicate statistically significant sequence similarities among almost all of the corresponding segments of the symporters. Furthermore, among the symporters, the fact that the Z scores are consistently lowest for the TrkH versus Trk-euk comparison supports the hypothesis that the KtrB family is more like the presumed common ancestor.
At greater detail, it is interesting that in some instances the degrees of similarity for the three families depend upon which motif segments are being compared. For example, when the KtrB family is compared to the Trk-euk family, the four P segments are conserved substantially better than are the other two segments. (This pattern is similar to that found when different families of K+ channels are compared as shown in Table 2.) In contrast, when the KtrB family is compared to the TrkH family, most of the M2 segments are conserved to a greater extent than are the P segments. When related to the three-dimensional structure of KcsA, this indicates that the structures of the Trk-euk proteins are more similar to the KtrB proteins in the outer half of the transmembrane region (where the pore is formed by the P segments), and the structures of the TrkH proteins are more similar to the KtrB proteins at the inner half of the transmembrane region (where the pore is formed by the M2 segments). Overall, the M1 segments are found to have the least degree of conservation; the average of the four scores is 6.2 and 6.0 for the KtrB versus TrkH and Trk-euk comparisons, respectively, and 4.5 for the TrkH versus Trk-euk comparison. Again, this is consistent with the lesser structural role the M1 segment plays in forming the pore in the KcsA crystal structure.
Table 2 shows the results when the M1, P, and M2 segments of the bacterial 2TM K+ channels are compared with the proposed analogous segments of the symporters. These results support our hypotheses that the four putative MPM motifs of the symporters are related to the MPM motif of the bacterial K+ channels and that the KtrB family is closest to the presumed ancestor. Specifically, three of the four MPM motifs of the KtrB symporters are found closest to the single MPM motif of the channels, scoring substantially higher (at least 2.0 points) than the control comparisons with the bacteriorhodopsin segments. The exception is the MPMB motif, which instead scores highest for the TrkH family. As is expected for the shift from prokaryotic to eukaryotic species, the Trk-euk family is clearly the most distant from the bacterial K+ channels, with only one segment scoring more than two points higher than the control. It is also seen that in general the evolutionary distance between the channels and symporters is larger than among the three symporter families themselves (Table 1). Using a simple measure, the 12-score averages in Table 2 for the similarities between the channels and symporters are 6.0, 5.4, and 4.5 for the KtrB, TrkH, and Trk-euk families, respectively. The only exception is the score for the TrkH versus Trk-euk symporter families (i.e., 5.0), which indicates a greater distance than that between the channels and the KtrB and TrkH families.
To provide further insight into the calculated evolutionary distances,
the bacterial 2TM K+ channel sequences were compared to
three other ion channel families. These were 1) the relatively similar
TWIK or 2 × 2TM family of K+ channels from C. elegans (which has two consecutive MPM motifs per subunit), 2) the
more distantly related IRK K+ channel family from
eukaryotes, and 3) the homologous S5-P-S6 regions of the
Na+ channel family (which have P segments selective for
Na+ instead of K+). As expected, the scores of
the P segments of the 2 × 2TM K+ channels were
significantly closer to those of the bacterial 2TM channels than
were those of the symporters; however, the scores for the M1 and
M2 segments were about the same as for those of the KtrB family.
Surprisingly, for the IRK family the M1 and M2 scores were about two
points lower than the averages for the KtrB symporters, and the score
for the ion-selective P segments was only slightly higher (i.e., 0.6 and 0.3 greater than the averages for the KtrB and TrkH families).
Moreover, the scores for the analogous regions of the four motifs of
the Na+ channel family were on average no greater than
those for the unrelated bacteriorhodopsin family. Despite the
difference in P-segment ion selectivity, this is somewhat surprising,
because the voltage-gated Na+ channels are thought to have
evolved from voltage-gated Ca2+ channels, which in turn are
thought to have evolved from voltage-gated K+ channels
(Strong et al., 1993). Thus the finding that the KtrB and TrkH families
score substantially higher than do the distantly related IRK and
Na+ channel families supports the hypothesis that the
symporter and bacterial 2TM channel families are homologous.
Table 3 displays the calculated similarities of the four MPM motifs within each of the three symporter families individually. The 18-score averages from Table 3 are 6.8, 5.2, and 4.4 for the KtrB, TrkH, and Trk-euk families, respectively. Thus comparison with Table 2 indicates that the four symporter MPM motifs are almost as similar to the bacterial 2TM K+ channel MPM motifs as they are to each other. For example, the average score from Table 3 is 0.8 greater than that from Table 2 for the KtrB family, but is 0.2 and 0.1 smaller for the TrkH and Trk-euk families, respectively. As can be seen by the pattern of bold numbers, all of the KtrB segments score substantially higher against each other than against the bacteriorhodopsin controls. This strongly supports the premise that the four MPM motifs are indeed homologous and are likely due to gene duplications. Although Table 3 indicates less similarity for the M1 and M2 segments of the other two symporter families, the strong case for mutual homology with KtrB seen in Table 1 supports the extension of this conclusion to the TrkH and Trk-euk proteins. In addition, the fact that the majority of the scores in Table 3 are highest for the KtrB family (15 of 18) is consistent with the hypothesis that this family is the closest to the common ancestor, because it indicates the least divergence of the four gene repeats. Likewise, the finding that 12 of the 18 scores are lowest for the eukaryotic Trk-euk family is consistent with it being the most divergent from the prokaryotic progenitor. Unfortunately, the pattern of conservation is not clear enough to predict the order of the motif duplications. That is, the pattern of high and low scores is not uniform among the three segments of the MPM motifs, nor is it uniform for the three families. For example, MPMA and MPMD are most similar in the KtrB family, but are the least similar for the TrkH and Trk-euk families.
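The comparison of averages in this paragraph can be checked with a few lines of arithmetic. The following is a minimal sketch that uses only the averages quoted in the text. The enumeration of comparisons is an assumption on our part: it reads the "12-score" figure as 4 motifs × 3 segments and the "18-score" figure as 6 motif pairs × 3 segments, which matches the counts but is not spelled out in the text. All variable names are illustrative.

```python
from itertools import combinations

MOTIFS = ["MPMA", "MPMB", "MPMC", "MPMD"]
SEGMENTS = ["M1", "P", "M2"]

# Assumed layout of Table 2: each symporter motif segment is compared
# with the corresponding segment of the single channel MPM motif.
table2_comparisons = [(motif, seg) for motif in MOTIFS for seg in SEGMENTS]

# Assumed layout of Table 3: every pair of motifs within one family,
# compared segment by segment.
table3_comparisons = [
    (a, b, seg) for a, b in combinations(MOTIFS, 2) for seg in SEGMENTS
]

# Averages as quoted in the text (not recomputed from the tables).
table2_avg = {"KtrB": 6.0, "TrkH": 5.4, "Trk-euk": 4.5}
table3_avg = {"KtrB": 6.8, "TrkH": 5.2, "Trk-euk": 4.4}

# Positive: motifs resemble each other more than they resemble the channel.
# Negative: motifs resemble the channel slightly more than each other.
differences = {
    family: round(table3_avg[family] - table2_avg[family], 1)
    for family in table2_avg
}

if __name__ == "__main__":
    print(len(table2_comparisons), len(table3_comparisons))
    for family, diff in differences.items():
        print(f"{family}: {diff:+.1f}")
```

Under these assumptions the differences are small in all three families, which is the basis for the statement that the symporter motifs are almost as similar to the channel motif as they are to each other.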
Evolutionary tree
Based on this analysis of the sequences, an evolutionary relationship between the different channel and symporter families is deduced as shown in Fig. 3. Specifically, a single prototype MPM transmembrane motif (left) underwent a fourfold gene duplication and gene fusion to form a K+ symporter protein ancestor (center). Furthermore, the cytoplasmic dinucleotide domain of the K+ channel ancestor may have split off to form a separate dinucleotide-binding subunit that associates with the symporters. Most members of the KtrB family (right) of eubacteria have remained similar to this ancestral protein. However, KtrBs from two Mycoplasma species contain additional extracellular domains between the M1 and P1 segments of the first three MPM motifs, and KtrB (NtpJ) from Treponema pallidum contains two additional transmembrane domains preceding the intracellular N-terminus (not shown). The TrkH family (top) in bacteria and archaebacteria, which also has two additional transmembrane helices at the N-terminus (unique and different from those in T. pallidum KtrB), has diverged more than have most members of the KtrB family. The TrkA subunit probably underwent an internal gene duplication to produce two dinucleotide-binding domains. The Trk1,2 family in fungi (bottom) has diverged even more. Its members have an extra long cytoplasmic loop between MPMA and MPMB, and a smaller, linker-like insert between MPMC and MPMD. The two plant sequences (bottom right) are only slightly closer to the Trk1,2 sequences than to KtrB and should probably be considered a separate family. At present, the eukaryotic symporters are still not known to have a dinucleotide-binding subunit.
|
| |
CONCLUSIONS |
|---|
|
|
|---|
Paleontologists often search for evidence of links between
distantly related groups of organisms. For example, the discovery of a
subgroup family of dinosaurs that have feathers can establish the
evolutionary link with modern-day birds (Ji et al., 1998). Although
there is no fossil record for molecular evolution, a similar method can
be used to establish links of distantly related proteins: i.e., by
determining subgroups that have intermediary sequences, structures,
and/or functions. In this and the accompanying paper, it is argued that
the bacterial KtrAB and 2TM K+ channel protein families
serve such a function, in that they link the K+
channels with the distantly related K+ symporter proteins.
Although we believe the sequence comparison and model building methods
presented in the accompanying and present papers can be generalized
constructively to other protein systems, care must be taken to avoid
certain pitfalls. For example, studies and intuition concur on the
benefit of using profiles of families over individual sequences to
identify the homology of distantly related proteins (Tatusov et al.,
1994; Henikoff and Henikoff, 1996). Unfortunately, however, this is not
an automatic procedure. Beyond selection of the specific profiling and
comparison scoring methods, judgment is required in selecting the range
of related sequences that make up each family group. Although it is
obvious that a profile of nearly identical sequences does not contain
much added information, it can also be detrimental to form a profile of
too diverse a grouping (as might occur in a larger superfamily). For
example, comparison of the KtrB symporter and bacterial 2TM
K+ channel profiles convincingly indicates an evolutionary
relationship between these two protein families. However, the results
are considerably more tentative for the Trk-euk symporter family, in
which the scores of the M1 and M2 segments are not very similar to
those of the K+ channels, or even to themselves in the
different MPM motif repeats. Likewise, comparisons of the symporters to
distantly related families of K+ channels, such as the IRK
family, indicate little similarity (data not reported). Thus a profile
that combined all of the symporter families and/or that combined all of
the K+ channel families would result in a weaker similarity
score than that of KtrB versus bacterial 2TM K+ channels.
This could lead to the erroneous conclusion that the symporter and
channel proteins are not homologous. Rather, the case for the Trk-euk
symporters being related to the channels comes indirectly through the
strong score similarity that its P segment profiles have with the KtrB
family (Table 1). The observation that the M1-P-M2 segments of
bacterial 2TM K+ channels score no better with
Na+ channel S5-P-S6 segments than they do with
transmembrane segments of bacteriorhodopsin homologs suggests that this
procedure is unable to detect distant homology for protein families in
which the primary functional property (in this case ion selectivity of
the P segment) has changed.
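The dilution effect described above can be illustrated with a toy calculation. The sketch below is not the profiling or scoring method used in this paper; it assumes a deliberately simple profile (per-column residue frequencies from gap-free aligned sequences) and a simple similarity measure (mean dot product of column frequency vectors). The function names and the short sequences in the usage note are hypothetical and carry no biological meaning.

```python
from collections import Counter


def profile(alignment):
    """Return per-column residue frequencies for equal-length aligned sequences."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must have the same aligned length")
    columns = []
    for i in range(length):
        counts = Counter(seq[i] for seq in alignment)
        total = sum(counts.values())
        columns.append({res: n / total for res, n in counts.items()})
    return columns


def similarity(p, q):
    """Mean over columns of the dot product of two residue-frequency vectors."""
    if len(p) != len(q):
        raise ValueError("profiles must have the same number of columns")
    per_column = [
        sum(freq * col_q.get(res, 0.0) for res, freq in col_p.items())
        for col_p, col_q in zip(p, q)
    ]
    return sum(per_column) / len(per_column)


# Hypothetical toy alignments, invented purely for illustration.
reference_family = ["TVGYGD", "TVGYGD", "TIGYGD", "TVGFGD"]
close_family = ["TVGYGN", "TIGYGD", "SVGYGD"]
divergent_family = ["ALLKWE", "AILRWE", "GLLKWD"]

focused = similarity(profile(reference_family), profile(close_family))
diluted = similarity(
    profile(reference_family), profile(close_family + divergent_family)
)

if __name__ == "__main__":
    print(f"focused profile score: {focused:.3f}")
    print(f"diluted profile score: {diluted:.3f}")
```

In this toy case the score against the merged profile is lower than the score against the close family alone, because the divergent sequences spread the frequency mass in each column over residues that the reference family does not use. A real analysis would also need substitution weighting, gap handling, and a control comparison such as the bacteriorhodopsin segments used here.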
It is important to note that there are other shared sequence properties between the bacterial 2TM K+ channel and symporter families indicative of an evolutionary relationship that are not quantified by the calculations presented here. For example, although the statistical analysis strongly suggests that the four MPM motifs of the KtrB symporters are homologous to each other as well as to that of the bacterial 2TM K+ channels, it does not take into account that the three constituent segments (i.e., M1, P, and M2) are always in the same order. Furthermore, no score is provided for the probability of finding the same number of MPM motifs in the symporter sequences as there are single-motif subunits in the channels. Similarly, no quantification is made for finding the similar patterns of residue conservation and polarity among the MPM motifs of the channels and symporters: e.g., the P segments are the best conserved, whereas the M1 segments are the least conserved. And finally, the statistical analysis also does not take into account that several highly conserved residues known to be functionally important in the channel proteins (most notably the glycines of the P2 segment responsible for ion selection) are also highly conserved in each of the four MPM motifs of the symporters. In the accompanying paper it is shown how these properties justify building 3D atomic-scale models for the three symporter families in which the four MPM motifs each have the same general fold as the single KcsA K+ channel subunit seen in the crystal structure.
An essential note of caution is that the data used for analysis in this paper are mostly from recently determined nucleic acid sequences. In only a few cases have experiments already been conducted to establish that the encoded proteins are actually expressed and that they have channel or symporter functions as predicted. This is particularly true for the putative 2TM bacterial K+ channels that contain C-terminal dinucleotide-binding domains. At present, there are no published data demonstrating that these specific genes encode functional channels rather than other types of transport proteins.
An important question is whether the transmembrane topology proposed
here carries over to other families of transporters. Unfortunately, the
simple method of constructing a hydropathy plot to predict the
transmembrane topologies of these proteins is not very reliable and is
not designed to identify P segments. To date, the transporter proteins
that have been studied most extensively do not appear to have P
segments. For example, the lactose permease protein has been
experimentally determined to have 12 fully transmembrane segments (Lee
and Manoil, 1996). Likewise, cryoelectron microscopy studies have
indicated that the H+ (Auer et al., 1998) and
Ca2+ (Zhang et al., 1998) P-type pumps each have 10 fully
transmembrane segments. In addition, the Kef proteins appear
to form a different class of bacterial K+ channels that
lack the classic K+ channel P-segment "signature
sequence," but which do appear to have a dinucleotide-binding
C-terminus (Booth et al., 1996). In fact, their transmembrane sequences
appear to be more similar to those of the NapA
Na+/H+ antiporters than to the channels (Reizer
et al., 1992).
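For readers unfamiliar with hydropathy plots, the following is a minimal sketch of the kind of sliding-window calculation referred to above. It assumes the Kyte-Doolittle hydropathy scale with a 19-residue window and a cutoff of 1.6, values commonly quoted for locating candidate membrane-spanning stretches; the text does not state which scale or parameters would be used, so these are assumptions. The function names and the test sequence in the usage note are hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy(sequence, window=19):
    """Return the mean hydropathy of every window of the given length."""
    seq = sequence.upper()
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    # Raises KeyError for characters outside the 20 standard residues.
    values = [KYTE_DOOLITTLE[res] for res in seq]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]


def candidate_tm_windows(sequence, window=19, threshold=1.6):
    """Return start indices of windows whose mean hydropathy meets the cutoff."""
    return [
        i for i, h in enumerate(hydropathy(sequence, window))
        if h >= threshold
    ]


if __name__ == "__main__":
    # Artificial sequence: a hydrophobic core flanked by charged residues.
    toy = "K" * 10 + "L" * 19 + "D" * 10
    print(candidate_tm_windows(toy))
```

A scan of this kind scores only average hydrophobicity over a fixed window. It has no notion of a short re-entrant hairpin, which is one reason such plots are not suited to identifying P segments.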
| |
APPENDIX |
|---|
|
|
|---|
|
|
| |
ACKNOWLEDGMENTS |
|---|
We thank Clifford Slayman for many helpful comments and assistance. Some preliminary sequence data were obtained from the Institute for Genomic Research website at http://www.tigr.org and the NCBI website at http://www.ncbi.nlm.nih.gov/BLAST/unfinishedgenome.html.
The work in Osnabrück was supported by the Deutsche Forschungsgemeinschaft (SFB171) and the Fonds der Chemischen Industrie.
| |
FOOTNOTES |
|---|
Received for publication 8 February 1999 and in final form 3 May 1999.
Address reprint requests to Dr. H. Robert Guy, Laboratory of Experimental and Computational Biology, National Cancer Institute, National Institutes of Health, Bldg. 12B, Rm. B116, 12 South Drive, MSC 5677, Bethesda, MD 20892-5677. Tel.: 301-496-2068; Fax: 301-402-4724; E-mail: guy{at}guy.nci.nih.gov.
| |
REFERENCES |
|---|
|
|
|---|
a database of membrane spanning protein segments.
Biol. Chem. Hoppe-Seyler.
347:166
subunits belong to an NAD(P)H-dependent oxidoreductase superfamily.
Cell.
79:1133-1135.
© 1999 by the Biophysical Society 0006-3495/99/08/775/14 $2.00