Asymmetry in packing the peptide amide dipole results in
larger positive than negative regions in proteins of all folding motifs. The average side chain potential in 305 proteins is 109 ± 30 mV (2.5 ± 0.7 kcal/mol/e). Because the backbone has zero net
charge, the non-zero potential is unexpected. The larger oxygen at the
negative and smaller proton at the positive end of the amide dipole
yield positive potentials because: 1) at allowed phi and psi angles
residues come off the backbone into the positive end of their own amide
dipole, avoiding the large oxygen; and 2) amide dipoles with their
carbonyl oxygen surface exposed and amine proton buried make the
protein interior more positive. Twice as many amides have their oxygens
exposed as have their amine protons exposed. The distribution of acidic and basic
residues shows the importance of the bias toward positive backbone
potentials. Thirty percent of the Asp, Glu, Lys, and Arg are buried.
Sixty percent of buried residues are acids, only 40% bases. The
positive backbone potential stabilizes ionization of 20% of the acids
by >3 pH units (4.1 kcal/mol). Only 6.5% of the bases are
equivalently stabilized by negative regions. The backbone stabilizes
bound anions such as phosphates and rarely stabilizes bound cations.
INTRODUCTION
The amide group of the protein backbone is the
most prevalent polar group in any protein, and it plays several well
established roles in determining protein structure and function. Thus,
when a protein folds, the backbone NH and C=O groups in the protein interior find hydrogen bonds to replace those made to water in the
unfolded polypeptide (Yang and Honig, 1995a, b). The pattern of regular
intra-backbone hydrogen bonds yields the protein secondary structures
that have been the subject of research going back to the early work of
Pauling. Amides in specific motifs have been shown to be important for
the stabilization of buried charges. Interactions of charges with the
backbone have been identified both by using geometric rules that
identify hydrogen bonds (Baker and Hubbard, 1984; Rashin and Honig,
1984; Stickle et al., 1992; McDonald and Thornton, 1994; Gandini et al.,
1996) and by calculation of the intra-protein electrostatic potential
(Spassov et al., 1997). Interactions of charges with the α-helix
dipole (Wada, 1976; Hol et al., 1978; Hol, 1985) have been implicated
in increased protein stability (Nicholson et al., 1988; Sali et al.,
1988) and in pKa shifts of acidic and basic
residues (Aqvist et al., 1991; Sancho et al., 1992; Sitkoff et al.,
1994). Amides in loops also make hydrogen bonds to stabilize charges.
The backbone is important in calcium (Strynadka and James, 1989),
phosphate, and sulfate (Hol, 1985; Quiocho et al., 1987; Jacobson and
Quiocho, 1988; Luecke and Quiocho, 1990; He and Quiocho, 1993; Yao et
al., 1996) binding sites, and in ion binding in the potassium channel (Doyle et al., 1998). Amides also play important roles in enzyme reactions such as in the oxyanion hole of the serine proteases, where
they stabilize the negative charge on the substrate carbonyl in the
transition state (James et al., 1980).
Cations are stabilized in regions of negative potential and anions in
positive regions. Because the amide group is a dipole, if it is
properly oriented it can interact favorably with either charge.
However, there is growing evidence that the backbone stabilizes anions
more often than cations. For example, there are more bound anions such
as phosphate and acidic amino acids at helix N-termini than cations at
the C-termini (Hol et al., 1981; Richardson and Richardson, 1988;
Gandini et al., 1996). A large positive potential is found at the redox
center in iron-sulfur proteins (Langen et al., 1992; Swartz et al.,
1996), at the phosphate binding site in α/β barrel proteins
(Raychaudhuri et al., 1997), and at a cluster of buried acids in the
bacterial photosynthetic reaction centers (Beroza et al., 1995;
Lancaster et al., 1996). Calculations show that charges on acidic side
chains are better stabilized than bases by the backbone dipoles in
aspartate transcarbamylase (Oberoi et al., 1996). The backbone is found
to produce a generally positive potential near the protein surface
(Spassov et al., 1997). However, there has been no investigation of
whether there is a general principle that the potential from the
backbone is, on average, positive, or of how the neutral amide dipoles
could produce this result.
While the secondary structure motifs are the most obvious consequence
of proteins having an amide linkage, this paper will show that the
amide group imposes additional, inescapable consequences for protein
structure and function. Most simply, the shape of the amide is
dominated by the oxygen of the carbonyl (C=O) being substantially
larger than the amine HN hydrogen (Fig. 1). One consequence of this is that, to
avoid a steric clash, the peptide R group is trans to the
C=O, closer to the HN, at favored phi and psi angles. Moreover, the
curvature of a protein's surface favors placing the larger carbonyl
oxygen out toward the solvent, while the smaller HN is more likely to
be packed in the protein interior. Thus, the asymmetry of the amide
group itself imposes an asymmetric packing of the amides within
proteins. Electrostatic interactions are the most long-range in
proteins. Asymmetry in the orientation of a collection of dipoles, even
those that are involved in hydrogen bonds, will generate a significant,
non-zero electrostatic potential. This can influence the disposition
and energy of the charged groups within proteins.

FIGURE 1
Space filling representation of an amide group. The
amine HN (r = 1.0 Å) is substantially smaller than
the carbonyl oxygen (r = 1.6 Å). The first atoms of
the two side chains (CB) adjacent to the amide are oriented as they
would be in an α-helix.
This paper will describe the analysis of many protein structures to
show that the neutral backbone dipoles make the electrostatic potential
more positive within proteins of all motifs. It will then be shown how
the structure of the amide dipole itself, negative toward the carbonyl
oxygen and positive toward the amide proton, produces a non-zero
potential in all proteins. Lastly, an analysis of the distribution of
acidic and basic side chains and ionized substrates and cofactors in
many proteins will show a bias toward burying anions rather than
cations, not unexpected if the backbone dipoles make the protein
interior more positive. Each protein represents a balance of many
forces such as the hydrophobic effect favoring non-polar residues
inside a protein and the solvation energy stabilizing charged residues
on the surface. The basic geometry of the amide dipole, by producing
more positive potentials within all proteins, adds another term to the
forces that influence each protein's folding, structure, and function.
MATERIALS AND METHODS
Protein structures
Proteins were selected from the Brookhaven data bank (Bernstein
et al., 1977) to contain examples of many of the folds in the SCOP classification system (Murzin et al., 1995).
SCOP classes are α, all α-helix; β, all β-sheet; α/β, mainly parallel
β-sheets (β-α-β units); α+β, mainly antiparallel β-sheets (segregated
α and β regions); small,
usually dominated by metal ligand, heme, and/or disulfide bridges;
multi, multi-domain (α and β); membrane, membrane and cell surface
proteins and peptides. SCOP classifies domains
independently, so proteins can belong to several motifs. When domains
in one protein are in different SCOP classes the protein is
designated mixed-motif, a group that includes all SCOP multi-domain proteins.
The following 305 proteins were used. The 141 proteins with resolution
of ≤1.8 Å are underlined. The 30 structures with resolution
≥2.6 Å are in italics.
α-helix: 1aep, 1ala, 1bbh, 1bgc,
1bgd, 1cce, 1ccr, 1clm, 1cmb, 1cpc, 1cpt, 1csh,
1dcc, 1eco, 1fia, 1gmf, 1hdd, 1hrs, 1huw, 1hyp, 1lis, 1lmb, 1lpe, 1mba, 1mbc,
1mdy, 1oct, 1omd, 1par, 1phc, 1r69, 1rhg, 1rib,
1rop, 1utg, 256b, 2abk, 2asr, 2ccy, 2cep, 2cnd, 2cro, 2cts, 2cyp, 2hhb, 2hmq, 2mhr, 2pal, 2wrp, 2ycc,
351c, 3c2c, 3gly, 3icb, 4bp2, 5cpv, 5cyt.
β-sheet: 1aac, 1acx, 1arb, 1avd,
1bbp, 1bcx, 1bgh, 1cau, 1ctm, 1f3g, 1gcs, 1gct,
1gof, 1hbp, 1hcb, 1hlc, 1hmr, 1hne, 1hoe, 1hvj,
1icm, 1ifc, 1igm, 1mdc, 1mjc, 1mup,
1nsc, 1paz, 1plc, 1pmy, 1png, 1ppl,
1pts, 1r1a, 1rbp, 1scs, 1sgt, 1shf, 1shg,
1snc, 1stp, 1ten, 1tie, 1tld, 1tnf,
1ton, 1ttb, 1vmo, 2alp, 2apr, 2ayh, 2aza, 2ca2,
2cab, 2cpl, 2er7, 2fb4, 2ltn, 2mcm, 2mev,
2pab, 2pcy, 2pec, 2plv, 2psg, 2rhe, 2rsp, 2sam,
2sga, 2sil, 2snv, 2sod, 2stv, 3est, 4fgf,
4gcr, 4pep, 4sbv, 6nn9.
α/β: 1aba, 1aco, 1ads, 1alk,
1amp, 1bnh, 1cde, 1cus, 1gdh, 1gpb, 1hmy, 1lct,
1nar, 1nba, 1nip, 1ofv, 1omp,
1rpa, 1rve, 1s01, 1sto, 1thg, 1tml,
1tpf, 1trk, 1ulb, 1wht, 2ak3, 2dkb, 2dri,
2had, 2prk, 2rn2, 2trx, 3chy, 3cla, 3dfr, 3eca, 3hsc,
4fxn, 5p21, 7aat, 8abp.
α+β: 1aak, 1ahc, 1alc, 1apa, 1ast, 1aya,
1brn, 1cew, 1ctf, 1dtp, 1fdd, 1fkf,
1frd, 1fus, 1fxd, 1fxi,
1gmp, 1iag, 1igd, 1lba, 1mat, 1mol,
1npk, 1pkp, 1ppn, 1ris, 1rms, 1sha,
1tbp, 1ubq, 1yat, 2acg, 2act, 2bop,
2chs, 2ci2, 2dnj, 2fxb, 2hpr, 2lzm, 2ms2, 2msb, 2pol, 2ssi, 2uce, 3b5c, 3il8, 4tms, 7rsa,
9rnt.
small: 1aap, 1cbn, 1fas,
1isu, 1nxb, 1rdg, 2cdv,
2ovo, 2sn3, 4ins, 4pti,
4rxn, 9wga.
multi-motif: 1ezm, 1isb, 1sry, 2tmn, 3sdp, 1bia,
1chm, 1cse, 1emd, 1gal, 1glv, 1lvl, 1pca,
1pda, 1phh, 1rbl, 2glt, 2npx, 2reb, 2sic,
3cox, 3grs, 4enl, 4gpd, 4mdh,
5rub, 9ldt, 2cmd, 2pia, 8atc, 1dlh, 1tss, 2aai,
2mha, 1ddt, 1esl, 1dsb, 1glq, 1gne, 1hna,
2gst, 2pgd, 4ts1, 1gia, 1fc2, 1lla, 1prc,
2bpf, 3mdd, 1cdg, 1cdo, 1eft, 1hpl, 2aaa, 8adh,
1gla, 1dlc, 1tnr, 2bbk, 2por, 1rpl,
1gma, 1ppt.
Crystallographic waters, SO4, and
PO4 with >10% of their surface exposed to
solvent were deleted. The surface exposure was determined with the
program SURFV (Sridharan et al., 1992). Protons were added
to the proteins with a 1.0 Å bond length and standard geometry.
Calculation of the electrostatic free energy terms for acidic and
basic residues
Electrostatic free energy terms were calculated for the ionized
form of the acidic residues Asp and Glu and the bases Arg and Lys.
DelPhi calculations were run for each residue with charges only on the
atoms of this one side chain. All other atoms in the protein had zero
charge. Focusing was used (Gilson et al., 1987) so that the minimum
resolution for mapping the atoms and surface to the grid for the finite
difference solution of the Poisson equation was 0.83 Å/grid. The
dielectric constant for the protein (εprot) was
4, while that of the surrounding solvent
(εsolv) was 80. For each ionized side chain the
same calculation provides the pairwise interactions of the residue with
the backbone and its reaction field energy.
Pairwise interactions between the backbone and ionized side chains
The potential was determined at all atoms in the backbone in a
protein where a single acidic or basic residue has charge. The free
energy of the pairwise interaction between the backbone and side chain
i (ΔGbkn) is:

\[
\Delta G_{bkn} = \sum_{j=1}^{R} \sum_{b=1}^{b_n} q_b \, \Psi_{b_j}^{s_i} \tag{1}
\]

where $\Psi_{b_j}^{s_i}$ is the potential at atom b in the
backbone of the jth residue from charges on the
ith side chain. This pairwise interaction was obtained for
the bn atoms of the backbone that bear partial charge
(qb) (Table 1). The interaction was then summed for
all R backbone amides in the protein.
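The double sum in Eq. 1 can be sketched in a few lines of Python. This is only an illustration, assuming the potentials have already been extracted from a finite-difference run; the function name, the data layout, and every numerical value below are hypothetical and are not taken from Table 1.

```python
def backbone_interaction(potentials, charges):
    """Charge-weighted sum of potentials over backbone atoms (form of Eq. 1).

    potentials: list over residues j; each item maps a backbone atom name to
                the potential (kcal/mol/e) produced there by side chain i.
    charges:    maps a backbone atom name to its partial charge (e).
    Returns the interaction free energy in kcal/mol.
    """
    total = 0.0
    for residue in potentials:             # sum over the R backbone amides
        for atom, psi in residue.items():  # sum over the charged atoms b
            total += charges[atom] * psi
    return total


# Hypothetical partial charges and potentials, for illustration only.
charges = {"N": -0.35, "HN": 0.25, "CA": 0.10, "C": 0.55, "O": -0.55}
potentials = [
    {"N": -1.0, "HN": -1.2, "CA": -0.8, "C": -0.5, "O": -0.4},
    {"N": -0.2, "HN": -0.3, "CA": -0.2, "C": -0.1, "O": -0.1},
]
dG_bkn = backbone_interaction(potentials, charges)
```

Because each amide carries zero net charge, the result depends only on how the potential varies across the atoms of each amide, not on its absolute offset.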
Reaction field energy
The reaction field energy (also referred to as the self,
solvation, or Born energy) measures the difference in energy of an ion
or dipole when it is transferred between media with different abilities
to reorganize around charges. Electronic polarization and rearrangement
of atomic dipoles both contribute. Using continuum electrostatic
theory, the response of the media is encapsulated in the dielectric
constant. The reaction field energy is calculated here using an
algorithm in DelPhi, which determines the interaction energy between
the charges on the protein atoms and charges induced at the
protein-water dielectric boundary (Nicholls and Honig, 1991; Sridharan
et al., 1992).
The penalty for placing a charge at its location in the protein is the
difference between the reaction field energy of the residue in situ and
the reaction field energy of the same residue isolated from the
protein:
\[
\Delta\Delta G_{rxn} = \Delta G_{rxn,\ in\ protein} - \Delta G_{rxn,\ in\ soln} \tag{2}
\]

ΔGrxn in protein and ΔGrxn in soln are both negative,
favorable terms. ΔΔGrxn is always a
positive, unfavorable energy term because the absolute value of
ΔGrxn in protein is always less than that of
ΔGrxn in soln. The reaction field
energies for side chains in solution were obtained for isolated
coordinates of each side chain in the protein data bank file 1PRC
(Table 2). There is very little variation
between different conformers of any side chain, so one reference value
is used for each type of residue.
Calculation of interactions between the backbone and all side
chains and bound ligands
Average potential in the protein
The potential was calculated by placing partial charges on all
backbone amides. A DelPhi calculation was carried out with a
129³ grid. This provides a grid spacing of >1.0
Å/grid for all but 30 proteins. The potential
($\Psi_a^{bkbn}$) from the backbone at each of the
m non-backbone heavy atoms (a) was averaged to determine
VP. The potential at waters and other
non-protein atoms was not included in the sum.

\[
V_P = \frac{1}{m} \sum_{a=1}^{m} \Psi_a^{bkbn} \tag{3a}
\]

In a group of N proteins the average of
VP is:

\[
\langle V_P \rangle = \frac{1}{N} \sum_{p=1}^{N} V_{P,p} \tag{3b}
\]

The average potential (VS) from
the backbone at a residue was obtained from:

\[
V_S = \frac{1}{n} \sum_{a=1}^{n} \Psi_a^{bkbn} \tag{4a}
\]

where there are n non-backbone heavy atoms (a) in the
side chain.
In a group of R residues the average of
VS is:

\[
\langle V_S \rangle = \frac{1}{R} \sum_{r=1}^{R} V_{S,r} \tag{4b}
\]
The free energy of interaction of the jth side chain
or ligand with the backbone is:

\[
\Delta G_{bkbn,j} = \sum_{a=1}^{n} q_a \, \Psi_a^{bkbn} \tag{5}
\]

where qa is the charge on atom
a in an appropriate partial charge set. The free energy of interaction
of a side chain or ligand with the backbone
(ΔGbkbn) can be calculated with
either Eq. 1 or 5. For Eq. 1 the side chain is charged and the
potential is collected at all the atoms of the backbone. For Eq. 5, the backbone is charged and the potential is collected at the side chain atoms.
Unless otherwise noted, calculations of
VP, VS, and ΔGbkn use
CHARMM partial atomic charges for backbone (Table
1) and side chains (Brooks et al., 1983);
εprot is 4 and εsolv is 80. The atomic radii for
each atom type were H 1.2 Å, C 1.8 Å, N 1.5 Å, O 1.6 Å, S 1.9 Å, P 1.2 Å.
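The averaging of the backbone potential over side chain atoms and the charge-weighted sum over the same atoms can be illustrated with a minimal Python sketch. The helper names and all numbers are hypothetical; potentials are given in mV and charges in units of e, so the energy comes out in meV.

```python
def mean_potential(potentials):
    """Average potential over a set of heavy atoms (form of Eqs. 3a and 4a)."""
    return sum(potentials) / len(potentials)


def side_chain_energy(charges, potentials):
    """Charge-weighted sum over side chain atoms (form of Eq. 5)."""
    return sum(q * v for q, v in zip(charges, potentials))


# Hypothetical backbone potentials (mV) at three heavy atoms of one side chain.
atom_potentials_mV = [150.0, 120.0, 90.0]
V_S = mean_potential(atom_potentials_mV)

# Hypothetical partial charges (e) on the same atoms, net charge -1.
atom_charges = [-0.2, -0.4, -0.4]
dG_meV = side_chain_energy(atom_charges, atom_potentials_mV)
```

In this toy case an anionic side chain sitting at a positive backbone potential has a negative, favorable interaction energy.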
Interaction between side chains and specific amide groups
The interaction of each side chain with each amide was
calculated in 51 proteins. Each DelPhi calculation had partial charges on only one amide group. Thus, R calculations were made for a protein
with R residues. The grid resolution was >0.83 Å/grid for each
protein. Where necessary the focusing technique was used centered on
the amide that carried the partial charges (Gilson et al., 1987). The
net charge was 0 in each run, resulting from ± 0.9 charge for a
standard amide and ± 0.75 for Pro. Equations 3 and 4 were used to
calculate the average potential from each amide within the protein or
at specific side chains; Eq. 5 provided the free energy of interaction
between specific side chains and individual amides.
Potential at CB from amide(n) and amide(c) as a function of the phi
and psi angle
All non-terminal amino acids in a protein lie between an amide
toward the N-terminal (amide(n)) and one toward the C-terminal (amide(c)) (Fig. 2). Two series of 36 Ala
tripeptide coordinates were constructed. In one set the phi angle was
changed in increments of 10°, in the other the psi angle was varied.
For the series with different phi angles, all atoms toward the
N-terminal were rotated holding the central CA and CB and all atoms
toward the C-terminal rigid. The series with different psi angles were
constructed holding the N-terminal and the central CA and CB fixed and
rotating atoms toward the C-terminal.
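Generating such a series amounts to rotating one group of atoms about a bond axis in 10° steps. The sketch below uses Rodrigues' rotation formula in plain Python; it is an assumed reconstruction of the general procedure, not the authors' code, and the coordinates are hypothetical.

```python
import math


def rotate_about_axis(point, origin, axis_end, angle_deg):
    """Rotate `point` about the axis origin -> axis_end (Rodrigues' formula)."""
    axis = [e - o for e, o in zip(axis_end, origin)]
    norm = math.sqrt(sum(c * c for c in axis))
    k = [c / norm for c in axis]                 # unit vector along the bond
    v = [p - o for p, o in zip(point, origin)]   # point relative to the axis
    t = math.radians(angle_deg)
    cos_t, sin_t = math.cos(t), math.sin(t)
    k_dot_v = sum(a * b for a, b in zip(k, v))
    k_cross_v = [k[1] * v[2] - k[2] * v[1],
                 k[2] * v[0] - k[0] * v[2],
                 k[0] * v[1] - k[1] * v[0]]
    rotated = [v[i] * cos_t + k_cross_v[i] * sin_t + k[i] * k_dot_v * (1 - cos_t)
               for i in range(3)]
    return [r + o for r, o in zip(rotated, origin)]


# Hypothetical coordinates (Å): the rotation axis runs along one backbone bond.
axis_start = (0.0, 0.0, 0.0)
axis_stop = (1.46, 0.0, 0.0)
moving_atom = (-0.5, 1.2, 0.0)

# 36 conformers in 10 degree increments, as in the two tripeptide series.
angles = list(range(0, 360, 10))
series = [rotate_about_axis(moving_atom, axis_start, axis_stop, a) for a in angles]
```

For a phi series the axis would be the N-CA bond and the moving atoms those toward the N-terminal; for a psi series the axis would be the CA-C bond and the moving atoms those toward the C-terminal.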

FIGURE 2
Each non-terminal side chain lies between 2 amides, one
toward the N-terminal and the other toward the C-terminal.
(A) The amides toward the N-terminal (amide(n)) and
C-terminal (amide(c)) of the side chain of residue i.
(B) One amide is amide(c) for one side chain (i) and is
amide(n) for the next side chain (i + 1) in the
protein.
The potential at the central CB was obtained using Coulomb's law
assuming a uniform dielectric constant of 4. Calculations with the
tripeptides surrounded by water (εprot = 4;
εsolv = 80) were carried out with DelPhi. In
this case the positions of all atoms in the tripeptide modify the
dielectric boundary, and so affect the results. The variation of phi
was carried out in tripeptides where psi is
60°, while the psi
rotation was carried out in peptides where phi is 120°.
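The uniform-dielectric case is a direct application of Coulomb's law. The following Python sketch shows the form of the calculation; the two-point dipole, its charges, and the probe positions are hypothetical stand-ins for a real amide and CB, chosen only to show that a site beyond the positive end of a dipole sees a positive potential.

```python
import math

COULOMB_KCAL = 332.06  # Coulomb constant in kcal*Å/(mol*e^2)


def coulomb_potential(site, atoms, dielectric=4.0):
    """Potential (kcal/mol/e) at `site` from point charges in a uniform dielectric.

    atoms: iterable of (charge_e, (x, y, z)) with coordinates in Å.
    """
    total = 0.0
    for charge, xyz in atoms:
        total += charge / math.dist(site, xyz)
    return COULOMB_KCAL * total / dielectric


# Hypothetical dipole: +0.55 e at the origin, -0.55 e at 1.23 Å along x.
dipole = [(0.55, (0.0, 0.0, 0.0)), (-0.55, (1.23, 0.0, 0.0))]

v_positive_end = coulomb_potential((-2.0, 0.0, 0.0), dipole)  # beyond the + end
v_negative_end = coulomb_potential((3.23, 0.0, 0.0), dipole)  # beyond the - end
```

The two probe sites are mirror images with respect to the dipole, so their potentials are equal in magnitude and opposite in sign.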
Comparing the surface exposure of the carbonyl O and amine HN for
each amide
In the standard protein, the N to HN distance is 1.0 Å and the
H radius is 1.2 Å. In contrast the average C to O bond length is 1.23 Å and the O radius is 1.6 Å. This geometry ensures that the O will
have more surface to expose to solvent than the HN does. Protein
coordinates were prepared where the HN to N distance was 1.23 Å and
the HN radius was 1.6 Å. The surface exposure of the O and the
modified HN to a 1.4 Å probe was calculated with the program
SURFV (Sridharan et al., 1992).
The in situ pKa of acidic and basic residues
The pKa of acids or bases in proteins can
be different from that found in solution because interactions in the
protein shift the relative energy of the charged and
neutral states of a residue or ligand (Churg and Warshel, 1986; Bashford and Karplus, 1990;
Gunner and Honig, 1991; Yang et al., 1993; Antosiewicz et al., 1994;
Gunner et al., 1997). The complete calculation of residue ionization states is beyond the scope of this paper. However, other interactions in the protein will modify the expected effects of
ΔΔGrxn and ΔGbkn. Thus, if the charge state of
all other R residues were fixed, the pKa of
residue i would be shifted from its value in solution (pKsoln,i) in the following way:
|
(6)
|
The terms
$\Delta G_{bkn,i}^{crg}$ and
$\Delta\Delta G_{rxn,i}^{crg}$, the charged residue's
interaction with the backbone and its reaction field energy, are
calculated with Eqs. 1 and 2, respectively, and will be described in
detail here. The interactions of the neutral forms of a residue
($\Delta G_{bkn,i}^{neu}$ and
$\Delta\Delta G_{rxn,i}^{neu}$) are often small. The final sum
represents the difference in the pairwise interactions of the
j other polar and charged side chains with residue i in its
charged and neutral form. This is the most significant omitted term.
Other terms can arise from intra-protein motions that are coupled to
the ionization of the residue
(ΔGother). Within the protein the
charge states of all residues are interdependent (see Bashford and
Karplus, 1990; Yang et al., 1993; Antosiewicz et al., 1994; Alexov and
Gunner, 1997 for a more complete description).
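Free energy terms of this kind translate into pKa units through the factor ln(10)·RT. The Python sketch below shows only that unit conversion, not the full expression for the pKa; the temperature of 298.15 K and the sign convention in `pKa_shift` are assumptions made for illustration.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol*K)


def kcal_per_pH_unit(temperature_K=298.15):
    """Free energy equivalent to one pH (or pKa) unit: ln(10) * R * T."""
    return math.log(10) * R_KCAL * temperature_K


def pKa_shift(dG_charged_kcal, is_acid, temperature_K=298.15):
    """pKa shift from a free energy change of the ionized form.

    Assumed sign convention: a negative (stabilizing) dG lowers the pKa of
    an acid and raises the pKa of a base.
    """
    shift = dG_charged_kcal / kcal_per_pH_unit(temperature_K)
    return shift if is_acid else -shift


three_units = 3 * kcal_per_pH_unit()                  # about 4.1 kcal/mol
acid_shift = pKa_shift(-three_units, is_acid=True)    # acid pKa lowered by 3
base_shift = pKa_shift(-three_units, is_acid=False)   # base pKa raised by 3
```

At this assumed temperature one pH unit corresponds to about 1.36 kcal/mol, so a stabilization of 3 pH units is roughly 4.1 kcal/mol.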
RESULTS
The potential from the backbone within proteins
Backbone potential within four representative proteins
The degree to which the backbone amides make protein interiors
more positive is shown graphically for four proteins with the basic
folding motifs: α, β, α/β, and α+β. The potential at a representative slice through each protein with only backbone dipoles assigned partial charges is visualized with the program
GRASP (Nicholls et al., 1991) (Fig. 3). Although the net charge on each protein is zero, the interior is predominantly positive. At least a
quarter of the total volume of each protein is at a potential above 120 mV, while <10% is below
−120 mV (Table 3).

FIGURE 3
Electrostatic potential at a slice through four
proteins with different folds. Potentials calculated and displayed with
the program GRASP (Nicholls et al., 1991). Blue regions are
at positive and red at negative potential; CHARMM charges,
εprotein = 4; εsolvent = 80. (A) α motif: Met-hemerythrin from sipunculid worm
(Themiste dyscrita) (2HMQ chain A) (Holmes
and Stenkamp, 1991). A 104 residue iron-binding protein in a
four-helical up-and-down bundle with a left-handed twist (motif
descriptions from the SCOP database (Murzin et al.,
1995)). (B) β motif: human lipid binding protein
(1HMR) (Zanotti et al., 1992). A 129 residue 10-stranded
meander β-sheet folded upon itself. (C) α/β motif:
triose phosphate isomerase from Trypanosoma brucei
brucei (1TPF) (Kishan et al., 1994); a 247 residue
α/β barrel which has 8 alternating β and α segments forming an
internal, parallel β-sheet barrel; and (D) α+β motif: bovine ribonuclease A (7RSA) (Wlodawer et al.,
1988); a 124 residue protein with a long curved β-sheet and 3 α-helices.
Average potential from the amide backbone inside all
proteins
The potential from the backbone
(VP) was determined in 305 proteins
chosen to include representatives of many folding motifs (Eq. 3a).
VP determines the potential at
non-polar, polar, and ionizable side chains.
VP is always positive, ranging from 57 to 244 mV (1.3-5.6 kcal/mol/e). The average
VP is 110 ± 30 mV (2.54 ± 0.70 kcal/mol/e) (Eq. 3b, Fig. 4).
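The paired units quoted here follow from the conversion 1 eV ≈ 23.06 kcal/mol. A minimal Python check, with a helper name chosen only for illustration:

```python
KCAL_PER_MOL_PER_EV = 23.0605  # 1 eV per particle expressed in kcal/mol


def mV_to_kcal_per_mol_e(potential_mV):
    """Convert a potential in mV to the energy of a unit charge in kcal/mol."""
    return potential_mV / 1000.0 * KCAL_PER_MOL_PER_EV


average = mV_to_kcal_per_mol_e(110)   # about 2.54 kcal/mol/e
lowest = mV_to_kcal_per_mol_e(57)     # about 1.3 kcal/mol/e
highest = mV_to_kcal_per_mol_e(244)   # about 5.6 kcal/mol/e
```

The same factor gives about 0.7 kcal/mol/e for the 30 mV standard deviation.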

FIGURE 4
The number of proteins with different values of the
average electrostatic potential at the side chain heavy atoms
(VP). VP was
calculated with Eq. 3a for 305 proteins. The patterns for different
SCOP protein motifs: α, black; β, horizontal; α+β, diagonal; α/β, cross-hatch; others, white.
The average potential from the backbone is positive for all protein
motifs. Helical proteins have on average the smallest potentials
(95 ± 23 mV) and α/β proteins the largest (136 ± 36 mV)
(Table 4). There are more small or pure α or β
proteins among the least positive proteins, and more α/β
or mixed motif proteins among the most positive. However, all folds are
represented in both the most and least positive proteins studied except
for the small proteins.
TABLE 4
The average potential at all non-hydrogen, side chain
atoms from the backbone dipoles inside 305 proteins
Importance of specific parameters used in the calculations
The dielectric constants for protein and solvent were varied to
determine whether the bias toward the backbone potentials being
positive is due to the specific parameters used (Table 4). If the
calculations use a uniform dielectric constant of 4, rather than having
an εsolv of 80, the average potential of the
proteins tested is 137 ± 62 mV. Thus the result does not depend
on the high dielectric constant of the solvent. Raising the interior dielectric constant diminishes VP
without changing its sign (data not shown). The charge distribution can
also be varied. For example, moving the 0.1 charge placed on CA in the
CHARMM charge set to the HN (EQ charge in Table 1) also yields a positive average potential (93 ± 34 mV).
It is possible to determine the relative importance of the atoms that
make up the backbone dipoles in determining
VP. Each amide can be viewed as two
smaller dipoles with zero net charge: a unit made of the carbonyl (C
and O) and one of the amine (HN, N, and CA) (Table 1). For each protein
~77% of the average potential is a result of the C=O dipole while
22% results from the HN-N-CA charges (Fig. 5). The same relative importance can be
found in the contribution of each mini-dipole to the dipole moment of
the amide. Thus, an amide with CHARMM charges has
a dipole moment of 4.2 D. The carbonyl mini-dipole moment is 3.2 D,
representing 76% of the total, while it is 1.0 D for the amine.
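The carbonyl value can be checked roughly by treating C and O as two point charges. The sketch below assumes charges of ±0.55 e (an assumed value, since Table 1 is not reproduced in this section) and uses the 1.23 Å C to O distance given in the Methods.

```python
DEBYE_PER_E_ANGSTROM = 4.80320  # 1 e*Å expressed in debye


def two_point_dipole(charge_e, separation_A):
    """Dipole moment (D) of charges +q and -q separated by a distance in Å."""
    return charge_e * separation_A * DEBYE_PER_E_ANGSTROM


# Assumed carbonyl charges of +/-0.55 e at the 1.23 Å bond length.
carbonyl_D = two_point_dipole(0.55, 1.23)   # close to the quoted 3.2 D

# Share of the quoted 4.2 D amide dipole carried by the carbonyl.
carbonyl_percent = round(100 * 3.2 / 4.2)
```

Under these assumptions the carbonyl unit carries about three quarters of the amide dipole, in line with its share of the average potential.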

FIGURE 5
Comparison of the average potential at side chain heavy
atoms (VP) for proteins with different
charges on the backbone. VP was calculated
with Eq. 3a. Charges from Table 1: amine (HN, N, CA) charges and
carbonyl (C, O) charges, plotted with separate symbols. The straight lines are described by:
11.91 + 0.77x (r² = 0.96) and 11.2 + 0.22x (r² = 0.71).
Average potential at different types of side chains
The average potential was determined at each side chain
(VS) (Table 5). Only 2.0% of the residues are
at potentials below −60 mV, while 75.6% are more positive than +60
mV. The average of VS is always
positive for all types of residues, ranging from 228 mV for Ala to 32 mV for Arg. The average side chain potential is most positive for small
groups such as Ala, Cys, and Ser, and decreases as the side chain
becomes larger. This results in the average
VS for all side chains being more
positive than the average VP for all
proteins. Each small, more positive side chain counts as
much as a large side chain in the average of
VS, but contributes fewer atoms to the average of
VP.
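The difference between the two averages is a weighting effect, which a toy Python example makes concrete. The two "residues" and their atom potentials are hypothetical: one single-atom side chain at a high potential and one five-atom side chain at a low potential.

```python
def per_residue_mean(residues):
    """Average of per-residue means: every side chain has equal weight."""
    return sum(sum(r) / len(r) for r in residues) / len(residues)


def per_atom_mean(residues):
    """Average over all atoms: large side chains carry more weight."""
    atoms = [v for r in residues for v in r]
    return sum(atoms) / len(atoms)


# Hypothetical backbone potentials (mV) at the side chain heavy atoms.
residues = [
    [228.0],                         # small side chain, one heavy atom
    [40.0, 35.0, 30.0, 25.0, 30.0],  # long side chain, five heavy atoms
]
residue_average = per_residue_mean(residues)   # 130.0 mV
atom_average = per_atom_mean(residues)         # about 64.7 mV
```

The per-residue average is pulled up by the small, positive side chain, while the per-atom average is dominated by the many atoms of the long one.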
Potential at small molecules, cofactors, and substrates bound to
proteins
There are many ligands bound to the proteins analyzed here. The
potential from the backbone was investigated at several types of bound
molecules (see Table 6).
The average potential at buried waters is positive, with twice as many
waters at potentials >+60 mV as at <−60 mV. Thus, these neutral
dipoles are likely to be found at positive potential.
Metals are the only bound cations that are present in any abundance in
proteins. Many of the divalent cations cadmium, cobalt, copper,
non-heme iron, manganese, magnesium, ytterbium, and zinc are at
potentials from the backbone >300 mV. Only Ca2+
and Na+ are ever found at potentials from the
backbone more negative than −70 mV. The importance of specialized
backbone motifs for coordinating Ca2+ is well
established (Strynadka and James, 1989; McPhalen et al., 1991). Thus,
the bias toward the backbone being positive inside proteins extends
even toward the binding sites for positive ions. With the exception of
calcium and sodium, the backbone substantially destabilizes cation
binding. These must be bound by protein side chains or anionic ligands.
The positive potential from the backbone at iron sulfur clusters has
been previously described (Langen et al., 1992; Swartz et al.,
1996). The very positive potential strongly favors the reduced over the
oxidized form of these redox sites.
Many enzyme substrates such as ATP or GTP are nucleotides, while many
cofactors such as flavins and nicotinamides are derived from
nucleotides. Each has negatively charged phosphate groups. The average
potential at the phosphates is 435 mV, which will substantially
stabilize binding. Small anions such as phosphate or sulfate are also
always bound in regions of positive potential from the backbone.
Structure of the amide group yields the imbalance between positive
and negative regions generated by the protein backbone
Role of the neighboring amides in generating the bias toward
positive potentials in proteins
The potential from each amide at each side chain was determined
for 51 proteins that sample several folds and include the most and
least positive VP values in each
structural class (Table 7). This group
of proteins is slightly more positive than the 305 proteins, yielding
the small differences among Tables 5-7.
Each non-terminal side chain lies between two neighboring amides, one
toward the N-terminal, the other toward the C-terminal (Fig. 2). All
other amides in the protein are distal to this side chain. Phi and psi
angles define the neighboring amide orientation, secondary and tertiary
structures produce the arrangement of the distal amides. Analysis of
the potential from neighboring and distal amides shows: 1) the
potential from the neighboring amides is always positive; 2) the
standard deviation of this potential increases as the flexibility of
the side chain increases; 3) the potential from the distal amides is
very variable, as seen in the large standard deviation of this value
for each type of residue; 4) on average, the distal amides also raise
the potential at all residues except at the bases Arg and Lys; and 5)
the average potential for Cys from the distal amides is very positive.
This is largely due to the very positive values at the Cys that are
ligands in iron-sulfur clusters (Table 6) which are over-represented in the group of proteins.
The potential at a side chain (VS) is
the sum of the potential from the neighboring and the distal amides
(Fig. 6). The neighboring amides
contribute 122 ± 68 mV to the average. The relative constancy of
this value shows that, independent of protein motif, the potential from
the backbone starts with a bias of ~110 mV within all proteins. Proteins with average potentials less than this have contributions from
each group's distal amides that are on average negative. The average
potential from the distal amides in the different proteins ranges from
−40 to 120 mV, extending to higher positive than negative values.

FIGURE 6
Comparison of the contribution of the neighbor and
distal amides to the average potential for 51 proteins. Each residue is
charged in turn in each protein and the potential collected at the two
neighboring side chains and at the distal side chains. Different
protein motifs (α, β, α+β, α/β, and others) are plotted with different symbols.
The straight lines are described by: neighboring amides,
86.91 + 0.18x (r² = 0.56); and distal
amides, 86.9 + 0.82x (r² = 0.96).
Why the potential from the neighboring amides is always
positive
The potential from the neighboring amides at CB in a medium of
uniform dielectric constant is solely determined by the phi angle (for
amide(n)) and the psi angle (amide(c)) (Fig. 2). Under these simplified
conditions it becomes clear why the potential from the neighboring
amides at any residue is almost always positive. The impact of the
surrounding solvent and extended side chains on the potential and
resulting ΔGbkn will be described below.
The potential is shown visually for an amide group along with the CBs
for which this is amide(n) and amide(c) (Figs. 2 and 7). The polypeptide chains are arranged
with phi and psi angles found in α-helices or β-sheets. In each
case the CBs toward the N- or the C-terminal are in the region of
positive potential from the amide.

FIGURE 7
Each amide forms the junction between two residues
(Fig. 2 B). One amide is amide(c) for residue (i), with
an orientation between side chain and amide determined by the psi
angle. The same amide is amide(n) for the next side chain (i + 1)
and their orientation is described by the phi angle. GRASP
(Nicholls et al., 1991) pictures showing the two CBs
(green) neighboring one amide in (A)
α-helix (φ = −52, ψ = −53); (B)
β-strand (φ = −123, ψ = 143). The five atoms assigned
charge are labeled, colored red (negative) or blue (positive), and
given a radius that is proportional to the partial charge. The
isopotential contours at +0.85 kcal/e (blue) and −0.85
kcal/e (red) calculated with (C,
D) εpeptide = εsolv = 4; and (E, F)
εpeptide = 4, εsolv = 80.
The potential was determined as a function of the phi and psi angles at
the middle CB in an Ala-tripeptide (Fig. 8). The potential from amide(n) is less
than zero only for phi values between 40° and 180°, a region that
is unfavorable for any residue but Gly because of steric hindrance
between CB (of residue i) and the amide(n) (residue i-1) carbonyl
oxygen (Ramachandran et al., 1974). Thus, the side chain is
constrained to come off the backbone into the positive rather than the
negative end of amide(n) because the carbonyl oxygen has a van der
Waals radius that is much larger than that of the HN. The phi angles in
α-helices lie close to the maximum value of the potential, while
β-sheets rotate the side chain into regions of lower potential from
amide(n).

View larger version (32K):
[in this window]
[in a new window]
|
FIGURE 8
The potential at the middle CB in an Ala tripeptide
from (A) amide(n) as a function of the phi angle and
(B) amide(c) as a function of the psi angle (see Fig. 7
A). The potential was calculated with (bold
line) εpeptide = εsolv = 4; (flatter, light line)
εpeptide = 4, εsolv = 80. The relative occurrence of residues
with different phi (C) and psi (D) angles
in the 305 proteins considered in this study was determined with the
program DSSP (Kabsch and Sander, 1983): solid line,
α-helix; heavy dotted line, β-sheet; light line, other.
The potential from amide(c) is always positive, in part because the
carbonyl C is always closer than the O to the CB. The region of maximum
potential is at values for psi that are disallowed. The potential in
helical regions is slightly larger than for β-sheets.
The potential at CB from the neighboring amides is influenced by the
dielectric properties of the surrounding solvent. Thus, the
isopotential contours from an amide group are smaller when the amide is
immersed in solvent (Fig. 7). However, the pattern of the variation of
the potential with phi and psi is independent of solvent (Fig. 8).

FIGURE 9
The dependence of the average potential at the side
chain (VS) on the length of the side chain.
Plotted are the average potential (VS), the
contribution from the neighboring amides, and the contribution of the
distal amides. Data from Table 8.
As the side chains become longer the potential from the neighboring
amides decreases (Fig. 9). A decrease in the positive potential along
individual side chains was noted previously by Spassov et al. (1997).
In addition, longer side chains have more allowable rotamers
with atoms in different positions relative to the amide dipole, which
increases the deviation from the average potential (Table 7).
The amide orientation relative to the protein surface affects the intra-protein potential
Modified protein structures were prepared where the HN to N bond
in the amide amine was lengthened to be as long as the O to C bond in
the carbonyl and the HN radius was increased to the size of the O. The
surface accessibility of O and HN in these modified structures provides
a simple, rough estimate of whether each amide points its carbonyl or
amine out toward the solvent. With few exceptions, if an amide O is
more surface-exposed than its HN, this amide raises the potential in
the protein (top right quadrant of Fig. 10). If the O is more buried
the amide lowers the potential (bottom left quadrant). The same
pattern is found for α-helical, β-sheet, and random coil regions of
all protein folds.

FIGURE 10
The difference in the exposure of the HN and O vs.
the contribution of that amide to the average potential within the
four-helix bundle 2HMQ, the β-barrel 1HMR,
the α/β barrel 1TPF, and the α+β protein
7RSA. Residues in α-helices, in β-sheets,
and in loops are plotted with different symbols. The structures were
modified as described in the Methods section to equalize the length
and size of the HN-N and C-O dipoles. The potential was calculated
with εprotein = 4, εsolv = 80.
The total contributions to the potential from amides with the HN more
exposed, the O more exposed, or with little difference between their
exposure were compared (Table 8). The
residues that have little differential exposure contribute only a small
amount to the average potential within the protein. For each protein
the contributions per amide for those with the O or the HN more exposed
are of similar magnitude but opposite sign. However, there are always
more amides where the O surface exposure exceeds that of the HN than
those with the opposite orientation. Overall 38 ± 6% of the O's
in the 305 proteins studied here have at least 10% of their surface
exposed, while only 17 ± 6% of the HNs are this exposed. The
preponderance of surface-exposed carbonyl oxygens is another reason why
the interior of all proteins is at positive potential. This provides a
mechanism for raising the potential at buried ligands that lack the
interactions with neighboring amides that raise the potential at side
chains.
TABLE 8
The contribution of amides to the potential in the protein
depends on the amide orientation relative to the protein surface
How the positive potential from the backbone contributes to the free energy of ionized side chains in proteins
The free energy of interaction between side chains and the backbone
The potential is positive at the non-polar residues such as Val
(average VS is 163 mV), Ile (145 mV),
and Leu (126 mV) (Table 5). Moving from a potential of 0 into a
potential of 163 mV would stabilize a negative charge by 3.75
kcal/mol or destabilize a positive one by an equivalent amount.
However, despite the significant potential at their atoms,
these neutral, non-polar residues contribute little to the free energy
of side chain interaction with the backbone (ΔGbkn), because the net
atomic partial charge (qI) is near zero (Eq. 5). The large positive
potential at non-polar residues supports the picture that forces other
than favorable electrostatic interactions between side chain and amide
dipoles are responsible for the predominantly positive protein
interior. However, the average VS at the acidic residues Asp and Glu
is 45 and 24 mV more positive, respectively, than at their polar
analogs Asn and Gln. The bases Arg and Lys have the least positive
average VS. Thus, electrostatic
interactions between backbone and side chains do contribute
somewhat to the amide orientation that determines the potential.
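The conversion between potential and energy quoted for Val can be checked with a short sketch. It assumes only the standard factor of about 23.06 kcal/mol per eV; the function name is hypothetical and is not part of the original analysis.

```python
# Standard conversion: 1 eV per elementary charge is about 23.06 kcal/mol.
KCAL_PER_MOL_PER_EV = 23.0605


def potential_to_energy_kcal(potential_mv: float, charge_e: float) -> float:
    """Energy (kcal/mol) of a charge (in units of e) moved from 0 mV to potential_mv.

    Negative values are stabilizing, positive values destabilizing.
    """
    return charge_e * (potential_mv / 1000.0) * KCAL_PER_MOL_PER_EV


# Average side chain potentials quoted in the text for non-polar residues (mV).
for residue, v_s in [("Val", 163.0), ("Ile", 145.0), ("Leu", 126.0)]:
    anion = potential_to_energy_kcal(v_s, -1.0)
    cation = potential_to_energy_kcal(v_s, +1.0)
    print(f"{residue}: anion {anion:+.2f}, cation {cation:+.2f} kcal/mol")
```

With the rounded conversion factor, the Val value comes out close to the 3.75 kcal/mol given in the text; the small difference reflects rounding only.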
VS considers all side chain heavy
atoms equally (Eq. 4a). In contrast, ΔGbkn considers the partial charge
on each atom and the potential (Eq. 5). ΔGbkn is favorable at the basic
residues despite the average side chain potential being positive. Thus,
the atoms with positive charge must be in regions that are more
negative than the average for the residue as a whole. In contrast, the
average Glu VS is 108 mV while the average ΔGbkn is only −68 meV
(−1.6 kcal/mol). Thus, the potential must be more positive at atoms
that cannot add to the favorable ΔGbkn because they have little charge.
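The distinction between the two measures can be illustrated with a minimal sketch. It assumes that Eq. 4a is an unweighted mean of the backbone potential over side chain heavy atoms and that Eq. 5 is a charge-weighted sum over the same atoms; neither equation is reproduced here, so both forms are assumptions. The atom potentials and partial charges below are invented for illustration and are not CHARMM values or results from the paper.

```python
from statistics import mean

KCAL_PER_MOL_PER_EV = 23.0605  # standard conversion factor


def average_side_chain_potential(potentials_mv):
    """Assumed form of Eq. 4a: every heavy atom counts equally (result in mV)."""
    return mean(potentials_mv)


def backbone_interaction_energy(charges_e, potentials_mv):
    """Assumed form of Eq. 5: sum of q_i * V_i over atoms (result in kcal/mol)."""
    return sum(q * v / 1000.0 * KCAL_PER_MOL_PER_EV
               for q, v in zip(charges_e, potentials_mv))


# Hypothetical Lys-like side chain (CB, CG, CD, CE, NZ): the potential falls
# along the chain and the positive charge sits on the last two atoms.
potentials_mv = [150.0, 100.0, 40.0, -20.0, -60.0]
charges_e = [0.0, 0.0, 0.0, 0.3, 0.7]

v_s = average_side_chain_potential(potentials_mv)
dg_bkn = backbone_interaction_energy(charges_e, potentials_mv)
print(f"VS = {v_s:.0f} mV, dG_bkn = {dg_bkn:.2f} kcal/mol")
```

In this toy case the average potential is positive while the interaction energy is favorable (negative), which is the situation described for the basic residues.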
Loss of reaction field energy of ionized amino acids in proteins
The loss of reaction field energy (ΔGrxn) (Eq. 2) provides a
quantitative measure of the distribution of buried charges in proteins.
The interactions with the potential created by the backbone will be
most important for buried, charged residues. ΔGrxn was calculated for
the acids Asp and Glu, and bases Lys and Arg (Figs. 11 and 12; Table 9).
Seventy percent have lost <4.1
kcal/mol of the reaction field energy they would have if free in water,
shifting the residue pKa by <3 pH units (Eq. 6).
Thus, as expected, most of these ionizable residues are near the
surface. However, 30% (5501) have ΔGrxn >4.1 kcal/mol. Half of these
have lost sufficient reaction field energy to shift their
pKa values by 5 pH units (6.8 kcal/mol). A 5 pH
unit shift destabilizes an ionized Asp, moving its
pKa from 4 to 9. The same ΔGrxn shifts the
pKa of an Arg from 12.5 to 7.5. Burial in the
protein can also be assessed by the exposure of the side chain to the
surface. The fraction of residues that have lost >6.8 kcal/mol
ΔGrxn is comparable to the fraction
of residues that have <10% of the side chain atoms with significant
charge exposed to the solvent (Table 9).
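The correspondence between lost reaction field energy and pKa shift can be sketched as follows. Eq. 6 is not reproduced here, so the sketch assumes the standard relation of 2.303RT per pH unit at 25°C (about 1.36 kcal/mol); the function names and the temperature are assumptions.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol K)


def kcal_per_ph_unit(temp_k: float = 298.15) -> float:
    """Free energy equivalent to one pH unit: ln(10) * R * T."""
    return math.log(10) * R_KCAL * temp_k


def shifted_pka(pka_solution: float, dg_loss_kcal: float, is_acid: bool,
                temp_k: float = 298.15) -> float:
    """pKa after destabilizing the ionized form by dg_loss_kcal.

    Destabilizing the charged form raises the pKa of an acid and lowers
    the pKa of a base.
    """
    shift = dg_loss_kcal / kcal_per_ph_unit(temp_k)
    return pka_solution + shift if is_acid else pka_solution - shift


print(f"4.1 kcal/mol = {4.1 / kcal_per_ph_unit():.1f} pH units")
print(f"Asp: 4.0 -> {shifted_pka(4.0, 6.8, is_acid=True):.1f}")
print(f"Arg: 12.5 -> {shifted_pka(12.5, 6.8, is_acid=False):.1f}")
```

Under these assumptions the two thresholds used in the text, 4.1 and 6.8 kcal/mol, correspond to shifts of about 3 and 5 pH units.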

FIGURE 11
The distribution of acidic and basic side chains with
different values of ΔGrxn and
ΔGbkn in 305 proteins with different
motifs. CHARMM charges were used for side chains and
amides. The net charges in each run were +1 on the bases or −1 on the
acids. εprotein = 4, εsolv = 80. Acids: Asp and Glu. Bases: Arg and Lys.

FIGURE 12
The relationship between ΔGbkn and ΔGrxn for the acidic and basic
amino acids in 305 proteins. The bold line is for
ΔGbkn = ΔGrxn. The dashed line shows the maximum
value for Gxn, when
Grxn = 0 and
Grxn = Grxn in
soln (Table 2). The ±1.5 kcal/mol has been removed.
Different propensities are found for burying each type of side chain.
There are more buried Asp, similar numbers of buried Arg and Glu, and
fewer buried Lys. Overall there are more buried acids than bases (Fig.
11, Table 9). This disparity becomes more significant as
ΔGrxn increases. For residues where
ΔGrxn is 4.1-6.8 kcal/mol, 56% are
acids. Of the residues where ΔGrxn
is >6.8 kcal/mol, 62% are acids; these residues comprise 17% of all
acids and 12% of all bases.
Interaction of ionized residues with the backbone
A buried acid or base with a large ΔGrxn will be neutral at
physiological pH unless specific elements of the protein stabilize the
charge (Eq. 6). Nearby charges or appropriately oriented dipoles can
compensate for the loss of reaction field energy. The free energy of
stabilization of each acidic and basic residue due to the electrostatic
potential from the protein amide dipoles (ΔGbkn) was calculated with
Eq. 1 using CHARMM charges for the backbone (Table 1).
Fig. 12 compares ΔGrxn and ΔGbkn for individual amino acids. No
surface-exposed residue (ΔGrxn ~ 0) has a large ΔGbkn. However,
buried groups have a wide range of interactions with the backbone. The
straight line of slope 1 in Fig. 12 shows where ΔGbkn = ΔGrxn. If there
were no other interactions (e.g., with the other protein side chains)
the pKa of groups along this line would be identical
to that found in solution. There are a small number of residues where
stabilization by the potential from the backbone dipoles is larger than
the destabilization due to removal from the water dipoles (Fig. 12 and
Table 10). In the absence of other
interactions the protein would shift the pKa of these
acids to lower and bases to higher pH values. Prior calculations have
shown that hyper-stabilized residues can be functionally important. For
example, in the photosynthetic reaction center a cluster of buried
acids remains significantly ionized because the acids lie in a region
where ΔGbkn > ΔGrxn (Lancaster et al., 1996).
There are fewer residues with large ΔGbkn than large
ΔGrxn (Tables 9 and 10). Only 14%
of the acidic or basic residues have ΔGbkn
larger than ±4.1 kcal/mol. The different types of side chains have the
same order of propensities for large values of
ΔGbkn as for ΔGrxn (Asp > Glu ≈ Arg > Lys). However, the difference
between acids and bases is far more striking. For example,
ΔGbkn is more favorable than 4.1 kcal/mol for 20% of
the acids, while only 6.5% of the bases have interactions above this
threshold. For most residues ΔGbkn
is favorable. However, 80% of the strong, favorable interactions with
the backbone are to acids, only 20% to bases. Of the small number of
residues with unfavorable ΔGbkn,
93% are bases (Figs. 11, 12). Thus, acids are more likely to be buried
than bases and they are much more likely to be stabilized inside the
protein by the potential from the amide dipoles. These distinctions are
as expected if the potential from the protein backbone creates a bias
that favors buried acids and raises the energy of buried bases.
The role of hydrogen bonds in creating favorable interactions between backbone and side chain
A hydrogen bond between the terminus of an acidic side chain and
the amide HN or a basic side chain and the amide O generally indicates
that the backbone will stabilize the charged residue. The necessity of
hydrogen bonds for generating large values of
ΔGbkn was investigated (Table 11). Of the 1942 acids stabilized by
>4.1 kcal/mol, 710 make no hydrogen bonds to the backbone. In
contrast, of the 526 bases stabilized to this degree only 70 make no
hydrogen bonds. This result
highlights the bias toward the protein being positive inside. Thus,
negative regions are almost always formed with local hydrogen bonds
while positive regions can be generated by longer-range interactions.
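The counts quoted above can be turned into fractions with a few lines of arithmetic. This is only a consistency check on the numbers given in the text (1942 and 710 for acids, 526 and 70 for bases); the variable names are hypothetical.

```python
# Residues stabilized by the backbone by more than 4.1 kcal/mol (from the text).
acids_stabilized, acids_without_hbond = 1942, 710
bases_stabilized, bases_without_hbond = 526, 70

# Fraction of strongly stabilized residues that make no backbone hydrogen bond.
acid_no_hbond_fraction = acids_without_hbond / acids_stabilized
base_no_hbond_fraction = bases_without_hbond / bases_stabilized

# Share of all strongly stabilized residues that are acids.
acid_share = acids_stabilized / (acids_stabilized + bases_stabilized)

print(f"acids without hydrogen bonds: {acid_no_hbond_fraction:.1%}")
print(f"bases without hydrogen bonds: {base_no_hbond_fraction:.1%}")
print(f"acids among stabilized residues: {acid_share:.1%}")
```

Roughly a third of the strongly stabilized acids make no hydrogen bond to the backbone, compared with about one in eight of the bases, and the acid share is consistent with the approximately 80% figure quoted earlier.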
TABLE 11
Acids or bases that are stabilized by the backbone by
more than 4.1 kcal/mol (3 pH units) without making hydrogen bonds to
the backbone
 |
DISCUSSION |
The average potential from the neutral amide dipoles
(VP) is found to be positive in every
protein (Table 4, Fig. 4). Larger regions of each protein are at
positive rather than negative potential (Fig. 3) and this potential is
often large (Tables 5 and 6). The numerical value of the potential
depends on the charge distribution used for the amide and the
dielectric constant for the protein. However, the average remains
positive even when these parameters are varied (Table 4). The potential
from the backbone is positive within all proteins for two reasons.
First, the side chains of all residues come off the backbone into the
positive end of both their neighboring amides (Fig. 2 A).
The regions of phi/psi space where side chains are close to the
carbonyl oxygen are disallowed because of van der Waals overlap
(Ramachandran and Sasisekharan, 1968). The HN proton is much smaller,
so the side chain can come closer. Second, the orientation of the
amide at the protein surface influences the interior potential. The
larger, more highly charged carbonyl O is more than twice as likely as
the amine HN to be oriented into the solvent. The amides with
their O's more surface-exposed raise the interior potential (Fig.
10). The restrictions in phi/psi space influence the interactions
between amides and their neighboring side chains. The distribution of
amide orientation at the protein surface raises the potential at distal
side chains and bound ligands.
It is remarkable, given the complexity and uniqueness of individual
proteins, that the neutral backbone yields a potential that is, on
average, significantly positive in every protein. The question is how
this bias affects protein structure and function. Empirical rules
determined from the distribution of residues in protein structures have
established the importance of other forces in proteins. Thus, the
hydrophobic effect is recognized from the burial of many, though not
all, non-polar residues. Similarly, solvation stabilizes charged
residues on the surface, where the majority are found (Table 9).
The analysis of the distribution of acidic and basic side chains
reveals that despite the energetic penalty for removing charges from
water, many are buried. However, there are significantly more buried
acids than bases. This is as expected if the positive potential from
the amides affects side chain location. There are 1.7 times as many
acids that have lost 6.8 kcal/mol (5&nbs