| HOME | HELP | FEEDBACK | SUBSCRIPTIONS | ARCHIVE | SEARCH | TABLE OF CONTENTS |
Biophys J, October 2002, p. 1731-1748, Vol. 83, No. 4
Department of Physics, City College of New York, New York 10031 USA
| |
ABSTRACT |
|---|
|
|
|---|
Protein stability and function relies on residues being
in their appropriate ionization states at physiological pH. In situ residue pKas also provides a sensitive measure of the local
protein environment. Multiconformation continuum electrostatics (MCCE) combines continuum electrostatics and molecular mechanics force fields
in Monte Carlo sampling to simultaneously calculate side chain
ionization and conformation. The response of protein to charges is
incorporated both in the protein dielectric constant (
prot) of four and by explicit conformational changes.
The pKa of 166 residues in 12 proteins was determined. The
root mean square error is 0.83 pH units, and >90% have errors of <1
pH units whereas only 3% have errors >2 pH units. Similar results are
found with crystal and solution structures, showing that the method's
explicit conformational sampling reduces sensitivity to the initial
structure. The outcome also changes little with protein dielectric
constant (
prot 4-20). Multiconformation continuum
electrostatics titrations show coupling of conformational flexibility
and changes in ionization state. Examples are provided where ionizable
side chain position (protein G), Asn orientation (lysozyme), His
tautomer distribution (RNase A), and phosphate ion binding (RNase A and
H) change with pH. Disallowing these motions changes the calculated
pKa.
| |
INTRODUCTION |
|---|
|
|
|---|
Protein stability and function depend on
interactions among charged and dipolar groups. All proteins have
ionized residues, many bind charged substrates, and changes in charge
distribution often occur in the active site with reaction. Ionization
state changes determine the pH dependence of protein function (Mellor et al., 1993
; Dyson et al., 1997
; Alexov et al., 2000
) and stability (Hendsch et al., 1996
; Spassov et al., 1997
; Elcock and McCammon, 1998
). Proton transfers between buried residues generate transmembrane proton gradients (Ferguson-Miller and Babcock, 1996
; Alexov and Gunner,
1999
; Luecke et al., 1999
; Sham et al., 1999
). Surface-charged residue
distribution is critical for protein-protein association (Sheinerman et
al., 2000
). Computational tools that improve estimates of the
energetics of the charge distribution within proteins will thus help us
understand how protein function is derived from structure.
The analysis of protein charges and dipoles starts with water as
a reference solvent (Warshel and Russell, 1984
; Gilson and Honig, 1988
;
Gunner and Alexov, 2000
). Water dipoles, being both polar and
polarizable, reorient when a charge is added. Ions and dipoles have
favorable interactions with water with its high dielectric constant,
which are diminished in protein. In contrast, proteins are highly polar
but not very polarizable molecules. Each backbone amide has a dipole
greater than water, and more than 50% of all side chains are polar or
ionizable. However, proteins have limited, nonuniform flexibility
producing a heterogeneous response to charges.
The equilibrium distribution of side chain ionization states and
position represents the balance of many interactions and so is hard to
calculate. Methods to calculate the free energies of protein charge
distributions are often tested by their ability to reproduce
experimental pKas (Bashford and Karplus, 1990
;
Beroza et al., 1991
; Takahashi et al., 1992
; Yang and Honig, 1993
;
Antosiewicz et al., 1994
, 1996
; Demchuk and Wade, 1996
; Alexov and
Gunner, 1997
; Sham et al., 1997
; Havranek and Harbury, 1999
; Mehler and Guarnieri, 1999
; Nielsen and Vriend, 2001
). These benchmark
calculations are not ideal because they include many values for surface
residues, which have little interaction with protein. Also, many
pKas are not at pHs where the protein is
functional or the structure determined, so calculations must also model
pH-dependent structural perturbations. However, with more than 100 measured pKas, calculations can try to match many
values with one parameter set and methodology. As reliable
computational methods become available, proteins where electrostatic
forces play important functional roles can be analyzed (Alexov and
Gunner, 1999
; Demchuk et al., 2000
; Dillet et al., 2000
).
The Poisson-Boltzmann equation of continuum electrostatics provides a
powerful framework for the analysis of electrostatic interactions in
proteins (Warwicker and Watson, 1982
; Gilson et al., 1985
). These
calculations take the distribution of dielectric constants, fixed
charges, and mobile ions and return the electrostatic potential.
Partial charges are placed at atoms in the protein structure. Water,
with a dielectric constant of 80, is the major source of the reaction
field (solvation) energy of protein dipoles and charges. Water also
screens interatomic interactions in a manner that depends on the
distances between protein atoms and the solvent. The Poisson-Boltzmann
formalism incorporates the large impact of water easily and reasonably
accurately. However, a central problem is how to assign the protein
dielectric constant. The dielectric constant of a dried protein is
measured to be near 4 (Takashima and Schwan, 1965
). Molecular dynamics
simulations show the protein inner core is not very polarizable with an
effective dielectric constant near 4, whereas the solvent exposed
flexible exterior has a value near 30 (Simonson and Brooks, 1996
).
Calculations using one, low value for protein (
4) yield a poor match
to benchmark pKas, but there is significant
improvement as
prot is increased to
20
(Antosiewicz et al., 1994
, 1996
; Demchuk and Wade, 1996
).
The dielectric constant measures the response of a medium to changes in
charge. In protein calculations the dielectric constant is a parameter
that averages many kinds of reorganization so that they need not be
included explicitly (Sham et al., 1997
, 1998
; Simonson, 1998
, 2001
;
Gunner and Alexov, 2000
; Schutz and Warshel, 2001
). For example,
changes in electronic polarization, backbone or side chain position,
residue protonation state, and ion binding can occur as the pH is
changed or a reaction proceeds. Electronic polarization may be fairly
uniform throughout the protein, and the backbone motions may be small
when there is no significant structural change. Thus, their
contribution to the electrostatic energy may be estimated from an
averaged dielectric constant rather than by explicit atomic
calculations. However, much of the protein response is dependent on the
local environment.
Several methods that remain based on continuum electrostatics add a
nonuniform protein response to charges (for reviews, see Schutz and
Warshel, 2001
; Simonson, 2001
). Different dielectric constants have
been assigned to different amino acids (Voges and Karshikoff, 1998
) or
to different regions of the protein (Demchuk and Wade, 1996
; Rocchia et
al., 2001
). Other methods consider explicit protein motions. Early
methods for sampling multiple protein conformations either averaged
interactions between different side chain atomic positions (You and
Bashford, 1995
) or averaged the ionization states calculated with
different structures (Ripoll et al., 1996
; Zhou and Vijayakumar, 1997
).
Hybrid methods simultaneously calculate ionization states of acidic and
basic residues and the conformation states of surface side chains
(Beroza and Case, 1996
), hydroxyls and waters (Alexov and Gunner,
1997
), and polar and ionizable side chains (Alexov and Gunner, 1999
).
Other approaches include: adding explicit protein relaxation into the
protein dipole Langevin dipole methodology with a linear response
approximation (Sham et al., 1997
, 1998
); the use of a nonuniform
response parameterized on fragment hydrophobic constants, which
implicitly draws on the relationship between a group's polarity and
its ability to partition into water (Mehler and Guarnieri, 1999
); as
well as an iterative mobile cluster approach for calculation of
multiple-site ligand binding to flexible macromolecules (Spassov and
Bashford, 1999
).
The multiconformation continuum electrostatic (MCCE) method allows
multiple positions of hydroxyl and water protons, side chain rotamers,
and ligands in the calculation of the pH dependence of the ionization
equilibria of titratable groups (Alexov and Gunner, 1997
, 1999
). An
analysis of the pH titrations of 166 residues in 12 proteins is
reported here. The benchmark root mean square errors of these
titrations are comparable with those found with other approaches
(Antosiewicz et al., 1994
, 1996
; Mehler and Guarnieri, 1999
). Because
improvement in pKa calculations depends on the protein changing side chain position with pH, MCCE highlights motions
that may be important for maintaining protein structure and function.
| |
METHODS |
|---|
|
|
|---|
The MCCE method
MCCE calculates the equilibrium conformation and ionization
states of protein side chains, buried waters, ions, and ligands at a
defined pH. Preselected choices for atomic positions and ionization
states for side chains and ligands are used. Each specified side chain
or ligand position and charge is called a conformer. Look-up tables are
calculated of electrostatic and nonelectrostatic conformer self- and
pair-wise interactions. Protein microstates have one conformer for each
side chain, and ligand. Monte Carlo sampling establishes each
conformer's probability in the Boltzmann distribution at a given pH or
Eh (Alexov and Gunner, 1997
, 1999
). The method can compare mutated with wild-type proteins (Alexov et al.,
2000
) or protein with reactant or product bound (Alexov and Gunner,
1999
). The MCCE procedure is divided into three stages: 1)
selection of conformers (steps 1-3 below); 2) calculation of energy
look up tables (steps 4-5); 3) Monte Carlo sampling and calculation of
pKas (steps 6-7). Early descriptions of the
method can be found in Alexov and Gunner (1997
, 1999
).
Step 1: generate protein data file, striped of surface waters, complete with all protons except those on hydroxyls or waters
All protons in the original protein coordinate file are stripped off. A translation table brings residue and atom names into compliance with the conventions of the MCCE package. Residues with missing atoms are reported. Every independently ionizable group must have a unique residue name and number so backbone N- and C- terminal atoms are given a residue designator that differs from their side chain. The chain terminus conformer interacts with the terminal side chain. The program Proteus (Samir Lipovaca City College of New York) places protons with no degrees of freedom. Protons with no partial charges, such as methylene protons, are added in their torsion minima. Only buried waters are treated in atomic detail. Water accessibility is determined with the program SURFV (Sridharan et al., 1992Step 2: identify residues with strong electrostatic pair-wise interactions
One DelPhi (Nicholls and Honig, 1991
pK units (2.9 kcal/mol) will be
provided with additional conformers from the heavy atom rotamer library
(Dunbrack and Karplus, 1994Step 3: generate composite protein structure
Conformers must be generated that could be at low energy over the entire range of the calculation before knowing the position or ionization state of other groups. However, if conformers that make good hydrogen bonds to ionized sites are unavailable, the calculated ionization free energy will be wrong. Inside the protein, extra, bad conformers increase the size of the calculation. However, because they are not selected in Monte Carlo sampling, they do not affect the outcome. On the protein surface, extra conformers change the protein-solvent boundary, introducing errors into the electrostatic energy terms. Therefore, conformer preselection tries to generate as many good choices as possible without including positions that will never be found in low energy microstates. Side chains identified as making strong pair-wise interactions in step 2 are replicated on the backbone in rotamer libraries positions (Dunbrack and Karplus, 1994
CG bond (see Fig. 6B). Each of these has two neutral forms in
which ND1 or NE2 is unprotonated and one ionized form. Each Asn and Gln
has two conformers with the terminal O and N interchanged representing
180° rotation of O
C
N. Each Asp, Glu, and Tyr gets one ionized
and two neutral conformers with the proton in the torsion minimum. Each
Ser and Thr has three conformers with a proton in each torsion minima.
Hydroxyl-containing residues (Ser, Thr, neutral Asp, Glu, and Tyr) have
extra conformers that can hydrogen bond with all nearby acceptors.
Water gets conformers to make and accept available hydrogen bonds plus
a conformer with no interaction with the protein, representing water in
bulk solvent.
Step 4: electrostatic self- (reaction field) and pair-wise interactions
As described in Alexov and Gunner (1997)
2 and +2 for calcium and
magnesium. The potential (
) is collected at atoms of all other
conformers. The pair-wise electrostatic interaction energy between the
designated conformer (j), which is the source of the
potential and the distal conformer (k) is
|
(1) |
j
k' is the potential at
atom k' in conformer k from partial charges on
conformer j. The partial charge, qk', that would be on k' in
the ionization state of conformer k. In the same DelPhi run,
the interaction of the conformer j with polar backbone and
side chains without conformers is collected in a single interaction
energy (
Gpol,j).
One approximation is that all conformers are combined in the
protein structure adding to the dielectric boundary. Now only N DelPhi calculations are needed. In contrast,
N2 calculations would be needed if the
right conformers for each pair-wise interaction were used. However,
even this would not provide the correct position for all other
conformers in a given microstate. The small proteins studied here are
the worst case for errors caused by extra surface conformers. In
contrast, buried sites in large proteins will be essentially
undisturbed by moderate changes in protein surface.
Several steps reduce the errors in dielectric boundary introduced by
extra conformers. For the pair-wise interactions, only the active
conformer of the residue with partial charges is included (Fig.
1 A). However, other residues
must have all conformers present so all pair-wise interactions can be
determined in a single calculation. The theoretically symmetrical
G
G
|
pK unit.
However, these residues are not found to have larger errors in
calculated pKa.
Step 5: self- (torsion) and pair-wise (Lennard-Jones) nonelectrostatic energies for all conformers
The proton torsion energy is nonzero only for protons placed out of a torsion minimum to make hydrogen bonds. Neutral Tyr has two energy minima in the plane of the Tyr ring with a 2.59
pK unit
(3.5 kcal/mol) barrier between them, so E(
) =
3.5/2 cos(2
) kcal/mol in which
is the torsion angle. Ser and
Thr have three degenerate low energy positions with
E(
) =
1.3/2 cos 3(
60°)
kcal/mol. The energy of a neutral Asp and Glu hydroxyl proton is the
sum of a torsion potential with minima at 0° and 180° and a 0.96
pK unit barrier and a
2.0
pK unit electrostatic attraction
between hydroxyl proton and nonbonded carboxylate oxygen when both
are in plane so E(
) =
((2.7/2) cos(
) + (1.3/2) cos(2
))kcal/mol. This results in a
2.00
pK unit preference for the syn
(0°) over anti (180°) conformation,
comparable with the value found by solid-state nuclear magnetic
resonance (NMR) measurements (Gu et al., 1996Step 6: Monte Carlo sampling to determine the Boltzmann averaged conformer occupancy
A given protein microstate is a combination of one conformer for each residue, cofactor, and water. The energy of microstate n (
Gn) is
|
|
|
(2) |
pK units (0.59 kcal/mol), M is the number of conformers,
n(i) is 1 for conformers that are
present in the state and 0 for all others.
(i)
is 1 for bases,
1 for acids, and 0 for neutral conformers (polar
groups, waters, neutral acids, and bases).
pKsol,i is the solution pKa
of ionizable group i,

Grxn,i is the difference between
the reaction field energy for this residue type in solution (SM XII)
and that found in the protein (Nicholls and Honig, 1991
Gpol,i and
Gnonel,i are the electrostatic and
Lennard Jones interactions between conformer i and backbone
and side chains with no conformational degrees of freedom;
G
G
pK unit with the first. The new
microstate with two changes is then subjected to Metropolis sampling
(Beroza et al., 1991Step 7: calculate residue pKa and n values from conformer occupancies as a function of pH
Monte Carlo sampling is performed from pH 0 to 14 in steps of 1 pH unit and the pKa of each titrating group is obtained from the best nonlinear, least squares fit to
|
(3) |
(i) is the charge,
1 for an acidic or 1 for a basic residues.
Timing of the MCCE calculations
The analysis of Bovine pancreatic trypsin inhibitor (BPTI) (58 residues and 173 conformations) takes 3 h, whereas RNase H (155 residues and 608 conformers) takes 8 h on a single processor of an R10,000, 180 MHz SGI Origin 200. Most the time is spent in the DelPhi calculations. Monte Carlo sampling to determine all conformer occupancies at 15 pHs takes 20 min for BPTI and 1.5 h for RNase H.
Single conformer continuum electrostatics (SCCE) calculations
SCCE and MCCE calculations use the same protein structures with
the same parameters for atomic charge and radius. In SCCE, waters are
deleted and hydroxyl protons are placed in the most occupied MCCE
position at pH 7. The most populated form of the neutral acidic residue
at low pH is the only available neutral conformer.
prot is 4.
Atomic coordinates and experimental pKa values
MCCE was used to calculate pKas in 12 proteins (Table 1). The 166 measured values are from 1H -13C heteronuclear resonance experiments unless otherwise specified. The experimental and calculated pKa for each residue are reported in the supplementary material (SM I-X).
|
Barnase (SM-I) uses crystal structures 1A2P (Martin et al., 1999
), 1B20
(Buckle et al., 1993
), and the 20 NMR model structures (1BNR) (Bycroft
et al., 1991
). Experimental values were measured at low ionic strength
(Oliveberg et al., 1995
). The pKa of His-18 was
obtained from fluorescence measurements (Loewenthal et al., 1993
).
BPTI (SM-II) uses crystal structure (4PTI) (Marquart et al., 1983
) and
20 NMR structures (1PIT) (Berndt et al., 1992
).
pKa from NMR experiments performed at 41°C were
corrected to 25°C using the method described in March et al. (1982)
.
Intestinal bovine calcium-binding protein (calbindin, CabD) (SM-III)
consists of crystal structure in presence of Ca2+
(3ICB) (Szebenyi and Moffat, 1986
), 24 NMR structures for the holo-protein (1CDN) (Akke et al., 1995
), and 33 NMR structures for the
apoprotein (1CLB) (Skelton et al., 1995
). pKas of
Lys with and without calcium is from Kesvatera et al. (1996)
. The acidic pKas in the apoenzyme are from Kesvatera
et al. (1999)
.
Rat T-lymphocyte adhesion glycoprotein (CD2) (SM-IV) uses crystal
structure 1HNG (Jones et al., 1992
). pKas of 14 acidic residues determined at low (0.1 mM) and high (1.2 mM) salt
concentration were averaged (Chen et al., 2000
).
Hen egg-white lysozyme (HEWL) (SM-V) uses triclinic crystal structures
(2LZT) (Ramanadham et al., 1981
), 1B0D (Vaney et al., 1996
), tetragonal
crystal structure 1HEL (Wilson et al., 1992
), 50 NMR structures (1E8L)
(Schwalbe et al., 2001
), and the averaged NMR structure (1HWA) (Smith
et al., 1993
). pKas of basic residues are from
Kuramitsu and Hamaguchi (1980)
, and values for acidic residues at 100 mM ionic strength are from Bartik et al. (1994)
.
Turkey ovomucoid inhibitor domain 3 (OMTKY3) (SM-VI) consists of OMTKY3
complexed with human leukocyte elastase (1PPF) (Bode et al., 1986
),
Streptomyces griseus proteinase B (3SGB) (Read et al.,
1983
), or
-chymotrypsin (1CHO) (Fujinaga et al., 1987
). The
proteases are removed. There are 50 NMR structures (1OMU) (Hoogstraten
et al., 1995
). Acidic residue pKas are measured
at low (10 mM) and high ionic strength (1 M) (Schaller and Roberston, 1995
). Basic residue pKas are measured at 15 mM
and 500 mM ionic strengths (Forsyth et al., 1998
).
B1-immunoglobulin G-binding domain of protein G (ProtG) (SM-VII) uses
crystal structure (1PGA) (Gallagher et al., 1994
), 60 NMR structures
(1GB1) (Gronenborn et al., 1991
), and minimized NMR structure (2GB1)
(Gronenborn et al., 1991
). pKas at 100 mM ionic
strength are from Khare et al. (1997)
.
Ribonuclease A (RNase A) (SM-VIII) uses the crystal structure with
sulfate (3RN3) (Howlin et al., 1989
), ion free protein (7RSA) (Wlodawer
et al., 1988
), and 32 NMR structures (2AAS) (Santoro et al., 1993
).
pKas at 200 mM NaCl (Rico et al., 1991
; #4211). pKas of His-12, -48, -105, and -119 in the
absence of phosphate are an average of values from Ruterjans and Witzel
(1969)
, Matthews and Westmoreland (1973)
, Walters and Allerhand (1980)
,
and Quirk and Raines (1999)
; His-112 and His-119
pKas determined at 20 mM and 100 mM phosphate
(Meadows et al., 1969
; Cohen et al., 1973
). Sulfate is treated as phosphate.
Ribonuclease Hi (RNase H) (SM-IX) uses crystal structures 2RN2
(Katayanagi et al., 1992
) and 1RNH (Yang et al., 1990
) and Mg2+ containing form (1RDD) (Katayanagi et al.,
1993
) and 8 NMR derived structures (1RCH) (Yamazaki et al., 1997
).
pKas are from Oda et al. (1993
, 1994
).
Ribonuclease T1 (RNase T) (SM-X) uses the crystal structure with
vanadate (3RNT) (Kostrewa et al., 1989
). Vanadate was treated as
phosphate. Measured pKas are from Inagaki et al.
(1981)
and McNutt et al. (1990)
.
Human Immunodeficiency Virus Type-1 Protease (HIV-1) uses the crystal
structure (1HIV) (Thanki et al., 1992
) with dihydroethylene-containing inhibitor, which is removed. pKas are of the
aspartyl dyad from inhibitor free enzyme kinetic studies (Hyland et
al., 1991
; Ido et al., 1991
).
For residues in Barnase, OMTKY3, BPTI, Calbindin, and RNaseHi, there is
more than one pKa reported. The value used is
either an average or the pKa of the site closest
to the atom losing the proton. Overall, the uncertainties are small (
0.3 pK units). The n values (Eq. 3) were estimated from the
published titration curves.
Even in these well-characterized proteins, only 166 of the 374 ionizable residues (Asp, Glu, His, Tyr, Lys, and Arg) have measured pKas (Table 1). There are no Arg pKas reported. Residues titrating in the same pH region influence each other. Thus, errors in the titration of Arg may influence Lys or Tyr. Likewise, errors in titration of missing Asp or Glu can impact the titration of the reported residues.
The following have incomplete titration curves that miss the plateau at low or high pH: Lys-34, Asp-7, and Asp-27 in OMTKY3, Lys-25 in CabD, Tyr-3, Tyr-33, Tyr-45, and Lys-13 in ProtG, Asp-93 and Glu-73 in Barnase, Tyr-23 in BPTI, Asp-48 and Asp-66 in HEWL, and His-114 in RNase H. Generally titration curves justify the reported values. There are eight residues where only a limiting value is available. This is taken as the true pKa, which may overstate the error of the calculations (see supplementary material).
| |
RESULTS AND DISCUSSION |
|---|
|
|
|---|
Comparison of MCCE calculated and measured pKas
The first pKa calculations using detailed
protein structures and energies derived from continuum electrostatics
held the protein rigid, allowed only changes in residue ionization
states, and used a low protein dielectric constant
(
prot = 4) (Bashford and Karplus, 1990
; Lim et
al., 1991
; Yang et al., 1993
; Honig and Nicholls, 1995
). SCCE does not
match experimentally determined pKas very well
(Antosiewicz et al., 1994
; Demchuk and Wade, 1996
). However, SCCE
provides good benchmarks for comparing different methods. SCCE root
mean square (RMS) errors at
prot of 4 (Table 2) are similar to those found previously.
MCCE and SCCE calculations with
prot = 4 are
compared in Fig. 2 and Table 2.
Experimental pKas for each residue and values
calculated with different Protein Data Bank files and
prot values are in the supplementary material (SM-I-X). For the 166 residues, the average absolute error decreases from 0.46 (SCCE) to 0.11 pH units (MCCE). The RMS error is reduced from
2.29 to 0.81 pH units. Less than 10% of the calculated
pKas differ from experiment by more than 1.5 pH
units. With SCCE 40% of the pKas are in error by
this amount and 7% have errors greater than 4 pH units (Table 2).
|
|
Eighty-one residues are on the surface with no special interactions.
However, 35 experimental pKas are shifted by
>1.0
pK unit from isolated amino acids (Table
3). Thirty-six residues are buried and
have lost >3
pK units (4.08 kcal/mol) reaction field (solvation)
energy. Fifty-three have ion pair interactions of >3
pK units.
Thus, 85 residues are in one or more perturbed categories with many in
more than one class. Surface, noninteracting residues are easiest to
model. Their SCCE RMS error is only 1.57, and the MCCE error is 0.67 pH
units. Of the unperturbed residues His have the largest RMS error
because of His-124 in RNAse H (see below). Perturbed residues have
larger errors with SCCE (RMS error 2.58 pH units). MCCE provides
significant improvement (RMS error 1.09), but these residues pose the
greatest challenge for calculation (Table 3). The largest errors come
from residues in strong interactions (RMS error 1.17) as do some of the
greatest improvements. One difficulty is that many ion pairs involve
Arg for which there are no measured pKas.
|
pKas calculated with different structures of the same protein
Calculations should provide values that are the same for all nominally identical structures of the same protein. The average standard deviation with SCCE of the 32 residues in lysozyme structures (2LZT and 1B0D) is 1.4 pH units and only 0.5 pH units with MCCE (SM-V). Thus, MCCE conformational flexibility greatly diminishes the dependence on the initial structure. In addition, the RMS error for the 19 residues with known pKas is cut in half.
The pKas derived from crystal and solution
structures were compared for 126 residues in eight proteins (Table
4). MCCE uses a rigid backbone and
nonpolar side chains. The sampling of conformational space is increased
in an ensemble of NMR structures, where each has somewhat different
backbone and side chain positions. Thus, the average RMS variation of
residue pKas is 1.5 pH units when different NMR
coordinate files are compared. However, the RMS error of the averaged
pKas is 0.83 pH units, smaller than for the
crystal structures (0.94). Similar improvement was found in single
conformer studies with
prot of 20 (Antosiewicz
et al., 1996
). Calculations with single, deposited average NMR
structures have slightly larger errors (RMS error 1.02 pH units).
However, the improvement with analysis of ensembles of NMR structures
comes at the expense of more calculations. Here, 271 NMR structures were analyzed for the comparison of eight proteins.
|
Comparison of experimental and calculated n values
The Hill coefficient, n, in Eq. 3 measures how much residue titrations influence each other. Neighboring residues titrating in the same pH range have smaller n values. Experimental n values were obtained for 94 residues in CabD, Barnase, OMTKY3, CD2d1, and RNaseH, either from the published values or from the shape of published titration curves. Even in these small proteins, half of the residues have n values smaller than 0.85, whereas only two have values significantly larger than 1. The MCCE derived values (Eq. 3) are well correlated with those determined experimentally (Table 5).
|
The correspondence between calculated titration curves and MCCE site
occupancies become worse as n decreases. This is because Eq. 3 provides only an approximation for the behavior of individual residues, missing the complex pH dependence of clusters of interacting residues (Onufriev et al., 2001
).
Calculation of water and ion binding
Solvent exposed waters are removed, whereas buried waters are
treated in the same detail as side chains with conformations allowing
rotation about the oxygen. Each water has an extra conformer in bulk
solvent with no interactions with the protein. The reaction field
energy of free, solvated water controls the occupancy of interior
sites. The water/hexane partition coefficient suggests a
3.56
pK
units (
4.84 kcal/mol) reaction field energy for an isolated water in
bulk water (Wolfenden and Radzicka, 1994
). For water to be bound, the
remaining reaction field energy plus added interactions with the
protein must be more favorable than this. There are 38 buried,
crystallographic waters (Table 1). With a solution reaction field
energy of
3.56
pK units, the average water site occupancy is only
33% (pH 7). The fraction increases to 77% as the penalty for removing
the water from bulk was decreased to 0.87
pK units. Similar results
were found in earlier MCCE calculations (Alexov and Gunner, 1999
). In
the group of proteins studied here only Asp-66 in HEWL has a
pKa that depends on bound water (see below). The
indifference to water in this data set can be seen in the similarity of
the pKas derived from crystal structures and
those using NMR structures where no waters are included. However, in
other systems, bound water can have a large impact on the stability of
buried charges (Gibas and Subramaniam, 1996
; Garcia-Moreno et al.,
1997
; Alexov and Gunner, 1999
).
Analysis of salt-bridges and corrections for strong electrostatic interactions
It is often difficult to obtain the correct
pKas for residues in ion pairs (Sindelar et al.,
1998
). Continuum electrostatics calculations often over-stabilize the
ionization of charges, lowering acid pKas and
raising those of bases. MCCE with no correction yields
pKas for the 98 acids
0.59 pH units too low and
for the 43 bases 0.54 pH units too high. An empirical function (SOFT) (Eqs. 4a and 4b, Fig. 3) was introduced
to weaken strong pair-wise electrostatic interactions with other
conformers (
G
Gpol,
i).
|
(4a) |
|
(4b) |
0.16 and 0.07 pH units, respectively. This function has
little effect on the pKa of most groups, but for
20 residues the error is decreased by >1.5
pK units (Fig.
4).
|
|
|
The SOFT function can be viewed as increasing the effective dielectric
constant but only for relatively close neighbors (Fig. 3). The largest
change halves the interaction energy, equivalent to an
prot of 8. There are several reasons that this
may be necessary. Interaction energies are very sensitive to small
changes in position when groups are close together. If structures
report the closest allowable separation for salt-bridges as the
time-averaged position then the interactions will be too large. Small
excursions, weakening electrostatic interactions are not taken into
account. In addition, MCCE uses Lennard-Jones parameters (SM-XI) that
yield good values for hydrogen bonds. These may underestimate the cost
of bringing ion pairs together, overestimating the electrostatic
interactions. Lastly, the available conformers may not be optimal for
stabilizing the charge-dipole state of the ion pair at low and high pH.
Thus, the SOFT function represents a pragmatic solution to the problem of obtaining good values for ion pairs. However, future implementations of MCCE will aim to use enhanced local conformer flexibility to render
this function unnecessary.
MCCE calculations with different protein dielectric constants
Residue pKas were calculated at
prot of 4, 8, and 20 (Tables 2 and 3). With
SCCE, there is marked improvement as
prot is
increased (Antosiewicz et al., 1996
; Demchuk and Wade, 1996
). However,
MCCE calculations are only moderately sensitive to
prot. The RMS error at
prot = 8 (0.69) is slightly smaller than with
prot = 4 (0.84) or
prot = 20 (0.70). Here, site titration is stabilized by either explicit conformation change at small
prot or by the averaged dielectric response as
prot increases. Fewer conformational changes
are seen as
prot is increased.
With larger
prot, buried charges have a
smaller 
Grxn and smaller
pair-wise interactions. Because most pair-wise interactions are
favorable, changes in these two terms tend to offset each other.
However, explicit conformational flexibility can stabilize a charge
more than
prot of 20. For example, the MCCE
pKa of Asp-7 in OMTKY3 increases from 2.8, the
correct value, to 5.0 as
prot increases from 4 to 20. In other cases the ionized form is too stable with
prot of 20 indicating the high dielectric
constant either overestimates the local protein flexibility or weakens unfavorable pair-wise interactions.
Conformational flexibility in the MCCE calculations
In the proteins used here, 55% of the residues have multiple
conformers either because they are polar (23%) or ionizable (32%). Overall 62% of the available conformers have some occupancy and 48%
are occupied by more than 10% during the titration. Focusing on
changes in polar residues (Asn, Gln, Ser, Thr, and waters), 17% do not
change conformation between pH 0 and 14, 65% show small changes of 1%
to 20%, whereas only 4% change occupancy by >50% (Fig.
5). Most motions are hydroxyl
reorientation or rotation of Asn or Gln, which would not be seen by
x-ray crystallography. However, these rotations rearrange hydrogen bond
networks and so can have significant impact on calculated
pKas (Hooft et al., 1996
; Nielsen and Vriend,
2001
). In addition, several residues take heavy atom conformations
different from that found in the protein data file at some point in the
pH titration, and these have significant impact on calculated
pKas (see below).
|
MCCE analysis of individual proteins
The benchmark pKa calculations show the MCCE
method yields as good fit to data as other current methods (Antosiewicz
et al., 1996
; Mehler and Guarnieri, 1999
) (see below). The unique
advantage of this method is that it follows explicit movement of side
chains and waters during a titration and so highlights important
conformational changes (Alexov and Gunner, 1999
).
Lysozyme
Lysozyme is a glycosyl transfer enzyme (Vocadlo et al., 2001
Grxn 3
pK units. Many
pKa calculations have been carried out on this
protein (see below). The MCCE RMS error for 1BOD is 0.63 pH units.
Three residues have errors of
1.1 pH units, whereas all other errors
are smaller (SM-V).
Polar side chain motions
The active site, Glu-35 has a pKa of 6.2, Asp-52 of 3.7, and Asp-66 of 0.9. The interaction between 35 and 52 is only
1
pK unit. The high Glu pKa has
been hard to calculate. With SCCE the order of ionization of 35 and 52 are interchanged (Table 7). However, MCCE
calculations with several x-ray and NMR structures reproduce the
pKas. One determinant is Asn-46. At low pH,
conformers with OD1 and ND2 near Asp-52 are both occupied. At high pH,
the Asn always has ND2 closer to the ionized Asp stabilizing it by
2
pK units. Monte Carlo sampling with the Asn fixed in the conformer found in the crystal structure (OD1 closer to Asp-52) interchange the
order of titration of Glu-35 and Asp-52 (Table 7).
|
Small motions yield significant changes
When Asn-46 is fixed the pKa of Asp-66 decreases by 1.4 pH units. The difference in interaction with ionized Asp-52 and Glu-66 of
0.5
pK unit is too small to explain the
change. However, reorientation of Thr-51, Ser-60, Thr-69, two waters,
and Arg-68+ all stabilize
Asp66
more when Asn-46 is fixed. The Asp-66
pKa also decreases as the water solution reaction
field energy is diminished, increasing the occupancy of stabilizing,
buried waters.
CD2d1
Rat CD2d1, in the immunoglobulin superfamily, is the N-terminal domain of the T-cell antigen CD2 (Driscoll et al., 1991Rotamer changes on the protein surface
Asp-28, Glu-29, and Glu-41 are in a cluster on the binding surface (Fig. 6 A, Table 8). Glu-41, with a pKa of 6.5, helps bind target proteins (van der Merwe et al., 1995
occupy
rotamers close to Arg-31+ and
Lys-43+ further from
Asp-28
. As Glu-41 is ionized,
Glu-29
reorients. Fixing Glu-29 in its original
rotamer lowers its pKa by 0.7 pH units, whereas
that of Glu-41 is raised by 0.4 pH units. This cluster is modeled less
well at
prot = 20.
|
|
Calculation of pKas in mutated proteins
Glu-29 and Glu-41 have been mutated to Gln and pKas measured for nearby residues (Chen et al., 2000Prot G
Streptococcal protein G, an immunoglobin binding protein with small (55-75 amino acids) repeating domains (B1, B2, and B3), binds immunoglobin-G constant regions (Tashiro and Montelione, 1995
prot
of 20 (Khare et al., 1997
Grxn > 3). Ion pairs include
Glu-27 with both Lys-28 (3.0 Å) and Lys-31 (3.0 Å), Glu-15 with Lys-4 (3.4 Å), Asp-47 with Lys-50 (3.0 Å), and Glu-19 with the N terminus (3.4 Å). Each pair has >3
pK units pair-wise interaction with each other.
Accurate pKas of ion pairs can rely on side chain motions
Protein G has five residues in salt bridges. SCCE at
prot = 4 for the five residues with known
pKas have an RMS error of 3.1 pH units, whereas
it is 0.75 with MCCE. In the crystal structure (1PGA) Glu-27 is close
to Lys-28 and Lys-31 so the SCCE pKa of Glu-27 is
too low and that of Lys-28 too high. In the average NMR structure
(1GB1) the Glu is further than 8 Å from either Lys. MCCE provides good
pKas for both Lys-28 and Glu-27 starting with the
1PGA structure because a new Glu conformer is selected that is closer
to that found by NMR. Changes in conformation reducing ion-pair
stabilization had been suggested in previous SCCE based calculations at
prot = 20, which could not obtain the correct pKa for this Glu (Khare et al., 1997OMTKY3
The avian ovomucoids are glycoproteins composed of three tandem, homologous structural domains, each acting as a serine protease inhibitor (Laskowski et al., 1987
prot = 20 (Forsyth et al., 1998Problems calculating pKas of residues in clusters
Lys-13, Lys-34, and Tyr-11 form a triad titrating near the same pH, a motif that is often causes problems. In 1PPF the pKa of Lys-13 and Tyr-11 are both 1.6 pH units too high, whereas Lys-34 is 2.6 pH units too low. The ionized Lys-34 is destabilized by loss of reaction field energy and unfavorable interactions with backbone. A favorable interaction of
1.2
pK
units between Lys-130 and
Tyr-11
helps stabilize the anionic Tyr. In 3SGB
the two Lys pKas are close to experiment, whereas
Tyr-11 is >4 pH units too high. The 
Grxn for Lys-34 and its interaction
with backbone are reduced by 3 pH units shifting its
pKa up. This Lys takes different conformations when it is neutral and ionized. However, the occupied conformer of
Lys-130 has little interaction with the
Tyr-11
, which now remains neutral even at pH 14.
RNaseA
Bovine pancreatic RNaseA is a ribonuclease with a well-defined binding cleft containing His-12 and His-119 and Lys-7, Lys-41, and Lys-66. Of the 36 ionizable residues 16 have known pKas (Rico et al., 1991
prot = 20 (Antosiewicz et al.,
1996Selection of His tautomers
His-12 and His-119 are functionally important residues. There are two positions for His-119 in 3RN3. Previous studies have shown the calculated pKa depends on the choice of rotomer and neutral His tautomer, which must be preassigned in SCCE based methods (Antosiewicz et al., 1996Ion binding
His-12 and His-119 bind phosphate increasing the pKa of each by 1 to 1.2 pH units (Table 8) (Cohen et al., 1973
45.6
pK units (
62 kcal/mol) of reaction
field energy. When bound the phosphate looses 9.5
pK units of
reaction field energy (3RN3), which must be replaced by favorable
interactions, primarily with ionized His-12 and His-119, for the
protein bound conformer to be occupied. Although it is possible to
dynamically calculate the phosphate ionization state within MCCE, this
was not done here. Rather, phosphate is always doubly ionized. Monte
Carlo sampling was carried out under three conditions using the energy look-up tables for 3RN3. In one, the phosphate conformer bound to the
protein is forced to be occupied, in another the unbound conformer is
fixed on, lastly both conformers are allowed and so the ion remains in
equilibrium with the protein. Comparing phosphate bound and free, the
pKa of His-12 increases by 2.8 and His-119 by 1.0 pH unit (Table 8). The shift is intermediate if phosphate remains in
equilibrium with the protein. Here the phosphate dissociates, as His-12
looses its proton. The calculations thus reproduce the order of
ionization of 12 and 119 as well as the magnitude of the pH shifts with
and without phosphate. The experimental pK shift of 1 to 1.2 pH with
added phosphate is closer to that found in the dynamic calculations
suggesting that the ion is lost when the two His are neutral.
Unstable calculated pKas for residues in a buried ion pair
His-48 has a measured pKa of 6.3 (Table 3). With MCCE the pKa is 2.5 pH units too high in the crystal structure 3RN3, whereas in NMR structures (2AAS) the average is 1.7 pH units too low with an RMS variance of 4.4 pH units. The variability of His-48 is coupled to changes in Asp-14 with which it has an interaction of
5
pK units when both are ionized. NMR
structures, which gave very different pKas for
these groups, were compared. When the Asp

Grxn is moderate and its
interaction with the backbone favorable it titrates at low pH and the
His remains ionized to very high pH (10 or greater). However, there are
structures where the Asp 
Grxn is
>9
pK units so it now stays neutral until pH 10. Here the favorable
His+:Asp
pair-wise
interactions are never important because the His titrates well before
the Asp. Thus, small shifts in Asp
