Department of Physiology and Biophysics, Mount Sinai School of
Medicine, CUNY, New York, New York 10029
An improved approach is presented for calculating
pH-dependent electrostatic effects in proteins using sigmoidally
screened Coulomb potentials (SCP). It is hypothesized that a key
determinant of seemingly aberrant behavior in pKa shifts is
due to the properties of the unique microenvironment around each
residue. To help demonstrate this proposal, an approach is developed to
characterize the microenvironments using the local
hydrophobicity/hydrophilicity around each residue of the protein. The
quantitative characterization of the microenvironments shows that the
protein is a complex mosaic of differing dielectric regions that
provides a physical basis for modifying the dielectric screening
functions: in more hydrophobic microenvironments the screening
decreases whereas the converse applies to more hydrophilic regions. The
approach was applied to seven proteins providing more than 100 measured
pKa values and yielded a root mean square deviation of 0.5 between calculated and experimental values. The incorporation of the
local hydrophobicity characteristics into the algorithm allowed the
resolution of some of the more intractable problems in the calculation
of pKa. Thus, the divergent shifts of the pKa
of Glu-35 and Asp-66 in hen egg white lysozyme, which are both about
90% buried, was correctly predicted. Mechanistically, the divergence
occurs because Glu-35 is in a hydrophobic microenvironment, while
Asp-66 is in a hydrophilic microenvironment. Furthermore, because the
calculation of the microenvironmental effects takes very little CPU
time, the computational speed of the SCP formulation is conserved.
Finally, results from different crystal structures of a given protein
were compared, and it is shown that the reliability of the calculated
pKa values is sufficient to allow identification of
conformations that may be more relevant for the solution structure.
 |
INTRODUCTION |
The electrostatic interaction between two charges
in a dielectric medium decreases rapidly as the separation between the
two entities is increased. This is a result of the dielectric medium's intrinsic ability to shield the electrostatic field arising from the
charge. There is compelling experimental and theoretical evidence that
the general functional form of this screening is sigmoidal. Early
experiments on organic acids (Debye and Pauling, 1925
; Schwarzenbach, 1936
; Webb, 1926
) were collected together and presented as mean values
in tabular form (Conway et al., 1951
) that produced a smooth, sigmoidal
screening as a function of separation with an asymptotic value of about
80. Experiments on water as a dielectric medium indicate that the
electrostatic field, at loci only two solvation layers (
6 Å) away
from a charge, is dramatically damped, almost approaching the bulk
dielectric screening of
= 80 (Harvey and Hoekstra, 1972
; Pennock
and Schwan, 1969
; Takashima and Schwan, 1965
). When two charges are in
close proximity, however, the dielectric screening rapidly diminishes
approaching the vacuum value because there is no intervening
dielectric. These conditions are simultaneously satisfied with a
radially dependent sigmoidal screening function. Additionally, it has
been demonstrated that the generalized Born-surface area continuum
solvation model also gives rise to sigmoidal screening (Mehler, 1996b
).
Thus, the general sigmoidal screening form applies to both high and low
dielectric media. Theoretically, the Lorentz-Debye-Sack (LDS) theory
of polar solvation also leads to sigmoidally screened electrostatic
fields (Debye, 1929
; Lorentz, 1952
; Sack, 1926
; Sack, 1927
). LDS theory
has been used to develop radial dielectric permittivity profiles to
describe ion and dipole field sources (Ehrenson, 1989
) and to derive an
expression for the hydration energies of spherical ions (Bucher and
Porter, 1986
) from the integral form of the Born Equation (Born, 1920
).
Subsequently, Ehrenson (1989)
showed that by using Bötcher's
analytic expression (Böttcher, 1938
) of Onsager's theory
(Onsager, 1936
), the reaction field corrected LDS theory for both
dipolar and asymmetric ion field sources also yielded sigmoidal
screening. (For a review of LDS theory and its relation to protein
electrostatics, see Mehler, 1996a
.) In various applications (Collura et
al., 1994
; Mehler, 1996a
; Mehler and Solmajer, 1991
), it has been shown
that sigmoidal screening leads to the reliable prediction of properties dependent on electrostatic effects in proteins, and therefore is an
attractive formulation for the development of implicit solvent models
(Guarnieri et al., 1998
; Hassan et al., 1999
).
Recently, a self-consistent approach based on using sigmoidally
screened Coulomb potentials (SCP) for calculating pH-dependent properties in proteins has been derived and applied to several systems
(Mehler, 1996b
). The results indicated that the reliability of the
method is similar to that achieved by the finite difference Poisson-Boltzmann (FDPB), but requires substantially less computing effort. Interestingly, although these various methods predict a
majority of the pKa shifts correctly, they were consistent
in the occasional dramatic errors on singularly important residues. The
divergent pKa shifts of the two titratable acids of hen egg white lysozyme, Glu-35 and Asp-66, are particularly problematic. The
divergent pKa shifts cannot be simply explained by the
buried fraction (Demchuk and Wade, 1996
), since the two titratable
groups are nearly completely buried. Thus, it appears more likely that the unique local environment around each residue is a key determinant for the pKa of Glu-35 to be two pK units higher than its
water value, whereas in Asp-66 the pKa is more than two
units lower. Ponnuswamy et al. (1980)
, showed that at a detailed level,
the local environmental hydrophobicity that surrounds each residue is
distinct and can be quite different from all other residues, implying
that a different screening would be required for each site (for PB
calculations, each site would require its own internal permittivity
value) as has been pointed out by Warshel (King et al., 1991
).
In this work we hypothesize that a key source of single residue
aberrations in pKa is a result of the special protein
microenvironment that exists around the singular residue. To test this
hypothesis, a strategy has been developed to quantitatively
characterize the hydrophilicity or hydrophobicity of the
microenvironment around each titratable group, and then use this
property to modulate the components of the electrostatic free energy
operative in proteins. The advantage of this approach is that by using
physicochemical properties independent of the parameters required to
calculate the electrostatic effects, one might gain additional insight
into the relation between protein structure and the forces that control its function. Accounting for every local microenvironment in the entire
protein structure could become prohibitively costly. Therefore, to
preserve the computational speed of the SCP, the quantitative calculation of the microenvironments has been implemented in a way that
requires virtually no additional CPU time.
In the next section of this paper the method for quantifying the
microenvironments is presented, and the dielectric screening function
used previously (Mehler, 1996a
; Mehler and Eichele, 1984
) is
reformulated to account for the local variation in the dielectric properties of proteins. The algorithm is applied to calculating the
pKa in bovine pancreatic trypsin inhibitor (BPTI),
triclinic hen egg white lysozyme (HEWL), ribonuclease A (RnaseA),
ribonuclease T1 (RnaseT), ribonuclease HI (RnaseHI), calbindinD9k (Cab)
and the B1 immunoglobulin G-binding domain of protein G (ProtG). These seven proteins provide a data set of 103 measured pKa
values that have been used to develop the parametrization of the
algorithm. Subsequently, the method is tested on turkey ovomucoid
inhibitor domain 3, HIV protease, and other crystal forms of some of
the proteins in the test set. Finally, the results obtained here are compared with other calculations and potential improvements for the
quantification of the microenvironments are considered.
 |
THEORETICAL BASIS AND CHARACTERIZATION OF MICROENVIRONMENTS |
Calculation of the electrostatic free energy
The calculation of the pKa of titratable groups in
proteins is greatly simplified by considering the thermodynamic cycle
(Bashford and Karplus, 1990
; Bashford and Karplus, 1991
; Warshel, 1981
)
|
(1)
|
where (p) and (s) refer to protein or solvent, respectively. Thus,
the pKa of group A in the protein can be calculated from its pKa in model solvent (water) and the additional changes
in free energy that arise when the titratable group is transferred from
water into the protein. Here, only the electrostatic contributions are
considered and pKa(p) is obtained from the relation
|
(2)
|
where wAint is the interaction free
electrostatic energy of the charged group in the field of all the other
groups in the protein, and
wAtr is the
change in self-energy on transferring the group from water into the protein.
In Mehler (1996b)
, it was shown that with an SCP of the form
|
(3)
|
where qj is the charge at point
rj, rj = |r
rj| and
D(r) is a screening function, the total electrostatic free
energy of a system consisting of M groups could be expressed as the sum
of the interaction term and a transfer contribution as
|
(4)
|
where A may refer to subgroups of each amino acid residue and
other groups contributing to the total charge of the system. As
discussed in the introduction, both experimental results and theoretical considerations indicated that D(r) should have a
sigmoidal form (for details, see the Appendix) as originally proposed
by Mehler and Eichele (1984)
.
For the calculation of the pKa, it is not necessary to
evaluate the total electrostatic free energy, but only the contribution from each titratable group in the field due to all the other groups in
the protein. In most pKa calculations the full titration
charge is placed at one or more fixed positions in the protonatable
moiety of the N titratable residues, and the equilibrium
charge state is calculated from the distribution of the charge over the
2N ionization microstates (Bashford and Karplus, 1991
;
Beroza et al., 1991
; Gilson, 1993
). In the procedure developed in
Mehler (1996b)
, the assignment of the ionization charge over the atoms of the protonatable moiety was determined variationally to allow it to
respond to the protein environment. To reduce computing time, the
direct determination of the 2N charge microstates was
bypassed by coupling the total variational ionization charge of each
group, at the given pH, to the Henderson-Hasselbalch equation. In this
paper the method presented in Mehler (1996b)
is still used, but has
been modified; the details of the modified approach are discussed
in the Appendix.
The calculations reported in Mehler (1996b)
were based on a single
screening function that had been parametrized earlier on the basis of
pKa shifts in bifunctional organic acids and bases (Dwds defined in Mehler and Eichele, 1984
). Thus, the only
parameter introduced in Mehler (1996b)
was a scaling of the interaction energy of nearby charges. To improve and generalize the approach followed in Mehler (1996b)
, it is hypothesized that the screening for
groups deeply buried in the protein should be less than for solvent
exposed groups. This reflects the commonly accepted view that the
protein interior is more hydrophobic than water. To implement this
hypothesis, two new screening curves were constructed by redefining
D0 and
(see Appendix, Eq. A1) so that, for small
separations, one curve would provide more screening than
Dwds, used in Mehler (1996b)
, whereas the other would
provide less screening. These two screening functions, D1
and D2, as well as the screening function in water,
Dw, are shown in Fig. 1
(D3 will be discussed below) and the values of
D0 and
are given in Table
I. Although these two screening functions
appear to be very similar, for small values of r (see inset,
Fig. 1) the energy is quite sensitive to small changes in the
screening. Thus, at r = 2 Å, the difference between the two curves for unit charges amounts to nearly 4.5 pK units, while,
at r = 3 Å, the difference is 1.4 pK units. In
contrast, at r = 8 Å, the difference is only 0.09 pK
units and continues to decrease with increasing r. Thus, for
small separations between groups, the contributions to the interaction
energy are quite sensitive to which screening function is used. This is
also the case for the transfer energy, since the Born radii,
Ra in Eq. A2, lie between about 1.2 Å and 2.6 Å. Which
screening function is used for a given titratable group depends on the
fraction of the group's solvent accessible surface area that is buried
in the protein, i.e., the buried fraction. When these two screening functions are incorporated into the calculation, there was some improvement in the pKa values, but the pKa of
Glu-35 in lysozyme was still too small by more than one pK unit.

View larger version (22K):
[in this window]
[in a new window]
|
FIGURE 1
Screening functions defined in Table 1. Solid
line, Dw; short dashed line,
D1; long dashed line, D2; and
dots, D3. The inset is a detail from 0.5 to
3 Å. (See the Appendix for details).
|
|
Quantification of the microenvironment
Analysis of the local environments around Glu-35 and Asp-66 in
lysozyme shows that for the former the nearest neighbors are largely
hydrophobic residues, whereas for the latter residue the nearest
neighbors are much more hydrophilic. The increased hydrophobicity around Glu-35 has consequences for both components of the electrostatic free energy: the energy transfer term will become more positive (more
unfavorable), which can be effected by a decrease in Rp
(see Eq. A6). The screening of the Coulomb potential should also decrease, especially for interactions with nearby atoms so that they
are larger in magnitude relative to a less hydrophobic region. However,
this will be partially offset by the fairly small partial charges on
atoms in hydrophobic groups. Since wint is
usually negative, whereas wtr will be positive
for hydrophobic regions, the resulting shift in pKa will
depend on the subtle balance between these two quantities. The transfer
of Asp-66 from water to a relatively hydrophilic environment, where
Dp(Rp) is larger, will result in a smaller
transfer energy (less unfavorable; it could be negative if the local
environment is more hydrophilic than water), and the Coulombic
screening should be greater. These considerations clearly suggest that
the relevant quantity for characterizing the local environment, here
called the microenvironment, is the hydrophobicity or hydrophilicity
(here we use Hpy to refer to these properties and their values in
particular cases) of the residues or their fragments near the given
titratable group.
There are a large number of Hpy scales available for evaluating the
hydrophobicity of amino acid residues (Cornette et al., 1987
).
Generally these scales have been used to evaluate the hydrophobicity of
a particular residue, or at least its side chain. Here, the hydrophobicity of the particular residue is not required, but instead,
because the response of the residue to changes in the local environment
is the quantity of interest, it is a quantitative measure of the
hydrophobicity/hydrophilicity of the microenvironment that is needed. A
hydrophobicity scale based on atoms or small fragments is most
appropriate for the algorithm proposed here because it allows the
microenvironment to be defined in a simple, distance dependent way. In
addition, this type of scale can account for the fact that side chains
generally do not have a single Hpy value along their entire length, nor
do all the atoms of a side chain necessarily belong to a given
microenvironment. A further important advantage of a fragment-based
scale is that the presence of coenzymes, prosthetic groups, ligands, or
nucleic acids can all be accounted for with a consistent scale. From
these considerations, the Rekker Fragmental Hydrophobic Constants
(Rekker, 1977
; Rekker, 1979
) were selected to evaluate the
hydrophobicities of the microenvironments in the present calculations.
The Rekker scale was the first fragmental system developed and is based
on assigning hydrophobicity parameters and correction factors to small
functional fragments to calculate log P values (for details, see
Rekker, 1977
). This scale has been widely used in the pharmaceutical
industry as a means to calculate partition coefficients of potential
drug candidates. Here, we use it not to obtain the
hydrophobicity/hydrophilicity of a particular residue, but to obtain
this quantity for the local protein environment. This is done by (1)
identifying fragments of all the residues that are in the neighborhood
of the titratable residue, (2) assigning the Rekker coefficients to
these fragments, (3) summing them to determine the local
hydrophobicity/hydrophilicity of the region around the titratable
residue, and (4) assigning an appropriate screening for this
environment. Three other fragmental systems have been proposed, of
which two are based on functional fragments (Leo et al., 1975
; Suzuki
and Kudo, 1990
) and one is atom based (Ghose and Crippen, 1986
). All
the fragmental systems that have been proposed to date are calibrated
on the basis of partition coefficients (log P). The parameters are
therefore completely independent of protein structural information.
Thus, their successful use to describe local protein environments will
give some indication of the universality of the hydrophobic concept and
the transferability of scales developed from different physical
concepts. A recent report compared the four methods and showed that the
Rekker system was one of the most accurate available (Mannhold et al.,
1995
).
In this initial application the original values of the hydrophobic
fragmental constants and correction factors were used (Rekker, 1977
).
For each fragment the values were partitioned to the atoms belonging to
that fragment, and to simplify the calculations, net charge on a
specific group was not accounted for. Then the microenvironment can be
defined on the basis of a distance criterion, and only atoms satisfying
it are included in the calculation of the total Hpy around the given
group. The initial set of values that have been used in the
calculations are tabulated in Table 2.
Note that the water value was estimated from the fragmental constants
of aliphatic OH and Hneg (Rekker, 1977
).
The microenvironment of any group is defined as the region that lies
within a sphere of radius 4.25 Å from any (nonhydrogenic) atom
belonging to the group. This radius was chosen because it essentially
includes the first shell around each atom, but does not extend to atoms
beyond this. In addition, this value is in reasonable agreement with
the criterion used by Ponnuswamy et al. (1980)
. In analogy with the
formula used to calculate log P from the fragmental hydrophobic
constants (Rekker, 1977
), the Hpy value for each microenvironment of a
group is calculated from the formula
|
(5)
|
In Eq. 5, RFHCb is the contribution of atom b to the
fragment's fragmental hydrophobic constant as given in Table 2,
NA is the number of atoms in group A and
to ensure that each atom in the microenvironment is counted only
once. Schematic representations of the microenvironments around Glu-25
and Asp-66 in lysozyme are presented in Fig.
2. These diagrams show that in most
cases only some atoms of a given amino acid residue are located within
4.25 Å of one or more atoms of the group in question. Moreover, these
arrangements of the atoms in space clearly show how the protein
architecture has evolved to create the hydrophobic microenvironment
around Glu-35 and the hydrophilic microenvironment around Asp-66.

View larger version (21K):
[in this window]
[in a new window]
|
FIGURE 2
Residues defining the microenvironments of Glu-35 and
Asp-66 in HEWL. The number under each residue is its contribution to
the total hydrophobicity or hydrophilicity. Atoms within 4.25 Å of any
atom of the titratable moiety (shown in bold) are labeled. See text for
details.
|
|
In addition to the Hpy of the protein microenvironments, the Hpy in the
solvent are needed to calculate the gain or loss of hydrophilicity in
the solvent to protein transfer process. These were calculated by
immersing each titratable residue in an equilibrated water droplet and
carrying out 100 ps of dynamics after heating to 300 K. The residues
were capped with neutral fragments at the termini. From the interval
70-100 ps, 31 structures were extracted, an Hpy calculated for each
structure using the water fragment values given in Table 2, and the
final value was obtained from the average of the Hpy values calculated
from the 31 structures using Eq. 5. The results for the protonatable
residues are given in Table 3 and the
atoms belonging to the titratable groups, as defined in the PAR19
parameter set of CHARMM (Brooks et al., 1983
), are also shown. It
should be noted that in analogy to Rekker's (1977)
formula for
calculating log P from the hydrophobic fragmental constants, Hpy as
defined in Eq. 5 is an extensive quantity. Therefore, in an environment
that is uniformly hydrophobic or hydrophilic its magnitude will tend to
increase with increasing size of the group. In addition, basic groups
will tend to orient the waters with the dipole pointing away from the
titration site (oxygen pointing toward the group), which will tend to
increase the hydrophilicity of the microenvironment, whereas for the
acidic groups the dipole will tend to orient toward the titration site
(hydrogens nearer the acidic group) tending to increase the
hydrophobicity.
Three measures for characterizing the hydrophobicity of the
microenvironments are defined: first is HpyA defined in Eq. 5, second is the total hydrophobicity of the microenvironment due to
the fraction of the group buried in the protein and the fraction exposed to solvent
|
(6)
|
where HpyA0 is the value of Hpy in the solvent,
and finally, a measure of the hydrophobicity of a given group's
microenvironment relative to what it would be if the group was totally
immersed in water, i.e.,
|
(7)
|
These quantities have been calculated for all proteins in the data
set, and average values, calculated from the seven proteins in the data
set for each type of titratable residue, are given in Table
4. It is of interest that the buried
fractions and microenvironments are quite different for the different
residue types. The reciprocal of rHpy is an indication of
the increase in the environmental hydrophobicity on transferring the
titratable group from solvent to its final position in the protein.
Thus, for the lysines, this increase is only about 1.5, which is not
surprising considering the small value of
. Histidine is
the second most buried residue, but on the average, its
microenvironment appears to be quite hydrophilic, and the increase in
hydrophobicity is less than two. The titratable moiety of Tyr is the
most buried with the most hydrophobic microenvironments. In an initial
more extended study of the microenvironments of all groups in a sample
of 13 globular proteins comprising a total of 7132 groups, it was found
that
and
had values of 0.81 and
0.43, respectively (Mehler, unpublished results). Therefore, the
average increase in hydrophobicity is about 2.3, which compares well
with the value of 2.6 obtained in an earlier study by Ponnuswamy et al.
(1980)
using very different methods and assumptions.
The buried fractions (BF) and microenvironment properties of each
titratable moiety in lysozyme are given in Table
5. It is seen that only a few values of
Hpy are greater than zero. However, as will be discussed below, it was
found that titratable groups with a BF less than 0.7 should be
considered as solvent exposed for the pKa calculations.
Considering groups with BF > 0.7, only Glu-35 has a positive
value of Hpy. The values of THpy or rHpy provide additional
insight into the change in the hydrophobicity of the microenvironment
when the group is transferred from water to protein. In HEWL, THpy is
always negative and the highest value is
0.59 for Glu-35; the next
largest value is
1.99 for Tyr-53. Comparison of the microenvironment
of Glu-35 with the average values and standard deviations for Glu
(Table 4) also clearly exhibits the unusual nature of the local
environment around Glu-35.
Considering rHpy, the smaller the value the more hydrophobic
the protein microenvironment of the group relative to water. Thus, the
value of Glu-35 is the smallest (it implies that its protein
microenvironment is around 20 times more hydrophobic than water!) in
HEWL, and the next value is five times larger. Comparison of the
microenvironments of Glu-35 and Asp-66 shows how the structural characteristics (see Fig. 2) translate into the quantitative
description of the differences in hydrophobicity between the two
microenvironments provided by the three hydrophobicity parameters. In
HEWL, only Asp-52 has an Hpy value slightly more negative than Asp-66.
Thus, the protein contribution to the microenvironments of these two residues are the most hydrophilic in HEWL, while that of Glu-35 is the
most hydrophobic. It should finally be noted that the values of
rHpy can be negative, indicating extreme hydrophobicity, or greater than one, indicating a microenvironment more hydrophilic than water.
Parametrization and the microenvironment
The hydrophobicity measures of the microenvironments in lysozyme
(Fig. 2, Table 5) reveal a complex mosaic of different dielectric regions that present determinants of electrostatic properties that are
quite independent of the buried fraction. In several important cases a
totally buried residue, which usually is considered to exist in a low
dielectric region, is shown to actually be found in a quite hydrophilic
microenvironment. Thus, transfer of such a residue from water to
protein is, in fact, only a small perturbation because the screening in
such a microenvironment of the protein will be close to that in water.
Nevertheless the microenvironments around the titratable groups of
proteins show that they are generally more hydrophobic than in water
(compare Tables 3, 4, and 5).
The values of D0 and
used to define D1 and
D2 were determined as described above to account for the
expected differences in dielectric behavior depending on the extent of
burial, but the explicit values were not calibrated against the
pKa of the protein data set. To incorporate these two
screening functions into the algorithm, a switching value had to be
determined, which was accomplished by assigning D1 to all
protonatable groups with a BF below a given threshold and
D2 for the others. A value of 0.7 was found to give the
lowest root mean square deviation (rmsd) between calculated and
experimental pKa values in the data set and is used for
switching between D1 (BF < 0.7) and D2
(BF
0.7) for all subsequent calculations. The calculations
carried out in this way do not consider the explicit effects of the
microenvironments and therefore are labeled as "menv-free"
calculations in the following discussions.
From Fig. 2 and Tables 3, 4, and 5, it is apparent that for individual
residues there can be considerable variation around the mean values of
the properties characterizing the microenvironments. It seems
reasonable, therefore, to make use of this to identify microenvironments that exhibit especially large deviations from the
mean, and use the physical implications of these deviant regions to
modulate the electrostatic properties of the titratable groups in
question. To incorporate this idea into the algorithm, it is necessary
to define quantitatively when the properties of a microenvironment deviate enough from the mean values to require modification of the
menv-free assignments of the screening functions given above. These
parameters were determined in the following way. The microenvironments of groups for which the calculated pKa showed the largest
errors were analyzed to rationalize on physical-chemical grounds the sources of the errors and the changes in screening that might lead to
improvement. However, only reduction in overall rmsd was used to decide
if a particular combination of screening functions was to replace the
default assignments depending only on the BF value. This approach was
taken because of the observation from the comparison of results using
different crystal forms that a particular large error may be due to a
conformation that is not relevant for the solution structure (see
below) or that the sources of a particular error may be due to one or
more inadequacies of the method. For these reasons, and because the
results indicate that the quantification of the microenvironments will
need further extension and refinement, exhaustive optimization was not
carried out. Modification of the default screening was applied to
titratable groups with BF
0.7, where the hydrophobicity or
hydrophilicity of the microenvironments was far from the mean value;
the border region with 0.7
BF
0.8 was treated
differently from groups with BF > 0.8. For groups with very small
buried fractions (BF < 0.3) and microenvironments very close to
water, a third screening function was introduced (D3 in
Table 1 and Fig. 1) that is much closer to Dw than
D1 or D2. Table 6
gives the thresholds for which the default conditions are modified for
calculating the electrostatic energies, and they are discussed in the
following paragraphs.
The protonatable moiety of Glu-35 in lysozyme is a buried titratable
group in a very hydrophobic environment (see Fig. 2 and Table 5).
According to Rekker (1977)
, Table IX, 1, p. 301), amino acid residues
for which the fragmental hydrophobic constants sum to a relative
hydrophobicity greater than one, are considered lipophilic, and this
value has been used here to define an extremely hydrophobic
microenvironment. As mentioned above, only a few buried tyrosines are
in more hydrophobic microenvironments than Glu-35, but their
pKa values have not been measured. The effect of very hydrophobic environments is to reduce screening and to make the transfer of the titratable group from water to the protein more unfavorable. Since the pKa has been measured only for
Glu-35, any parameter adjustment is arbitrary, so that in these cases (see Table 6)
wtr has been scaled to increase
the energy penalty for transferring a charge from solvent to the
protein interior.
The average value of Hpy for the microenvironments surrounding the
titratable residues in the seven proteins of the data set is about
1.5, but, as noted (see Table 4), the variations around this mean are
considerable. For Asp and Glu when BF > 0.7, the microenvironments of a fairly large number of them are considerably more hydrophilic than
1.5. For such cases, the screening
D2 may imply a microenvironment that is too hydrophobic so
that the large, unfavorable transfer energy results in pKa
values that are too high for the acids and too low for the bases. For
14 of the 53 aspartic and glutamic acids in the data set, this was the
case, and the overestimation of these pKa values was
corrected by using D1 instead of D2 for
calculating
wtr, because this implies
transfer to a more hydrophilic microenvironment. For tyrosine and the
bases the number of cases meeting the conditions of burial with BF
greater than 0.7 and surrounded by very hydrophilic microenvironments
was too small to assess any trends. It was found empirically, however,
that using D1 instead of D2 for calculating wint gave a small improvement in the overall
rmsd. This result may be fortuitous because it appears to be
counterintuitive. At the same time the response of any titratable group
to its environment is determined by the balance between
wint and
wtr, so that,
as the differences in properties given in Table 4 suggest, this
response may depend on residue type. Finally, the third screening
function, D3, is only used when the properties of the
microenvironment indicate conditions that are very close to pure solvent.
Computational details
The calculations reported below were carried out using protein
coordinates obtained from the protein data bank (Bernstein et al.,
1977
) as follows: BPTI:4PTI* (Marquart et al., 1983
), HEWL:2LZT*
(Ramanadham et al., 1981
) and 1HEL (Wilson et al., 1992
), RnaseA:3RN3*
(Howlin et al., 1989
), RnaseT:3RNT* (Kostrewa et al., 1989
),
RnaseHI:2RN2* (Katayanagi et al., 1992
) and 1RNH (Yang et al., 1990
),
Cab:3ICB* (Szebenyi and Moffat, 1986
), protG:1PGA* and 1PGB (Gallagher
et al., 1994
), turkey ovomucoid inhibitor domain 3(OMT3):1PPF (Bode et
al., 1986
), and hiv protease (1HIV) (Thanki et al., 1992
), where the
starred coordinate sets were used in the optimization procedure. The
preparation of the coordinates was carried out in the same way as in I,
and the values of the model pKa used here are
N-term, 7.5; C-term, 3.8; His, 6.3; Glu, 4.4; Asp, 4.0; Tyr,
10.0; Lys, 10.4; and Arg, 12.0. The partial charges and group
structures of the amino acid residues are those defined in the PAR19
topology file of CHARMM (Brooks et al., 1983
), whereas the partial
charges of the neutral forms of the titratable groups were taken from
the PAR22 (MacKerell et al., 1992
; MacKerell et al., 1995
) topology
files. The (BF)a of Eq. A6 were evaluated from the
solvent-accessible surface areas calculated with CHARMM using a probe
radius of 1.0 Å, and the group average (defined by CHARMM) was used to
scale all the
watr for the given group
(see Table 5 for the HEWL values). The Born radii were determined in
the same way as described in the appendix of Mehler (1996b)
except that
atomic values were calculated, and the value obtained for
Dw was used in both self energy terms in Eq. A6. For
charges closer than 1.9 Å, wint was scaled by a
factor of 0.35. This scaling of nearby charges is similar to that used
in Mehler (1996b)
and compensates for the breakdown of the point charge
model for nearby charges. However, due to the modified variational
procedure (see Appendix) the cutoff distance could be reduced from 2.8 Å in Mehler (1996b)
to the present value. Therefore, the scaling is
essentially only used where uncompensated steric clashes leave two
charges too close together. In addition, ionic strength is taken into
account using a Debye screening as in Mehler (1996b)
. Finally, the
polar protons of histidine are treated symmetrically in assigning the
charge of the neutral species.
 |
RESULTS AND DISCUSSION |
Calculation of pKa
The calculated pKa for the test suite of seven
proteins are presented in Tables 7-13.
To evaluate the effect on the pKa of introducing the
dependence of the electrostatics on the microenvironment, the
pK1/2 calculated with the screening functions
D1 and D2 only, i.e., menv-free are also
tabulated. Calculations were carried out in steps of 0.25 pH units and
the pK1/2 of each titratable group was estimated by linear
interpolation between the two pH values where the fraction of charged
species crossed the 0.5 value. The ionic strength was adjusted to be as
close as possible to the experimental conditions and is 0.1 except
where stated otherwise. For most of the proteins in the data set the
experimental pKa values to be used for calculating rmsd are
clear from the quoted reference, but in some cases more than one choice
is possible, and these will be noted as required for clarity. Finally,
the Null model introduced by Antosiewicz et al., (1994)
assumes that the pKa values in the protein are the same as in the model
(solvent), and it was noted that the rmsd for this model is usually
quite small, since most of the pKa values do not shift much
from their model values. Because of this the rmsd resulting from the
calculations provide only a partial measure of reliability. Of equal or
greater importance are the number and size of the large errors, and
these will also be considered in assessing the reliability of the
approach.
Bovine pancreatic trypsin inhibitor
The titratable groups in BPTI are mostly solvent exposed, thus the
pKa values are not shifted far from the null values (see Table 7). For only three residues (Arg-20, Tyr-23, and Tyr-35) are the
BFs in the protein greater than 0.7. Tyr-23 is most noteworthy because
the microenvironment is very hydrophilic (Hpy =
5.4), and
D1 was used to screen the SCP (see Table 6) causing an
increase of about 0.5 pK units in pK1/2, which reduces the
error to about
of its menv-free value. The rmsd for BPTI is
quite small in agreement with other calculations (Antosiewicz et al.,
1994
; Demchuk and Wade, 1996
; Juffer et al., 1997
).
Calbindin D9k
In Cab 9 of the 10 lysine residues are exposed to solvent, so that
the microenvironments are less significant in modulating their
electrostatic effects. The results given in Table
8 show that the calculated
pKa values of all 10 lysines are in good agreement with
experiment, and the rmsd is slightly smaller than that calculated from
the menv-free conditions. Inspection shows that small shifts in several
p/Ca values lead to the decrease in rmsd with the largest shift of 0.2 pK units exhibited by Lys-12.
The authors of the NMR measurements (Kesvatera et al., 1996
) also used
a Monte Carlo method (Svensson et al., 1990
; Svensson et al., 1991
) to
calculate the pKa of the lysine residues. However, the
resulting rmsd of 0.86 is larger than the null model value of 0.74. Juffer et al. (1997)
calculated the pKa values using a
boundary element method to solve the Poisson-Boltzmann equation. In
agreement with other studies (Antosiewicz et al., 1996
) they found that
with a low internal dielectric constant (
i) the results deviated substantially from the observed values. With an
i = 20, agreement was much better (rmsd = 0.51).
Using the value of the solvent's dielectric constant for
i gave an rmsd of 0.36 for both zero and 0.1 ionic
strength. Except for Lys-55, the other lysine residues are exposed to
solvent so that the results of Juffer et al. (1997)
, that high values
of
i yield better results, is not unexpected. It is
noteworthy that the pKa values of Lys-25 and Lys-55 are
shifted upward one or more pKunits. The BF of these two residues is
about 0.5 and 0.8, respectively, so that the upward shift in
pKa shows that the interresidue interactions can be as important as the transfer terms in accounting for large changes in
pKa from the Null model values.
Hen egg lysozyme
The main problem with HEWL is proper calculation of the
pKa of Glu-35, and, at the same time, to obtain reasonable
values for the pKa of the other titratable residues. The
source of the difficulty is the high hydrophobicity of the
microenvironment of Glu-35 as discussed above. Quantification of the
hydrophobic characteristics of the microenvironments reveals why
solution of the PB equation requires a low
i to give a
reasonable result for the pKa of Glu-35, but for many other
pKa such a low value causes large deviations from the
experimental results. With a larger value of
i, this
trend is reversed (Antosiewicz et al., 1994
).
The pKa values for HEWL are given in Table
9, and from the rmsd values it is seen
that introduction of the microenvironments leads to substantial
improvement. A large portion of this improvement is due to Glu-35, but
it is seen that the errors in several other pKa values are
also reduced, e.g., Tyr-53, Lys-96, and His-15. Other residues show
smaller improvements, and a few pKa values show slightly
larger errors. Glu-7, which deviates by more than one pK unit from the
experimental value, is only 27% buried and not influenced by the
microenvironment. This problem is further discussed in the next
section.
Demchuk and Wade (1996)
attempted a local characterization for
identifying a subset of residues described by a high internal dielectric constant (HD) and a set to be described by a low value (LD).
For HEWL, their overall rmsd was 0.63 as compared to 0.49 obtained
here. The HD subset (see Demchuk and Wade, 1996
) gave an rmsd of only
0.37, while here it was 0.50.
The titration curves of Glu-35 obtained from the menv-free calculation
and with inclusion of the microenvironments are shown in Fig.
3. Both curves titrate over a fairly
extended pH range. The net effect of the hydrophobic microenvironment
(scaling of wtr by 1.75) is to decrease the
slope of the titration curve thereby shifting pK1/2 to a
larger value and further extending the titration range, so that between
a pH of 4 and 6 the equilibrium charged fraction changes from about
0.25 to 0.55. This extended titration range of Glu-35 may have
functional significance because it ensures that an ample concentration
of lysozyme with protonated Glu-35 is available over a fairly large pH
range around the optimal value (Fukamizo et al., 1983
). The dependence
of the interaction energy and transfer energy on pH is presented in
Fig. 4. The effect of scaling is
clearly seen in wtr for pH values greater than
about eight. In this region, wtr continues to
increase, only starting to level off at very high pH. In contrast, the
transfer energy from the menv-free calculation levels off around a pH
of 10. The behavior of the energy components in the important pH range
4 to 6 is surprising. Here wtr and
w0tr are nearly identical, in spite of the
scaling factor, but wint is less negative than
w0int. From Eq. A6,
wtr (q = 1) has a value of 6.8 Kcal/mol and scaling by 1.75 increases this to 12.0 Kcal/mol. At a pH
of 6 the menv-free calculation gives q35 =
0.673, yielding 3.1 Kcal/mole for the transfer free energy.
Interestingly, the equilibrium value of the scaled
wtr is 3.2 Kcal/mol, which is achieved by
reducing the menv-free value of q35 to
0.514.
In this case the variational procedure lowers the net charge to
minimize the unfavorable transfer contribution with the consequence
that the pK1/2 value is increased. The effect on the
interaction energy is a small increase of 0.3 Kcal/mol (see Fig. 4).

View larger version (16K):
[in this window]
[in a new window]
|
FIGURE 3
Glu-35 of HEWL titration curve. Closed
circles, menv-free calculation; open circles, includes
microenvironment.
|
|

View larger version (19K):
[in this window]
[in a new window]
|
FIGURE 4
Energy components of Glu-35 as a function of pH.
Closed symbols, transfer energy; open symbols,
interaction energy; diamonds, menv-free calculation;
circles, complete calculation.
|
|
Immunoglobulin G-binding domain B1 of Protein G
The pKa values of this small protein (56 amino acid
residues) were recently reported by Khare et al. (1997)
, and the
calculated results are reported in Table
10. The microenvironments have very little effect on the calculated pKa because only 3 of the
21 titratable residues are buried (Tyr-3, Tyr-45, and Glu-56). Hpy of
Glu-56 is 0.93 and rHpy is 0.10, so that the
microenvironment of this residue is quite hydrophobic, but not as
hydrophobic as for Glu-35 of HEWL. Comparison with the calculated
pK1/2 given by Khare et al. (1997)
shows reasonably good
agreement in the overall trends of the deviations.
Ribonuclease A and Ribonuclease T1
Results for RnaseA are given in Table
11. Introduction of the
microenvironments leads to overall improvement in the calculated pKa. His-48 shows the biggest improvement when the
environmental effects are incorporated while the error in the
pKa of Glu-2 also decreases substantially. The overall rmsd
obtained here was 0.55, whereas Demchuk and Wade (1996)
obtained 0.63. For the subset labeled HD, they obtained an rmsd of 0.44, whereas here,
an rmsd of 0.60 was obtained. These results from HEWL and RnaseA
suggest that the HD selection criterion proposed by Demchuk and Wade
(1996)
can successfully select a subset of titratable residues with
high
i, but that a single smaller
i for
the low dielectric residues is not sufficient for discriminating the
differences in the microenvironment of these residues.
The results of the pK calculations for RnaseT1 are presented in Table
12. The titratable moiety of His-92
exists in a fairly hydrophobic microenvironment, which shifts the
pKa by about 0.6 pK units, reducing the error by about
half.
Ribonuclease HI
As shown in Table 13,
incorporation of the microenvironment improves the pKa
values of Glu-32 and Asp-148. Glu-32 is about 71% buried, but in a
fairly hydrophilic environment, with Hpy and rHpy having
values of
3.45 and 0.48, respectively. Thus, the menv-free screening
of D2 for the transfer term is changed to D1,
but D2 is still used for the interaction term. The effect of this change in screening is to decrease both transfer and
interaction energies leading to a decrease in pK1/2.
Similarly, Asp-148 is 97% buried, but with an Hpy of
4.4 is still in
a hydrophilic environment, so that the screening used to calculate
wtr is set to D1. The effect of the
increased screening is a reduction in wtr
leading to a decrease of 1 pK unit in the calculated pK1/2.
Nevertheless, the remaining error for this residue, as well as Asp-102,
is still greater than 1 pK unit. Oda et al. (1994)
pointed out that
these two residues did not titrate in the pH range of 2 through 8. Both residues form ionic H-bonds with basic residues (Katayanagi et al.,
1992
). Oda et al. (1994)
have also calculated the pKa
values of the acidic groups using the FDPB method (Klapper et al.,
1986
; Warwicker and Watson, 1982
). Their rmsd of about 1.6 was obtained with an internal dielectric constant of 10; their calculation would
probably have been considerably improved if they had used an
i of 20 (Antosiewicz et al., 1996
).
The calculated pKa of His-124 is too low by more than 1 pK
unit. However, as Oda et al. (1994)
note, comparison of the 2RN2 (Katayanagi et al., 1992
) crystal structure (used in the present calculations) and the 1RNH (Yang et al., 1990
) crystal structure shows
a very large conformational change for His-124 in the two structures.
In 2RN2, His-124 is far away from both catalytic residues, Asp-10 and
Asp-70, but close to a Lys from a neighboring molecule (Oda et al.,
1993
), which would decrease the pK1/2 of His-124 as found
here. In contrast, in 1RNH His-124 is close to Asp-10 and Asp-70, and
in this configuration His-124 would be in an acidic environment that
would increase its pKa as observed; this point will be
discussed in a following section.
Turkey ovomucoid third domain and HIV protease
As a test of the microenvironment selection criteria developed
using the above proteins, the pKa values of the 15 titratable groups in OMT3 have been calculated. The experimental
titrations (Schaller and Robertson, 1995
) were carried out at a nominal
ionic strength of 10 mM, but the authors point out that, at the end of
the titration, the ionic strength is closer to 50 mM. pKa
calculations were done at ionic strengths of 10 mM, 25 mM, and 50 mM.
The smallest rmsd was obtained at 25 mM and these results are reported
in Table 14. Results calculated at 1 M
ionic strength are also included since Schaller and Robertson (1995)
also measured the pKa values at this ionic strength.
The remarkable aspect of this protein is that, in spite of its small
size, the pKa values of four of the carboxy groups are shifted downward by one or more pK units, although all the acidic groups except Asp-27 are solvent exposed. The source of these large pK
shifts is attributed to H-bond formation and other electrostatic interactions (Swint-Kruse and Roberson, 1995
). From Table 14 it is seen
that the overall rmsd agrees well with the values obtained for the
other proteins. Only Asp-27 is in error by a larger value, which is in
agreement with the trend found by Antosiewicz et al. (1996)
. At 1 M
ionic strength the rmsd is somewhat greater, but the overall trend to
larger pK1/2 values as observed from the measurements is
preserved. The larger rmsd may be the result of using a simple Debye
screening to account for ionic strength. This approximation is probably
reasonable in the range up to 200 mM, but for 1 M probably is no longer valid.
As a second test the pKa values of HIV protease have been
calculated, but only the results for the Aspartyl dyad are discussed. The experimental data show a range for both pKa1 (4.9-6.8)
and pKa2 (3.1-3.7) (Hyland et al., 1991
; Ido et al.,
1991
). The calculation was carried out at zero ionic strength and
yielded the values 5.45 and 3.32 for pKa1 (Asp-25A) and
pKa2 (Asp-25B), respectively. Thus the calculated values
appear to fall within the experimentally determined range, and they are
also in reasonable agreement with the recent calculations reported by
Luo et al. (1998)
. The fraction surface area buried in the protein for
Asp-25A is 0.88, while for Asp-25B it is 0.85. Thus, as in lysozyme,
the separation of the two pKa values cannot be due to
differences in the degree of burial. However, because of small
differences in conformation of the two monomers, the microenvironments
of the two groups are different with values for Hpy of
1.72 (Asp-25A)
and
3.97 (Asp-25B). With these values, wint
and
wtr of both groups are calculated with
D2 (see Table 6). Nevertheless, the lower Hpy value for
Asp-25B suggests additional interactions with polar groups leading to
enhanced stabilization of the deprotonated form. Finally, in the 1HIV
structure, Asp-25B is able to form a better H-bond with Gly-27B
[R(OD1
N) = 2.81 Å] than is the case for Asp-25A
[R(OD1
N) = 3.23 Å].
Assessment of the results
Table 15 summarizes the
reliability of the SCP approach using the description of
microenvironments developed in this paper. Incorporation of the
modulating effects of the titration site microenvironments on the
electrostatic free energy leads to an improvement of about 15% in the
rmsd relative to the menv-free calculation for the entire data set of
103 measured pKa values. It should be realized, however,
that with two different screening functions some accounting of
environment, based on the buried fraction only, has already been
included in the menv-free calculations. Therefore, the rmsd is smaller
than that obtained by Antosiewics et al. (1994)
from their FDPB
calculations, but closer to the overall rmsd obtained by Demchuk and
Wade (1996)
. More significant is the decrease in the larger errors due
to inclusion of the environmental effects: at the error threshold of
0.5 pK unit, the decrease is 17% and increases to 33% at a
threshold of 1.0 pK unit. A scatter plot of the pKa values
from both menv-free and microenvironment calculations is given in Fig.
5. The regression line has a slope of
0.98 and an intercept of 0.21. The correlation coefficient was 0.99. The scatter plot shows that accounting for the microenvironment induces
shifts in the points so that they lie closer to the ideal line
pKcalc = pKexp.

View larger version (18K):
[in this window]
[in a new window]
|
FIGURE 5
Scatter plot of calculated versus experimental
pKa values. Open triangles, menv-free results;
closed circles, complete calculation.
|
|
An error analysis according to residue type is given in Table
16. With the present data set the
largest errors were found in the pKa values of Asp and Glu,
with the bases having smaller errors; for Tyr and the termini not
enough data is available to be statistically meaningful. The error
trends from the present calculations are somewhat different from the
trends noted in Antosiewicz et al. (1996)
, where the rmsd from all the
residues were about the same.
The mean error of Asp shows that the calculated pKa are too
large. This trend is clearly seen in Fig.
6, where the experimental (Tanford and
Roxby, 1972
) and calculated titration curves of lysozyme have been
plotted. The overestimation of the acidic pKa values results in the slope of the calculated curve being too steep in the pH
region 2-4. In the pH interval 4-6, the titration curve is primarily
controlled by His-15 and Glu-35 that both titrate in this interval. The
pK1/2 points of the calculated titration curves of these
two residues (see inset, Fig. 6) are shifted relative to the measured
values (Bartik et al., 1994
) such that their equilibrium charges
approximately cancel in the pH interval 4.5-5.5 and lead to a flatter
slope in this portion of the calculated titration curve. Inspection of
the slope of the His-15 experimental titration curve suggests that it
is steeper than the calculated slope (Bartik, K., C. Redfield, and
C. M. Dobson, private communication). Finally, from a pH of 6 to 11, the experimental and calculated curves are in good agreement.

View larger version (21K):
[in this window]
[in a new window]
|
FIGURE 6
Calculated (open circles) and experimental
(closed circles) titration curves of lysozyme. Inset shows
calculated titration curves of His-15 and Glu-35.
|
|
Limitations of pKa calculations using a single
structure
In their analysis of pKa values calculated with the
FDPB algorithm, Antosiewicz et al. (1996)
included average values
obtained from NMR structures, and they noted a trend toward
improvements in the results. The use of simulation to incorporate
protein flexibility (relaxation effects) into the calculation of
protein electrostatics has been addressed by Warshel and collaborators
using linear response theory (Sham et al., 1997
; Sham et al., 1998
).
Other approaches have been reported (Alexov and Gunner, 1997
; Beroza
and Case, 1996
), and these authors also found some improvements in the
calculated values. However, at present not enough systems have been
tested to come to any general conclusions. Variability in the
conformation of side chains was already pointed out by Bashford and
Karplus (1990)
and others (Yang et al., 1993
; You and Bashford, 1995
) in pK calculations using the trigonal and tetragonal crystal forms of
HEWL. These differences are already reflected in the buried fractions
of the titratable residues and in their microenvironments, which can
show variations that, in some cases, are large enough to change the
screening assignments for the calculation of the electrostatic free
energy components.
Oda et al. (1993)