help button home button Biophys. J.
HOME HELP FEEDBACK SUBSCRIPTIONS ARCHIVE SEARCH TABLE OF CONTENTS

This Article
Right arrow Abstract Freely available
Right arrow Full Text (PDF)
Right arrow Data Supplement
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Similar articles in this journal
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Download to citation manager
Right arrow reprints & permissions
Citing Articles
Right arrow Citing Articles via HighWire
Right arrow Citing Articles via Google Scholar
Google Scholar
Right arrow Articles by Georgescu, R. E.
Right arrow Articles by Gunner, M. R.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Georgescu, R. E.
Right arrow Articles by Gunner, M. R.

Biophys J, October 2002, p. 1731-1748, Vol. 83, No. 4

Combining Conformational Flexibility and Continuum Electrostatics for Calculating pKas in Proteins

Roxana E. Georgescu, Emil G. Alexov, and Marilyn R. Gunner

Department of Physics, City College of New York, New York 10031 USA


    ABSTRACT
TOP
ABSTRACT
INTRODUCTION
METHODS
RESULTS AND DISCUSSION
CONCLUSIONS
REFERENCES

Protein stability and function relies on residues being in their appropriate ionization states at physiological pH. In situ residue pKas also provides a sensitive measure of the local protein environment. Multiconformation continuum electrostatics (MCCE) combines continuum electrostatics and molecular mechanics force fields in Monte Carlo sampling to simultaneously calculate side chain ionization and conformation. The response of protein to charges is incorporated both in the protein dielectric constant (varepsilon prot) of four and by explicit conformational changes. The pKa of 166 residues in 12 proteins was determined. The root mean square error is 0.83 pH units, and >90% have errors of <1 pH units whereas only 3% have errors >2 pH units. Similar results are found with crystal and solution structures, showing that the method's explicit conformational sampling reduces sensitivity to the initial structure. The outcome also changes little with protein dielectric constant (varepsilon prot 4-20). Multiconformation continuum electrostatics titrations show coupling of conformational flexibility and changes in ionization state. Examples are provided where ionizable side chain position (protein G), Asn orientation (lysozyme), His tautomer distribution (RNase A), and phosphate ion binding (RNase A and H) change with pH. Disallowing these motions changes the calculated pKa.


    INTRODUCTION
TOP
ABSTRACT
INTRODUCTION
METHODS
RESULTS AND DISCUSSION
CONCLUSIONS
REFERENCES

Protein stability and function depend on interactions among charged and dipolar groups. All proteins have ionized residues, many bind charged substrates, and changes in charge distribution often occur in the active site with reaction. Ionization state changes determine the pH dependence of protein function (Mellor et al., 1993; Dyson et al., 1997; Alexov et al., 2000) and stability (Hendsch et al., 1996; Spassov et al., 1997; Elcock and McCammon, 1998). Proton transfers between buried residues generate transmembrane proton gradients (Ferguson-Miller and Babcock, 1996; Alexov and Gunner, 1999; Luecke et al., 1999; Sham et al., 1999). Surface-charged residue distribution is critical for protein-protein association (Sheinerman et al., 2000). Computational tools that improve estimates of the energetics of the charge distribution within proteins will thus help us understand how protein function is derived from structure.

The analysis of protein charges and dipoles starts with water as a reference solvent (Warshel and Russell, 1984; Gilson and Honig, 1988; Gunner and Alexov, 2000). Water dipoles, being both polar and polarizable, reorient when a charge is added. Ions and dipoles have favorable interactions with water with its high dielectric constant, which are diminished in protein. In contrast, proteins are highly polar but not very polarizable molecules. Each backbone amide has a dipole greater than water, and more than 50% of all side chains are polar or ionizable. However, proteins have limited, nonuniform flexibility producing a heterogeneous response to charges.

The equilibrium distribution of side chain ionization states and position represents the balance of many interactions and so is hard to calculate. Methods to calculate the free energies of protein charge distributions are often tested by their ability to reproduce experimental pKas (Bashford and Karplus, 1990; Beroza et al., 1991; Takahashi et al., 1992; Yang and Honig, 1993; Antosiewicz et al., 1994, 1996; Demchuk and Wade, 1996; Alexov and Gunner, 1997; Sham et al., 1997; Havranek and Harbury, 1999; Mehler and Guarnieri, 1999; Nielsen and Vriend, 2001). These benchmark calculations are not ideal because they include many values for surface residues, which have little interaction with protein. Also, many pKas are not at pHs where the protein is functional or the structure determined, so calculations must also model pH-dependent structural perturbations. However, with more than 100 measured pKas, calculations can try to match many values with one parameter set and methodology. As reliable computational methods become available, proteins where electrostatic forces play important functional roles can be analyzed (Alexov and Gunner, 1999; Demchuk et al., 2000; Dillet et al., 2000).

The Poisson-Boltzmann equation of continuum electrostatics provides a powerful framework for the analysis of electrostatic interactions in proteins (Warwicker and Watson, 1982; Gilson et al., 1985). These calculations take the distribution of dielectric constants, fixed charges, and mobile ions and return the electrostatic potential. Partial charges are placed at atoms in the protein structure. Water, with a dielectric constant of 80, is the major source of the reaction field (solvation) energy of protein dipoles and charges. Water also screens interatomic interactions in a manner that depends on the distances between protein atoms and the solvent. The Poisson-Boltzmann formalism incorporates the large impact of water easily and reasonably accurately. However, a central problem is how to assign the protein dielectric constant. The dielectric constant of a dried protein is measured to be near 4 (Takashima and Schwan, 1965). Molecular dynamics simulations show the protein inner core is not very polarizable with an effective dielectric constant near 4, whereas the solvent exposed flexible exterior has a value near 30 (Simonson and Brooks, 1996). Calculations using one, low value for protein (approx 4) yield a poor match to benchmark pKas, but there is significant improvement as varepsilon prot is increased to approx 20 (Antosiewicz et al., 1994, 1996; Demchuk and Wade, 1996).

The dielectric constant measures the response of a medium to changes in charge. In protein calculations the dielectric constant is a parameter that averages many kinds of reorganization so that they need not be included explicitly (Sham et al., 1997, 1998; Simonson, 1998, 2001; Gunner and Alexov, 2000; Schutz and Warshel, 2001). For example, changes in electronic polarization, backbone or side chain position, residue protonation state, and ion binding can occur as the pH is changed or a reaction proceeds. Electronic polarization may be fairly uniform throughout the protein, and the backbone motions may be small when there is no significant structural change. Thus, their contribution to the electrostatic energy may be estimated from an averaged dielectric constant rather than by explicit atomic calculations. However, much of the protein response is dependent on the local environment.

Several methods that remain based on continuum electrostatics add a nonuniform protein response to charges (for reviews, see Schutz and Warshel, 2001; Simonson, 2001). Different dielectric constants have been assigned to different amino acids (Voges and Karshikoff, 1998) or to different regions of the protein (Demchuk and Wade, 1996; Rocchia et al., 2001). Other methods consider explicit protein motions. Early methods for sampling multiple protein conformations either averaged interactions between different side chain atomic positions (You and Bashford, 1995) or averaged the ionization states calculated with different structures (Ripoll et al., 1996; Zhou and Vijayakumar, 1997). Hybrid methods simultaneously calculate ionization states of acidic and basic residues and the conformation states of surface side chains (Beroza and Case, 1996), hydroxyls and waters (Alexov and Gunner, 1997), and polar and ionizable side chains (Alexov and Gunner, 1999).

Other approaches include: adding explicit protein relaxation into the protein dipole Langevin dipole methodology with a linear response approximation (Sham et al., 1997, 1998); the use of a nonuniform response parameterized on fragment hydrophobic constants, which implicitly draws on the relationship between a group's polarity and its ability to partition into water (Mehler and Guarnieri, 1999); as well as an iterative mobile cluster approach for calculation of multiple-site ligand binding to flexible macromolecules (Spassov and Bashford, 1999).

The multiconformation continuum electrostatic (MCCE) method allows multiple positions of hydroxyl and water protons, side chain rotamers, and ligands in the calculation of the pH dependence of the ionization equilibria of titratable groups (Alexov and Gunner, 1997, 1999). An analysis of the pH titrations of 166 residues in 12 proteins is reported here. The benchmark root mean square errors of these titrations are comparable with those found with other approaches (Antosiewicz et al., 1994, 1996; Mehler and Guarnieri, 1999). Because improvement in pKa calculations depends on the protein changing side chain position with pH, MCCE highlights motions that may be important for maintaining protein structure and function.


    METHODS
TOP
ABSTRACT
INTRODUCTION
METHODS
RESULTS AND DISCUSSION
CONCLUSIONS
REFERENCES

The MCCE method

MCCE calculates the equilibrium conformation and ionization states of protein side chains, buried waters, ions, and ligands at a defined pH. Preselected choices for atomic positions and ionization states for side chains and ligands are used. Each specified side chain or ligand position and charge is called a conformer. Look-up tables are calculated of electrostatic and nonelectrostatic conformer self- and pair-wise interactions. Protein microstates have one conformer for each side chain, and ligand. Monte Carlo sampling establishes each conformer's probability in the Boltzmann distribution at a given pH or Eh (Alexov and Gunner, 1997, 1999). The method can compare mutated with wild-type proteins (Alexov et al., 2000) or protein with reactant or product bound (Alexov and Gunner, 1999). The MCCE procedure is divided into three stages: 1) selection of conformers (steps 1-3 below); 2) calculation of energy look up tables (steps 4-5); 3) Monte Carlo sampling and calculation of pKas (steps 6-7). Early descriptions of the method can be found in Alexov and Gunner (1997, 1999).

Step 1: generate protein data file, striped of surface waters, complete with all protons except those on hydroxyls or waters

All protons in the original protein coordinate file are stripped off. A translation table brings residue and atom names into compliance with the conventions of the MCCE package. Residues with missing atoms are reported. Every independently ionizable group must have a unique residue name and number so backbone N- and C- terminal atoms are given a residue designator that differs from their side chain. The chain terminus conformer interacts with the terminal side chain. The program Proteus (Samir Lipovaca City College of New York) places protons with no degrees of freedom. Protons with no partial charges, such as methylene protons, are added in their torsion minima. Only buried waters are treated in atomic detail. Water accessibility is determined with the program SURFV (Sridharan et al., 1992; Bharadwaj et al., 1995) and waters with >10% surface exposure are deleted. Ions are also automatically deleted if they are on the protein surface but retained if they are buried.

Step 2: identify residues with strong electrostatic pair-wise interactions

One DelPhi (Nicholls and Honig, 1991) continuum electrostatics calculation for all polar and ionized acidic and basic residues finds the pair-wise interactions between residues as well as the interaction between residues and the backbone (see step 4). Residues with pair-wise interactions greater than ±2.1 Delta pK units (2.9 kcal/mol) will be provided with additional conformers from the heavy atom rotamer library (Dunbrack and Karplus, 1994; Dunbrack and Cohen, 1997).

Step 3: generate composite protein structure

Conformers must be generated that could be at low energy over the entire range of the calculation before knowing the position or ionization state of other groups. However, if conformers that make good hydrogen bonds to ionized sites are unavailable, the calculated ionization free energy will be wrong. Inside the protein, extra, bad conformers increase the size of the calculation. However, because they are not selected in Monte Carlo sampling, they do not affect the outcome. On the protein surface, extra conformers change the protein-solvent boundary, introducing errors into the electrostatic energy terms. Therefore, conformer preselection tries to generate as many good choices as possible without including positions that will never be found in low energy microstates.

Side chains identified as making strong pair-wise interactions in step 2 are replicated on the backbone in rotamer libraries positions (Dunbrack and Karplus, 1994; Dunbrack and Cohen, 1997). New side chains with atoms closer than 2.0 Å to fixed atoms (backbone and nonpolar side chains) are deleted. If all rotamers clash with the rigid part of the protein, additional conformers are made by 30° rotations around the final bond.

The working protein data file contains a single copy of the atomic positions for the backbone and side chains with no degrees of freedom. It also includes complete copies of each side chain conformer. The following conformers are generated for the original side chain as well as for each added library rotamer.

Each Arg and Lys gets one ionized and one neutral conformer. Each His gets two ionized and four neutral conformers. Two basic conformers are formed by interchange of atoms, equivalent to rotation around the CBCG bond (see Fig. 6B). Each of these has two neutral forms in which ND1 or NE2 is unprotonated and one ionized form. Each Asn and Gln has two conformers with the terminal O and N interchanged representing 180° rotation of OCN. Each Asp, Glu, and Tyr gets one ionized and two neutral conformers with the proton in the torsion minimum. Each Ser and Thr has three conformers with a proton in each torsion minima. Hydroxyl-containing residues (Ser, Thr, neutral Asp, Glu, and Tyr) have extra conformers that can hydrogen bond with all nearby acceptors. Water gets conformers to make and accept available hydrogen bonds plus a conformer with no interaction with the protein, representing water in bulk solvent.

Step 4: electrostatic self- (reaction field) and pair-wise interactions

As described in Alexov and Gunner (1997), one DelPhi (Nicholls and Honig, 1991) calculation is carried out for each conformer. Parse partial charges and radii (Sitkoff et al., 1994), a water probe radius of 1.4 Å, 0.15 M salt, and a 2.0-Å Stern ion-exclusion radius are used. Focusing yields a resolution better than 2.13 grids/Å (Gilson et al., 1987). Each DelPhi calculation has partial charges on atoms of only one conformer. Net ionized residue charges are ±1, and they are 0 on polar, or neutral, ionizable confomers. The net charge for phosphate is -2 and +2 for calcium and magnesium. The potential (Psi ) is collected at atoms of all other conformers. The pair-wise electrostatic interaction energy between the designated conformer (j), which is the source of the potential and the distal conformer (k) is
&Dgr;G<SUP><UP>el</UP></SUP><SUB><UP>jk</UP></SUB>=<LIM><OP>∑</OP><LL><UP>k′=1</UP></LL><UL><UP>atoms.in.conf.k</UP></UL></LIM> &PSgr;<SUB>(<UP>j→k′</UP>)</SUB>q<SUB><UP>k′</UP></SUB> (1)
in which Psi jright-arrow k' is the potential at atom k' in conformer k from partial charges on conformer j. The partial charge, qk', that would be on k' in the ionization state of conformer k. In the same DelPhi run, the interaction of the conformer j with polar backbone and side chains without conformers is collected in a single interaction energy (Delta Gpol,j).

One approximation is that all conformers are combined in the protein structure adding to the dielectric boundary. Now only N DelPhi calculations are needed. In contrast, N2 calculations would be needed if the right conformers for each pair-wise interaction were used. However, even this would not provide the correct position for all other conformers in a given microstate. The small proteins studied here are the worst case for errors caused by extra surface conformers. In contrast, buried sites in large proteins will be essentially undisturbed by moderate changes in protein surface.

Several steps reduce the errors in dielectric boundary introduced by extra conformers. For the pair-wise interactions, only the active conformer of the residue with partial charges is included (Fig. 1 A). However, other residues must have all conformers present so all pair-wise interactions can be determined in a single calculation. The theoretically symmetrical Delta G<UP><SUB>ij</SUB><SUP>el</SUP></UP> and Delta G<UP><SUB>ji</SUB><SUP>el</SUP></UP> are calculated with different conformers removed and these pair-wise interactions are averaged. In addition, the reaction field energy (Rocchia et al., 2001) of each conformer is determined in a second series of DelPhi calculations with all residues in their original position except for the conformer with partial charges (Fig. 1 B).



View larger version (15K):
[in this window]
[in a new window]
 
FIGURE 1   Dielectric boundary used for calculation of (A) pair-wise interactions and (B) reaction field energy. A2 is the only conformer with partial charges in the DelPhi calculation. Speckled region at varepsilon prot = 4. Conformers within dashed lines and the surrounding area at varepsilon  = 80.

Some residues are more sensitive to having their electrostatic interactions improperly calculated because of the low-dielectric material added by the additional conformers (Fig. 1). These can be identified by comparing conformer reaction field energies in protein prepared for calculating pair-wise interactions (Fig. 1 A) with those found with boundary used for reaction field energies (Fig. 1 B). Approximately 15% of the ionized conformers have differences between these two values of more than 1 Delta pK unit. However, these residues are not found to have larger errors in calculated pKa.

Step 5: self- (torsion) and pair-wise (Lennard-Jones) nonelectrostatic energies for all conformers

The proton torsion energy is nonzero only for protons placed out of a torsion minimum to make hydrogen bonds. Neutral Tyr has two energy minima in the plane of the Tyr ring with a 2.59 Delta pK unit (3.5 kcal/mol) barrier between them, so E(phi) = -3.5/2 cos(2phi) kcal/mol in which phi is the torsion angle. Ser and Thr have three degenerate low energy positions with E(phi) = -1.3/2 cos 3(phi - 60°) kcal/mol. The energy of a neutral Asp and Glu hydroxyl proton is the sum of a torsion potential with minima at 0° and 180° and a 0.96 Delta pK unit barrier and a -2.0 Delta pK unit electrostatic attraction between hydroxyl proton and nonbonded carboxylate oxygen when both are in plane so E(phi) = -((2.7/2) cos(phi) + (1.3/2) cos(2phi))kcal/mol. This results in a -2.00 Delta pK unit preference for the syn (0°) over anti (180°) conformation, comparable with the value found by solid-state nuclear magnetic resonance (NMR) measurements (Gu et al., 1996). As only library rotamers are used for heavy atom conformers, these are near torsion minima and their torsion energies are set to zero.

The pair-wise Lennard-Jones interactions between all conformers are calculated and added to the pair-wise electrostatic interactions. Lennard-Jones interactions with portions of the protein that are held rigid are added to the pair-wise polar interaction for each conformer. The A and B constants are found in SM XI.

Step 6: Monte Carlo sampling to determine the Boltzmann averaged conformer occupancy

A given protein microstate is a combination of one conformer for each residue, cofactor, and water. The energy of microstate n (Delta Gn) is
&Dgr;G<SUB><UP>n</UP></SUB>=<LIM><OP>∑</OP><LL><UP>i=1</UP></LL><UL><UP>M</UP></UL></LIM> &dgr;<SUB><UP>n</UP></SUB>(i){&ggr;(i)[2.3k<SUB><UP>b</UP></SUB>T(<UP>pH</UP>−<UP>pK<SUB>sol,i</SUB></UP>)]

+(&Dgr;&Dgr;G<SUB><UP>rxn,i</UP></SUB>+&Dgr;G<SUB><UP>pol,i</UP></SUB>+&Dgr;G<SUB><UP>nonel,i</UP></SUB>)}

+<LIM><OP>∑</OP><LL><UP>i=1</UP></LL><UL><UP>M</UP></UL></LIM> &dgr;<SUB><UP>n</UP></SUB>(i) <LIM><OP>∑</OP><LL><UP>j=i+1</UP></LL><UL><UP>M</UP></UL></LIM> &dgr;<SUB><UP>n</UP></SUB>(j)[&Dgr;G<SUP><UP>el</UP></SUP><SUB><UP>ij</UP></SUB>+&Dgr;G<SUP><UP>nonel</UP></SUP><SUB><UP>ij</UP></SUB>] (2)
in which kbT is 0.43 Delta pK units (0.59 kcal/mol), M is the number of conformers, delta n(i) is 1 for conformers that are present in the state and 0 for all others. gamma (i) is 1 for bases, -1 for acids, and 0 for neutral conformers (polar groups, waters, neutral acids, and bases). pKsol,i is the solution pKa of ionizable group i, Delta Delta Grxn,i is the difference between the reaction field energy for this residue type in solution (SM XII) and that found in the protein (Nicholls and Honig, 1991); Delta Gpol,i and Delta Gnonel,i are the electrostatic and Lennard Jones interactions between conformer i and backbone and side chains with no conformational degrees of freedom; Delta G<UP><SUB><IT>ij</IT></SUB><SUP><IT>non-el</IT></SUP></UP> and Delta G<UP><SUB><IT>ij</IT></SUB><SUP><IT>el</IT></SUP></UP> are pair-wise interactions between conformer i and j. The limits on the summation of the pair-wise interconformer terms ensure that each interaction is counted only once.

pKsol,i is obtained from studies of peptides (Richarz and Wüthrich, 1975; Matthew et al., 1985). Asp and Glu use a carboxylic acid reference pKa of 4.75. Studies have found that the backbone lowers the pKa of these residues even in peptides (Oliveberg et al., 1995; Gunner et al., 2000). Using a simple carboxylic acid reference pKa avoids double counting the impact of the backbone in peptide and protein.

The solution reaction field energies of ionizable groups used with protein dielectric constants 4, 8, and 20 are given in SM XII. These parameters were adjusted so that the average pKa error for each amino acid type is close to zero.

Several techniques aid Monte Carlo convergence. 1) The number of conformers per residue ranges from 2 to more than 20. Sampling first selects a residue and then chooses one conformer to change. Thus, all residues make similar numbers of changes but in residues with many conformers, each is sampled fewer times. 2) On 50% of the steps, a second residue changes conformation. This is selected randomly from residues that interact by at least 1 Delta pK unit with the first. The new microstate with two changes is then subjected to Metropolis sampling (Beroza et al., 1991). 3) After a predetermined number of steps, conformers with no occupancy are deleted. Those with 100% occupancy are fixed on and their pair-wise and self-energy terms are added to the fixed, polar energy terms. This can reduce the size of the calculation by as much as 70%. 4) The conformer occupancies are saved after a stage of conformer elimination. Monte Carlo sampling is reinitiated with a new seed. If the changes in occupancy after the new cycle are less than a preset limit, the program exits.

Step 7: calculate residue pKa and n values from conformer occupancies as a function of pH

Monte Carlo sampling is performed from pH 0 to 14 in steps of 1 pH unit and the pKa of each titrating group is obtained from the best nonlinear, least squares fit to
⟨<UP>Occ<SUB>i</SUB></UP>⟩=<FR><NU>10<SUP>&ggr;<SUB><UP>i</UP></SUB><UP>n<SUB>i</SUB></UP>(<UP>pH−pK<SUB>i</SUB></UP>)</SUP></NU><DE>1+10<SUP><UP>&ggr;<SUB>i</SUB>n<SUB>i</SUB></UP>(<UP>pH−pK<SUB>i</SUB></UP>)</SUP></DE></FR> (3)
in which i is the occupancy of the ionized form of residue i at a given pH, ni is the Hill coefficient reflecting the degree of cooperatively between the sites (Cantor, 1980), and gamma (i) is the charge, -1 for an acidic or 1 for a basic residues.

Timing of the MCCE calculations

The analysis of Bovine pancreatic trypsin inhibitor (BPTI) (58 residues and 173 conformations) takes 3 h, whereas RNase H (155 residues and 608 conformers) takes 8 h on a single processor of an R10,000, 180 MHz SGI Origin 200. Most the time is spent in the DelPhi calculations. Monte Carlo sampling to determine all conformer occupancies at 15 pHs takes 20 min for BPTI and 1.5 h for RNase H.

Single conformer continuum electrostatics (SCCE) calculations

SCCE and MCCE calculations use the same protein structures with the same parameters for atomic charge and radius. In SCCE, waters are deleted and hydroxyl protons are placed in the most occupied MCCE position at pH 7. The most populated form of the neutral acidic residue at low pH is the only available neutral conformer. varepsilon prot is 4.

Atomic coordinates and experimental pKa values

MCCE was used to calculate pKas in 12 proteins (Table 1). The 166 measured values are from 1H -13C heteronuclear resonance experiments unless otherwise specified. The experimental and calculated pKa for each residue are reported in the supplementary material (SM I-X).


                              
View this table:
[in this window]
[in a new window]
 
TABLE 1   Number of polar and charged residues

Barnase (SM-I) uses crystal structures 1A2P (Martin et al., 1999), 1B20 (Buckle et al., 1993), and the 20 NMR model structures (1BNR) (Bycroft et al., 1991). Experimental values were measured at low ionic strength (Oliveberg et al., 1995). The pKa of His-18 was obtained from fluorescence measurements (Loewenthal et al., 1993).

BPTI (SM-II) uses crystal structure (4PTI) (Marquart et al., 1983) and 20 NMR structures (1PIT) (Berndt et al., 1992). pKa from NMR experiments performed at 41°C were corrected to 25°C using the method described in March et al. (1982).

Intestinal bovine calcium-binding protein (calbindin, CabD) (SM-III) consists of crystal structure in presence of Ca2+ (3ICB) (Szebenyi and Moffat, 1986), 24 NMR structures for the holo-protein (1CDN) (Akke et al., 1995), and 33 NMR structures for the apoprotein (1CLB) (Skelton et al., 1995). pKas of Lys with and without calcium is from Kesvatera et al. (1996). The acidic pKas in the apoenzyme are from Kesvatera et al. (1999).

Rat T-lymphocyte adhesion glycoprotein (CD2) (SM-IV) uses crystal structure 1HNG (Jones et al., 1992). pKas of 14 acidic residues determined at low (0.1 mM) and high (1.2 mM) salt concentration were averaged (Chen et al., 2000).

Hen egg-white lysozyme (HEWL) (SM-V) uses triclinic crystal structures (2LZT) (Ramanadham et al., 1981), 1B0D (Vaney et al., 1996), tetragonal crystal structure 1HEL (Wilson et al., 1992), 50 NMR structures (1E8L) (Schwalbe et al., 2001), and the averaged NMR structure (1HWA) (Smith et al., 1993). pKas of basic residues are from Kuramitsu and Hamaguchi (1980), and values for acidic residues at 100 mM ionic strength are from Bartik et al. (1994).

Turkey ovomucoid inhibitor domain 3 (OMTKY3) (SM-VI) consists of OMTKY3 complexed with human leukocyte elastase (1PPF) (Bode et al., 1986), Streptomyces griseus proteinase B (3SGB) (Read et al., 1983), or alpha -chymotrypsin (1CHO) (Fujinaga et al., 1987). The proteases are removed. There are 50 NMR structures (1OMU) (Hoogstraten et al., 1995). Acidic residue pKas are measured at low (10 mM) and high ionic strength (1 M) (Schaller and Roberston, 1995). Basic residue pKas are measured at 15 mM and 500 mM ionic strengths (Forsyth et al., 1998).

B1-immunoglobulin G-binding domain of protein G (ProtG) (SM-VII) uses crystal structure (1PGA) (Gallagher et al., 1994), 60 NMR structures (1GB1) (Gronenborn et al., 1991), and minimized NMR structure (2GB1) (Gronenborn et al., 1991). pKas at 100 mM ionic strength are from Khare et al. (1997).

Ribonuclease A (RNase A) (SM-VIII) uses the crystal structure with sulfate (3RN3) (Howlin et al., 1989), ion free protein (7RSA) (Wlodawer et al., 1988), and 32 NMR structures (2AAS) (Santoro et al., 1993). pKas at 200 mM NaCl (Rico et al., 1991; #4211). pKas of His-12, -48, -105, and -119 in the absence of phosphate are an average of values from Ruterjans and Witzel (1969), Matthews and Westmoreland (1973), Walters and Allerhand (1980), and Quirk and Raines (1999); His-112 and His-119 pKas determined at 20 mM and 100 mM phosphate (Meadows et al., 1969; Cohen et al., 1973). Sulfate is treated as phosphate.

Ribonuclease Hi (RNase H) (SM-IX) uses crystal structures 2RN2 (Katayanagi et al., 1992) and 1RNH (Yang et al., 1990) and Mg2+ containing form (1RDD) (Katayanagi et al., 1993) and 8 NMR derived structures (1RCH) (Yamazaki et al., 1997). pKas are from Oda et al. (1993, 1994).

Ribonuclease T1 (RNase T) (SM-X) uses the crystal structure with vanadate (3RNT) (Kostrewa et al., 1989). Vanadate was treated as phosphate. Measured pKas are from Inagaki et al. (1981) and McNutt et al. (1990).

Human Immunodeficiency Virus Type-1 Protease (HIV-1) uses the crystal structure (1HIV) (Thanki et al., 1992) with dihydroethylene-containing inhibitor, which is removed. pKas are of the aspartyl dyad from inhibitor free enzyme kinetic studies (Hyland et al., 1991; Ido et al., 1991).

For residues in Barnase, OMTKY3, BPTI, Calbindin, and RNaseHi, there is more than one pKa reported. The value used is either an average or the pKa of the site closest to the atom losing the proton. Overall, the uncertainties are small (<=  0.3 pK units). The n values (Eq. 3) were estimated from the published titration curves.

Even in these well-characterized proteins, only 166 of the 374 ionizable residues (Asp, Glu, His, Tyr, Lys, and Arg) have measured pKas (Table 1). There are no Arg pKas reported. Residues titrating in the same pH region influence each other. Thus, errors in the titration of Arg may influence Lys or Tyr. Likewise, errors in titration of missing Asp or Glu can impact the titration of the reported residues.

The following have incomplete titration curves that miss the plateau at low or high pH: Lys-34, Asp-7, and Asp-27 in OMTKY3, Lys-25 in CabD, Tyr-3, Tyr-33, Tyr-45, and Lys-13 in ProtG, Asp-93 and Glu-73 in Barnase, Tyr-23 in BPTI, Asp-48 and Asp-66 in HEWL, and His-114 in RNase H. Generally titration curves justify the reported values. There are eight residues where only a limiting value is available. This is taken as the true pKa, which may overstate the error of the calculations (see supplementary material).


    RESULTS AND DISCUSSION
TOP
ABSTRACT
INTRODUCTION
METHODS
RESULTS AND DISCUSSION
CONCLUSIONS
REFERENCES

Comparison of MCCE calculated and measured pKas

The first pKa calculations using detailed protein structures and energies derived from continuum electrostatics held the protein rigid, allowed only changes in residue ionization states, and used a low protein dielectric constant (varepsilon prot = 4) (Bashford and Karplus, 1990; Lim et al., 1991; Yang et al., 1993; Honig and Nicholls, 1995). SCCE does not match experimentally determined pKas very well (Antosiewicz et al., 1994; Demchuk and Wade, 1996). However, SCCE provides good benchmarks for comparing different methods. SCCE root mean square (RMS) errors at varepsilon prot of 4 (Table 2) are similar to those found previously. MCCE and SCCE calculations with varepsilon prot = 4 are compared in Fig. 2 and Table 2. Experimental pKas for each residue and values calculated with different Protein Data Bank files and varepsilon prot values are in the supplementary material (SM-I-X). For the 166 residues, the average absolute error decreases from 0.46 (SCCE) to 0.11 pH units (MCCE). The RMS error is reduced from 2.29 to 0.81 pH units. Less than 10% of the calculated pKas differ from experiment by more than 1.5 pH units. With SCCE 40% of the pKas are in error by this amount and 7% have errors greater than 4 pH units (Table 2).


                              
View this table:
[in this window]
[in a new window]
 
TABLE 2   Errors in calculated pKas



View larger version (17K):
[in this window]
[in a new window]
 
FIGURE 2   Calculated versus measured pKas for (a) single conformer (SCCE) and (b) multiconformer (MCCE) continuum electrostatic calculations. varepsilon prot = 4. Asp (), Glu (), Ctr (), His (black-diamond ), Ntr (), Tyr (black-triangle), and Lys (open circle ).

Eighty-one residues are on the surface with no special interactions. However, 35 experimental pKas are shifted by >1.0 Delta pK unit from isolated amino acids (Table 3). Thirty-six residues are buried and have lost >3 Delta pK units (4.08 kcal/mol) reaction field (solvation) energy. Fifty-three have ion pair interactions of >3 Delta pK units. Thus, 85 residues are in one or more perturbed categories with many in more than one class. Surface, noninteracting residues are easiest to model. Their SCCE RMS error is only 1.57, and the MCCE error is 0.67 pH units. Of the unperturbed residues His have the largest RMS error because of His-124 in RNAse H (see below). Perturbed residues have larger errors with SCCE (RMS error 2.58 pH units). MCCE provides significant improvement (RMS error 1.09), but these residues pose the greatest challenge for calculation (Table 3). The largest errors come from residues in strong interactions (RMS error 1.17) as do some of the greatest improvements. One difficulty is that many ion pairs involve Arg for which there are no measured pKas.


                              
View this table:
[in this window]
[in a new window]
 
TABLE 3   Experimental and calculated pKas for different residue types

pKas calculated with different structures of the same protein

Calculations should provide values that are the same for all nominally identical structures of the same protein. The average standard deviation with SCCE of the 32 residues in lysozyme structures (2LZT and 1B0D) is 1.4 pH units and only 0.5 pH units with MCCE (SM-V). Thus, MCCE conformational flexibility greatly diminishes the dependence on the initial structure. In addition, the RMS error for the 19 residues with known pKas is cut in half.

The pKas derived from crystal and solution structures were compared for 126 residues in eight proteins (Table 4). MCCE uses a rigid backbone and nonpolar side chains. The sampling of conformational space is increased in an ensemble of NMR structures, where each has somewhat different backbone and side chain positions. Thus, the average RMS variation of residue pKas is 1.5 pH units when different NMR coordinate files are compared. However, the RMS error of the averaged pKas is 0.83 pH units, smaller than for the crystal structures (0.94). Similar improvement was found in single conformer studies with varepsilon prot of 20 (Antosiewicz et al., 1996). Calculations with single, deposited average NMR structures have slightly larger errors (RMS error 1.02 pH units). However, the improvement with analysis of ensembles of NMR structures comes at the expense of more calculations. Here, 271 NMR structures were analyzed for the comparison of eight proteins.


                              
View this table:
[in this window]
[in a new window]
 
TABLE 4   MCCE calculations with NMR and x-ray derived structures

Comparison of experimental and calculated n values

The Hill coefficient, n, in Eq. 3 measures how much residue titrations influence each other. Neighboring residues titrating in the same pH range have smaller n values. Experimental n values were obtained for 94 residues in CabD, Barnase, OMTKY3, CD2d1, and RNaseH, either from the published values or from the shape of published titration curves. Even in these small proteins, half of the residues have n values smaller than 0.85, whereas only two have values significantly larger than 1. The MCCE derived values (Eq. 3) are well correlated with those determined experimentally (Table 5).


                              
View this table:
[in this window]
[in a new window]
 
TABLE 5   Distribution of Hill coefficients (n) values

The correspondence between calculated titration curves and MCCE site occupancies become worse as n decreases. This is because Eq. 3 provides only an approximation for the behavior of individual residues, missing the complex pH dependence of clusters of interacting residues (Onufriev et al., 2001).

Calculation of water and ion binding

Solvent exposed waters are removed, whereas buried waters are treated in the same detail as side chains with conformations allowing rotation about the oxygen. Each water has an extra conformer in bulk solvent with no interactions with the protein. The reaction field energy of free, solvated water controls the occupancy of interior sites. The water/hexane partition coefficient suggests a -3.56 Delta pK units (-4.84 kcal/mol) reaction field energy for an isolated water in bulk water (Wolfenden and Radzicka, 1994). For water to be bound, the remaining reaction field energy plus added interactions with the protein must be more favorable than this. There are 38 buried, crystallographic waters (Table 1). With a solution reaction field energy of -3.56 Delta pK units, the average water site occupancy is only 33% (pH 7). The fraction increases to 77% as the penalty for removing the water from bulk was decreased to 0.87 Delta pK units. Similar results were found in earlier MCCE calculations (Alexov and Gunner, 1999). In the group of proteins studied here only Asp-66 in HEWL has a pKa that depends on bound water (see below). The indifference to water in this data set can be seen in the similarity of the pKas derived from crystal structures and those using NMR structures where no waters are included. However, in other systems, bound water can have a large impact on the stability of buried charges (Gibas and Subramaniam, 1996; Garcia-Moreno et al., 1997; Alexov and Gunner, 1999).

Analysis of salt-bridges and corrections for strong electrostatic interactions

It is often difficult to obtain the correct pKas for residues in ion pairs (Sindelar et al., 1998). Continuum electrostatics calculations often over-stabilize the ionization of charges, lowering acid pKas and raising those of bases. MCCE with no correction yields pKas for the 98 acids -0.59 pH units too low and for the 43 bases 0.54 pH units too high. An empirical function (SOFT) (Eqs. 4a and 4b, Fig. 3) was introduced to weaken strong pair-wise electrostatic interactions with other conformers (Delta G<UP><SUB>ij</SUB><SUP>el</SUP></UP>) and with backbone and nonconformer containing side chains (Delta Gpol, i).
&Dgr;G<SUP><UP>el</UP></SUP><SUB><UP>ij</UP></SUB>(<UP>SOFT</UP>)=<FR><NU>&Dgr;G<SUP><UP>el</UP></SUP><SUB><UP>ij</UP></SUB></NU><DE>kT×<FENCE>1+<FR><NU>e<SUP>(<UP>Abs</UP>(<UP>&Dgr;G</UP><SUP><UP>el</UP></SUP><SUB><UP>ij</UP></SUB><UP>/kT</UP></SUP>)<UP>−5.5</UP>)</NU><DE><UP>1+</UP>e<SUP>(Abs(&Dgr;G<SUP><UP>el</UP></SUP><SUB><UP>ij</UP></SUB><UP>/kT</UP>)−5.5</SUP>)</DE></FR></FENCE></DE></FR> (4a)

&Dgr;G<SUB><UP>pol,i</UP></SUB>(<UP>SOFT</UP>)=<FR><NU>&Dgr;G<SUB><UP>pol,i</UP></SUB></NU><DE><UP>kT×</UP><FENCE><UP>1+1.5×</UP><FR><NU><UP>e</UP><SUP>(<UP>Abs</UP>(<UP>&Dgr;G<SUB>pol,i</SUB>/kT</UP></SUP>)−5.5)</NU><DE>1+e<SUP>(Abs(&Dgr;G<SUB><UP>pol,i</UP></SUB><UP>/kT</UP></SUP>)−5.5)</DE></FR></FENCE></DE></FR> (4b)
With full strength interactions the global RMS error is 1.68 pH units (Tables 2 and 6), whereas it is 0.86 with the SOFT function. The average errors for the acids and bases are now -0.16 and 0.07 pH units, respectively. This function has little effect on the pKa of most groups, but for 20 residues the error is decreased by >1.5 Delta pK units (Fig. 4).



View larger version (17K):
[in this window]
[in a new window]
 
FIGURE 3   SOFT function applied to reduce strong pair-wise () conformer-conformer (Eq. 4a); (open circle ) conformer-polar group (Eq. 4b) electrostatic interactions. Dashed line, initial Delta G<UP><SUB>ij</SUB><SUP>el</SUP></UP>; dotted line, energy halved.


                              
View this table:
[in this window]
[in a new window]
 
TABLE 6   Accuracy of pKa calculations with and without the SOFT function



View larger version (20K):
[in this window]
[in a new window]
 
FIGURE 4   Comparison of the error in the pKas calculated with and without the SOFT function (Eq. 4). (open circle ) acids; () bases. Along heavy line pKas are unchanged. Residues in the shaded region have larger errors with the SOFT function. Those in the white region have their errors reduced.

The SOFT function can be viewed as increasing the effective dielectric constant but only for relatively close neighbors (Fig. 3). The largest change halves the interaction energy, equivalent to an varepsilon prot of 8. There are several reasons that this may be necessary. Interaction energies are very sensitive to small changes in position when groups are close together. If structures report the closest allowable separation for salt-bridges as the time-averaged position then the interactions will be too large. Small excursions, weakening electrostatic interactions are not taken into account. In addition, MCCE uses Lennard-Jones parameters (SM-XI) that yield good values for hydrogen bonds. These may underestimate the cost of bringing ion pairs together, overestimating the electrostatic interactions. Lastly, the available conformers may not be optimal for stabilizing the charge-dipole state of the ion pair at low and high pH. Thus, the SOFT function represents a pragmatic solution to the problem of obtaining good values for ion pairs. However, future implementations of MCCE will aim to use enhanced local conformer flexibility to render this function unnecessary.

MCCE calculations with different protein dielectric constants

Residue pKas were calculated at varepsilon prot of 4, 8, and 20 (Tables 2 and 3). With SCCE, there is marked improvement as varepsilon prot is increased (Antosiewicz et al., 1996; Demchuk and Wade, 1996). However, MCCE calculations are only moderately sensitive to varepsilon prot. The RMS error at varepsilon prot = 8 (0.69) is slightly smaller than with varepsilon prot = 4 (0.84) or varepsilon prot = 20 (0.70). Here, site titration is stabilized by either explicit conformation change at small varepsilon prot or by the averaged dielectric response as varepsilon prot increases. Fewer conformational changes are seen as varepsilon prot is increased.

With larger varepsilon prot, buried charges have a smaller Delta Delta Grxn and smaller pair-wise interactions. Because most pair-wise interactions are favorable, changes in these two terms tend to offset each other. However, explicit conformational flexibility can stabilize a charge more than varepsilon prot of 20. For example, the MCCE pKa of Asp-7 in OMTKY3 increases from 2.8, the correct value, to 5.0 as varepsilon prot increases from 4 to 20. In other cases the ionized form is too stable with varepsilon prot of 20 indicating the high dielectric constant either overestimates the local protein flexibility or weakens unfavorable pair-wise interactions.

Conformational flexibility in the MCCE calculations

In the proteins used here, 55% of the residues have multiple conformers either because they are polar (23%) or ionizable (32%). Overall 62% of the available conformers have some occupancy and 48% are occupied by more than 10% during the titration. Focusing on changes in polar residues (Asn, Gln, Ser, Thr, and waters), 17% do not change conformation between pH 0 and 14, 65% show small changes of 1% to 20%, whereas only 4% change occupancy by >50% (Fig. 5). Most motions are hydroxyl reorientation or rotation of Asn or Gln, which would not be seen by x-ray crystallography. However, these rotations rearrange hydrogen bond networks and so can have significant impact on calculated pKas (Hooft et al., 1996; Nielsen and Vriend, 2001). In addition, several residues take heavy atom conformations different from that found in the protein data file at some point in the pH titration, and these have significant impact on calculated pKas (see below).



View larger version (23K):
[in this window]
[in a new window]
 
FIGURE 5   Number of polar (Ser, Thr, Asn, Gln) conformers that have changed occupancy by the given percentage as the pH is changed from 0 to 14.

MCCE analysis of individual proteins

The benchmark pKa calculations show the MCCE method yields as good fit to data as other current methods (Antosiewicz et al., 1996; Mehler and Guarnieri, 1999) (see below). The unique advantage of this method is that it follows explicit movement of side chains and waters during a titration and so highlights important conformational changes (Alexov and Gunner, 1999).

Lysozyme

Lysozyme is a glycosyl transfer enzyme (Vocadlo et al., 2001). The cleaved bond is held between Glu-35 and Asp-52. There are 31 ionizable residues and all have measured pKas with the exception of the 11 Arg, 1 Tyr, and the N terminus. Glu-7-Lys-1 and Asp-48-Arg-61 are ion pairs. Glu-35, Asp-52, and Asp-66 are in a cluster of residues with like charge. With the exception of Lys-1, these interacting residues are all buried, with Delta Delta GrxnDelta pK units. Many pKa calculations have been carried out on this protein (see below). The MCCE RMS error for 1BOD is 0.63 pH units. Three residues have errors of approx 1.1 pH units, whereas all other errors are smaller (SM-V).

Polar side chain motions

The active site, Glu-35 has a pKa of 6.2, Asp-52 of 3.7, and Asp-66 of 0.9. The interaction between 35 and 52 is only approx 1 Delta pK unit. The high Glu pKa has been hard to calculate. With SCCE the order of ionization of 35 and 52 are interchanged (Table 7). However, MCCE calculations with several x-ray and NMR structures reproduce the pKas. One determinant is Asn-46. At low pH, conformers with OD1 and ND2 near Asp-52 are both occupied. At high pH, the Asn always has ND2 closer to the ionized Asp stabilizing it by approx 2 Delta pK units. Monte Carlo sampling with the Asn fixed in the conformer found in the crystal structure (OD1 closer to Asp-52) interchange the order of titration of Glu-35 and Asp-52 (Table 7).


                              
View this table:
[in this window]
[in a new window]
 
TABLE 7   Outcome with different methods for pKa calculations: the results for lysozyme

Small motions yield significant changes

When Asn-46 is fixed the pKa of Asp-66 decreases by 1.4 pH units. The difference in interaction with ionized Asp-52 and Glu-66 of approx 0.5 Delta pK unit is too small to explain the change. However, reorientation of Thr-51, Ser-60, Thr-69, two waters, and Arg-68+ all stabilize Asp66- more when Asn-46 is fixed. The Asp-66 pKa also decreases as the water solution reaction field energy is diminished, increasing the occupancy of stabilizing, buried waters.

CD2d1

Rat CD2d1, in the immunoglobulin superfamily, is the N-terminal domain of the T-cell antigen CD2 (Driscoll et al., 1991). It is important for cell surface recognition (McAlister et al., 1996) and transmembrane signaling during T-cell activation (Sunder-Plassmann and Reinherz, 1998). The CD2 ligand-binding surface has a high proportion of charged and polar residues including Glu-28, Glu-29, and Glu-41, Arg-31, and Thr-37 so ionic interactions are thought to be important for binding and specificity. The pKas of 14 of the 55 ionizable residues have been determined (Chen et al., 2000) (SM -IV).

Rotamer changes on the protein surface

Asp-28, Glu-29, and Glu-41 are in a cluster on the binding surface (Fig. 6 A, Table 8). Glu-41, with a pKa of 6.5, helps bind target proteins (van der Merwe et al., 1995; Chen et al., 2000). At low pH, Glu-410, Glu-29- occupy rotamers close to Arg-31+ and Lys-43+ further from Asp-28-. As Glu-41 is ionized, Glu-29- reorients. Fixing Glu-29 in its original rotamer lowers its pKa by 0.7 pH units, whereas that of Glu-41 is raised by 0.4 pH units. This cluster is modeled less well at varepsilon prot = 20. 



View larger version (18K):
[in this window]
[in a new window]
 
FIGURE 6   (Top) Position of occupied conformers of interacting residues on the binding surface of rat CD2d1 (1HNG). The order of ionization is Glu-28, Glu-29, and then Glu-41 as the pH is raised. Glu-29 moves from neutral conformer A to ionized position B. The B conformer has more favorable interactions with Lys-43 and Arg-31. At low pH, the neutral Glu-41 is found with the proton at position A (78%) and B (18%). (Bottom) Conformer positions in the active site of RNaseA (3RN3). His-119, His-12, and the doubly ionized PO4 shown. The imidazole nitrogens (ND1 and NE2) in darker gray. His-12* and His-119*, found in the deposited structure, differ from His-12 and His-119 by rotation around the CBCG bond. His-119 positions A and B are included in the deposited structure file. At low pH ionized His-12* predominates (70% occupied) and His-119B* and His-119B are both found. Between pH 3 and 5, a small amount of ionized His-119A is occupied. Both His-119 proton tautomers (ND1 or NE2 protonated) are occupied. PO4 dissociates from the protein as His-12 looses its proton.


                              
View this table:
[in this window]
[in a new window]
 
TABLE 8   pKas of interacting residues in RNaseA and CD2: the impact of ion binding and mutation

Calculation of pKas in mutated proteins

Glu-29 and Glu-41 have been mutated to Gln and pKas measured for nearby residues (Chen et al., 2000). MCCE calculates pKas in mutated proteins by supplying conformers for the replacement residue. These are added to the wild-type structure, so interactions for both wild-type and mutant side chain are in the same energy look-up table. Monte Carlo sampling then allows only appropriate residue conformers (Alexov et al., 2000). Previous calculations have shown that interactions with neutral Asp are comparable with that of Asn (Alexov et al., 2000). The mutation of Glu-29 and Glu-41 in CD2 were therefore modeled with the wild-type energy look-up table where the mutated acid was simply forced to remain in a neutral conformer. If Glu-29 is mutated to Gln, the pKa of Glu-41 is lowered by 1.7 pH units in agreement with the experiment shift of 1.5 pH units (Table 8). However, mutation of Glu-41 to Gln lowers the calculated pKa of 29 by 1.6 pH units in contrast to the measured negligible change.

Prot G

Streptococcal protein G, an immunoglobin binding protein with small (55-75 amino acids) repeating domains (B1, B2, and B3), binds immunoglobin-G constant regions (Tashiro and Montelione, 1995). The pKas of 13 of the 21 ionizable residues have been determined (Khare et al., 1997) and residue ionization studied previously by an SCCE based approach with varepsilon prot of 20 (Khare et al., 1997). Glu-56 and Lys-28 are buried (Delta Delta Grxn > 3). Ion pairs include Glu-27 with both Lys-28 (3.0 Å) and Lys-31 (3.0 Å), Glu-15 with Lys-4 (3.4 Å), Asp-47 with Lys-50 (3.0 Å), and Glu-19 with the N terminus (3.4 Å). Each pair has >3 Delta pK units pair-wise interaction with each other.

Accurate pKas of ion pairs can rely on side chain motions

Protein G has five residues in salt bridges. SCCE at varepsilon prot = 4 for the five residues with known pKas have an RMS error of 3.1 pH units, whereas it is 0.75 with MCCE. In the crystal structure (1PGA) Glu-27 is close to Lys-28 and Lys-31 so the SCCE pKa of Glu-27 is too low and that of Lys-28 too high. In the average NMR structure (1GB1) the Glu is further than 8 Å from either Lys. MCCE provides good pKas for both Lys-28 and Glu-27 starting with the 1PGA structure because a new Glu conformer is selected that is closer to that found by NMR. Changes in conformation reducing ion-pair stabilization had been suggested in previous SCCE based calculations at varepsilon prot = 20, which could not obtain the correct pKa for this Glu (Khare et al., 1997). The accurate pKa for Glu-15 also relies on several rotamers being occupied during its titration.

OMTKY3

The avian ovomucoids are glycoproteins composed of three tandem, homologous structural domains, each acting as a serine protease inhibitor (Laskowski et al., 1987). Structures of the third domain, complexed to different proteases, have been determined. The ionization states of the 15 the 16 ionizable residues in the inhibitor have been measured in the absence of protein target (Schaller and Roberston, 1995; Forsyth et al., 1998). Previous calculations of pKas were made using an SCCE based approach with varepsilon prot = 20 (Forsyth et al., 1998). NMR studies show the binding region in solution is located between Lys-13-Pro-22 and Asn-33-Ala-40 (Song and Markley, 2001). Lys-13 and Lys-34, Glu-19, and Tyr-20 are found at this interface (SM-VI).

Problems calculating pKas of residues in clusters

Lys-13, Lys-34, and Tyr-11 form a triad titrating near the same pH, a motif that is often causes problems. In 1PPF the pKa of Lys-13 and Tyr-11 are both 1.6 pH units too high, whereas Lys-34 is 2.6 pH units too low. The ionized Lys-34 is destabilized by loss of reaction field energy and unfavorable interactions with backbone. A favorable interaction of -1.2 Delta pK units between Lys-130 and Tyr-11- helps stabilize the anionic Tyr. In 3SGB the two Lys pKas are close to experiment, whereas Tyr-11 is >4 pH units too high. The Delta Delta Grxn for Lys-34 and its interaction with backbone are reduced by 3 pH units shifting its pKa up. This Lys takes different conformations when it is neutral and ionized. However, the occupied conformer of Lys-130 has little interaction with the Tyr-11-, which now remains neutral even at pH 14.

RNaseA

Bovine pancreatic RNaseA is a ribonuclease with a well-defined binding cleft containing His-12 and His-119 and Lys-7, Lys-41, and Lys-66. Of the 36 ionizable residues 16 have known pKas (Rico et al., 1991; Quirk and Raines, 1999) and residue ionization has been studied previously by an SCCE based approach with varepsilon prot = 20 (Antosiewicz et al., 1996).

Selection of His tautomers

His-12 and His-119 are functionally important residues. There are two positions for His-119 in 3RN3. Previous studies have shown the calculated pKa depends on the choice of rotomer and neutral His tautomer, which must be preassigned in SCCE based methods (Antosiewicz et al., 1996). With MCCE all rotamers and tautomers are available during Monte Carlo sampling (Fig. 6 B). With this freedom the calculated pKas are 0.7 to 1.6 pH units lower than the experimental value (phosphate free) and relatively independent of the structure analyzed (SM VIII).

Ion binding

His-12 and His-119 bind phosphate increasing the pKa of each by 1 to 1.2 pH units (Table 8) (Cohen et al., 1973; Matthews and Westmoreland, 1973; Walters and Allerhand, 1980). In MCCE ion binding is modeled by providing the phosphate with two conformers. One interacts with the protein, whereas the second isolated conformer has -45.6 Delta pK units (-62 kcal/mol) of reaction field energy. When bound the phosphate looses 9.5 Delta pK units of reaction field energy (3RN3), which must be replaced by favorable interactions, primarily with ionized His-12 and His-119, for the protein bound conformer to be occupied. Although it is possible to dynamically calculate the phosphate ionization state within MCCE, this was not done here. Rather, phosphate is always doubly ionized. Monte Carlo sampling was carried out under three conditions using the energy look-up tables for 3RN3. In one, the phosphate conformer bound to the protein is forced to be occupied, in another the unbound conformer is fixed on, lastly both conformers are allowed and so the ion remains in equilibrium with the protein. Comparing phosphate bound and free, the pKa of His-12 increases by 2.8 and His-119 by 1.0 pH unit (Table 8). The shift is intermediate if phosphate remains in equilibrium with the protein. Here the phosphate dissociates, as His-12 looses its proton. The calculations thus reproduce the order of ionization of 12 and 119 as well as the magnitude of the pH shifts with and without phosphate. The experimental pK shift of 1 to 1.2 pH with added phosphate is closer to that found in the dynamic calculations suggesting that the ion is lost when the two His are neutral.

Unstable calculated pKas for residues in a buried ion pair

His-48 has a measured pKa of 6.3 (Table 3). With MCCE the pKa is 2.5 pH units too high in the crystal structure 3RN3, whereas in NMR structures (2AAS) the average is 1.7 pH units too low with an RMS variance of 4.4 pH units. The variability of His-48 is coupled to changes in Asp-14 with which it has an interaction of approx -5 Delta pK units when both are ionized. NMR structures, which gave very different pKas for these groups, were compared. When the Asp Delta Delta Grxn is moderate and its interaction with the backbone favorable it titrates at low pH and the His remains ionized to very high pH (10 or greater). However, there are structures where the Asp Delta Delta Grxn is >9 Delta pK units so it now stays neutral until pH 10. Here the favorable His+:Asp- pair-wise interactions are never important because the His titrates well before the Asp. Thus, small shifts in Asp Delta