Biophys J, February 2002, p. 1096-1111, Vol. 82, No. 2


*Division of Bioengineering and Physical Science, Office of Research Services, and Laboratory of Molecular Biology, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, Maryland 20892 USA;
Department of Biochemistry and Molecular Biology, The University of Melbourne, Parkville, Australia; and
§Institut für Biophysik, Johann Wolfgang Goethe-Universität, Frankfurt am Main, Germany
ABSTRACT
Strategies for the deconvolution of diffusion in the determination of size-distributions from sedimentation velocity experiments were examined and developed. On the basis of four different model systems, we studied the differential apparent sedimentation coefficient distributions by the time-derivative method, g(s*), and by least-squares direct boundary modeling, ls-g*(s), the integral sedimentation coefficient distribution by the van Holde-Weischet method, G(s), and the previously introduced differential distribution of Lamm equation solutions, c(s). It is shown that the least-squares approach ls-g*(s) can be extrapolated to infinite time by considering area divisions analogous to boundary divisions in the van Holde-Weischet method, thus allowing the transformation of interference optical data into an integral sedimentation coefficient distribution G(s). However, despite the model-free approach of G(s), for the systems considered, the direct boundary modeling with a distribution of Lamm equation solutions c(s) exhibited the highest resolution and sensitivity. The c(s) approach requires an estimate for the size-dependent diffusion coefficients D(s), which is usually incorporated in the form of a weight-average frictional ratio of all species, or in the form of prior knowledge of the molar mass of the main species. We studied the influence of the weight-average frictional ratio on the quality of the fit, and found that it is well-determined by the data. As a direct boundary model, the calculated c(s) distribution can be combined with a nonlinear regression to optimize distribution parameters, such as the exact meniscus position, and the weight-average frictional ratio. Although c(s) is computationally the most complex, it has the potential for the highest resolution and sensitivity of the methods described.
INTRODUCTION
Analyzing the size-distribution of biological or synthetic macromolecules in solution, for example, the study of the oligomeric state of proteins, is a very important application of analytical ultracentrifugation with a long history (Signer and Gross, 1934; Svedberg and Pedersen, 1940; Bridgman, 1942; Baldwin and Williams, 1950; Vinograd and Bruner, 1966; Scholte, 1968; van Holde and Weischet, 1978; Stafford, 1992a). Because macromolecular migration in a gravitational field depends strongly on size, sedimentation velocity studies have the potential for high resolution. Over the past decades, two approaches for the determination of sedimentation coefficient distributions have become most popular: the integral sedimentation coefficient distribution G(s) of van Holde and Weischet (vHW) (1978), and the differential apparent sedimentation coefficient distribution g*(s) obtained as a transform of the time-derivative of the signal, dc/dt (Stafford, 1992a). Exploiting the increased computational capabilities now available, we have recently proposed new methods for obtaining the apparent differential sedimentation coefficient distribution g*(s), termed ls-g*(s) (Schuck and Rossmanith, 2000), and for calculating a differential sedimentation coefficient distribution c(s) in which corrections for diffusion are made (Schuck, 2000). Both methods are based on direct least-squares modeling of the sedimentation boundary, using linear combinations of sedimentation profiles for nondiffusing species or linear combinations of Lamm equation solutions, respectively. They can be applied to larger data sets and, by virtue of regularization, exhibit substantially less noise in the calculated distributions than previous methods. First applications of the methods indicate a high versatility and significant advantages in sensitivity and resolution over the classical methods (Perugini et al., 2000; Schuck et al., 2000; Schuck and Rossmanith, 2000; Cole and Garsky, 2001; Sedlák and Cölfen, 2001; Hatters et al., 2001). However, a systematic exploration of the differences and relationships among the approaches, of the statistical and experimental prior knowledge they need, and of their implicit assumptions and the practical relevance of these assumptions is not available at present.
One of the classical problems in the theory of ultracentrifugal sedimentation is the treatment of diffusion in size-distribution analysis. The deconvolution of boundary diffusion can significantly increase the amount of detail that can be learned from a sedimentation experiment, and the way diffusion is described constitutes a central difference among the existing approaches. One difficulty is that two parameters (such as the sedimentation and diffusion coefficients) are required to describe the sedimentation of each species in the distribution. Whereas the apparent sedimentation coefficient distributions g(s*) and ls-g*(s) do not account for diffusion, the vHW method for calculating G(s) is designed to take diffusion into account in a model-independent way by extrapolation of boundary fractions to infinite time. In the present paper, we describe a similar model-free method for calculating G(s) by extrapolation of the ls-g*(s) distributions to infinite time. Although complete model independence can be appealing, the extrapolation process can limit the resolution. Further, we will show that this extrapolation strategy for the deconvolution of diffusion fails for heterogeneous mixtures of species with overlapping sedimentation boundaries. The Lamm equation method c(s) uses an intermediate strategy, estimating size-dependent diffusion coefficients via the Stokes-Einstein and Svedberg relationships and utilizing prior knowledge, such as the weight-average frictional ratio, the molar mass of a main component, or similar information relating size and shape of the distribution. It also uses maximum entropy regularization, a Bayesian strategy to achieve numerical stability and optimal resolution, which is routinely used in many other fields of physics and biophysics for problems of similar mathematical structure.
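To make the role of the frictional ratio concrete, the following Python sketch estimates D for a given s by combining the Svedberg equation, s = M(1 − v̄ρ)/(N_A f), with Stokes' law for a sphere of equal volume scaled by f/f0, and D = kT/f. It is a minimal illustration in SI units, not Sedfit's code; the function names and the example values (a 4.4 S protein with f/f0 = 1.3, v̄ = 0.73 ml/g, water at 20°C) are assumptions chosen for illustration.

```python
import math

K_B = 1.380649e-23       # Boltzmann constant, J/K
R_GAS = 8.314462618      # gas constant, J/(mol K)

def diffusion_from_s(s, f_ratio, vbar, rho, eta, T=293.15):
    """Estimate D (m^2/s) for sedimentation coefficient s (in seconds) and frictional ratio f/f0.

    vbar in m^3/kg, rho in kg/m^3, eta in Pa*s (all SI; illustrative helper).
    """
    buoyancy = 1.0 - vbar * rho
    # Radius of the equivalent anhydrous sphere implied by s and f/f0:
    # s = 2 R0^2 (1 - vbar*rho) / (9 vbar eta (f/f0))
    r0 = math.sqrt(9.0 * vbar * eta * f_ratio * s / (2.0 * buoyancy))
    friction = f_ratio * 6.0 * math.pi * eta * r0   # f = (f/f0) * 6 pi eta R0
    return K_B * T / friction                       # Stokes-Einstein: D = kT/f

def molar_mass(s, D, vbar, rho, T=293.15):
    """Svedberg equation M = s R T / (D (1 - vbar*rho)), in kg/mol."""
    return s * R_GAS * T / (D * (1.0 - vbar * rho))

# Example: a 4.4 S species with f/f0 = 1.3 in water at 20 C (assumed values)
s = 4.4e-13
D = diffusion_from_s(s, 1.3, vbar=0.73e-3, rho=998.2, eta=1.002e-3)
M = molar_mass(s, D, vbar=0.73e-3, rho=998.2)
print(f"D = {D:.3e} m^2/s, M = {M * 1000:.0f} g/mol")
```

Because R0 grows as the square root of (f/f0)·s, the estimate scales as D ∝ (f/f0)^−3/2 s^−1/2, so a single weight-average f/f0 suffices to assign a diffusion coefficient to every s-value of a grid. For the example values, D comes out at roughly 6.3 × 10⁻¹¹ m²/s.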
In contrast to the more classical data transformations, direct boundary models such as c(s) provide a criterion for the goodness-of-fit, which has potential use in nonlinear regression of distribution parameters. So far, however, this has remained unexplored.
In the present communication, we apply the different methods [g(s*), ls-g*(s), G(s) by vHW and by extrapolation of ls-g*(s), and c(s)] to four data sets that differ in the breadth of the size distribution and in the extent of diffusion. First, we examine a previously proposed theoretical model system with four species of closely spaced sedimentation coefficients (Stafford, 1992b). We then compare the information obtained from the analysis of experimental data from a predominantly single species with a trace impurity, a self-associating protein with several discrete oligomeric states, and a continuous distribution of large lipid emulsion particles. Besides questions of sensitivity and of the resolution of species that do not exhibit clearly distinguishable sedimentation boundaries, we also examine the stability of the c(s) approach with respect to the prior assumption it needs. In particular, we show how the weight-average frictional ratio can be extracted from the experimental data by combination with nonlinear regression.
EXPERIMENTAL
Analytical ultracentrifugation
For sedimentation velocity experiments, an Optima XL-I analytical ultracentrifuge (Beckman Coulter, Fullerton, CA) with absorbance and interference optical detection systems was used. Epon double-sector centerpieces were filled with 400 µl of sample solution and PBS, respectively, and centrifuged at a rotor speed of 40,000 or 55,000 rpm and at rotor temperatures of 5 or 20°C, respectively. Absorbance data were acquired at a wavelength of 280 or 230 nm, respectively, and in time intervals of 2 min, with the radial increment set to 0.002 cm and taking two averages per scan; interference scans were taken in time intervals of 1 min. Buffer viscosity, protein partial specific volumes, and frictional ratios were calculated using the software Sednterp (Laue et al., 1992).
Sedimentation equilibrium studies were conducted in a Beckman Optima XL-A equipped with absorbance optics. Double-sector or six-channel charcoal-filled Epon centerpieces were filled with 140 µl of sample at loading concentrations between 0.1 and 0.6 mg/ml. Sedimentation equilibrium was attained at a rotor temperature of 4°C at rotor speeds of 10,000 and 13,000 rpm, and absorbance profiles were acquired at wavelengths of 230 and 280 nm. Extinction coefficient ratios at different wavelengths were estimated spectrophotometrically.
DATA ANALYSIS
Lamm equation modeling
Sedimentation velocity data analysis was performed with the program Sedfit (which can be obtained from http://www.AnalyticalUltracentrifugation.com). For direct boundary modeling with distributions of Lamm equation solutions (Schuck, 2000), the measured absorbance or interference profiles a(r, t) were modeled as an integral over the differential concentration distribution c(s)

$$a(r,t) \cong \int_{s_{\min}}^{s_{\max}} c(s)\,\chi\big(s, D(s), r, t\big)\,ds + b(r) + \beta(t) \qquad (1)$$

with b(r) and β(t) denoting noise components, and χ(s, D, r, t) denoting the solution of the Lamm equation for a single species (Lamm, 1929)

$$\frac{\partial \chi}{\partial t} = \frac{1}{r}\frac{\partial}{\partial r}\left[r D \frac{\partial \chi}{\partial r} - s\,\omega^2 r^2 \chi\right] \qquad (2)$$

(with ω denoting the rotor angular velocity), which was solved by finite element methods on a static or moving frame of reference as described in Claverie et al. (1975). The integral Eq. 1 was solved numerically by discretization into a grid of 100-200 sedimentation coefficients and calculating the best-fit concentrations for each species in a linear least-squares fit.
Systematic noise components of the data were estimated by using an algebraic method (Schuck and Demeler, 1999) (see below). Numerical stability was achieved by the maximum entropy method (Amato and Hughes, 1991),

$$\min_{c}\;\sum_{r,t}\left[a(r,t) - \int c(s)\,\chi\big(s,D(s),r,t\big)\,ds - b(r) - \beta(t)\right]^2 + \alpha \int c \ln c\; ds \qquad (3)$$

with the regularization parameter α weighting the entropy term ∫ c ln c ds. The maximum entropy constraint α is adjusted such that the increase in the χ² of the constrained fit, as compared to the unconstrained fit (α = 0), corresponds to a confidence level of one or two standard deviations (p = 0.68 or 0.95, respectively) as calculated by F-statistics (Provencher, 1982a). Alternatively, a regularization term α ∫ (d²c/ds²)² ds can be used that minimizes the second derivative of the distribution. Because of its linearity, this term can be implemented in the computationally simpler matrix form. However, it cannot describe isolated peaks as well as the maximum entropy constraint, and it tends to produce smoother distributions, which can make it more robust.
For critical inspection of the quality of fits to sedimentation velocity data, we have developed a two-dimensional picture representation of the residuals that avoids the loss of information on systematic deviations common to the usual overlay presentations. The bitmap representation of the residuals was calculated in the following way: The residual values of all points in all scans, R(r, t), were transformed to a gray value n(r, t) between 0 and 255, with n = 0 for R(r, t) < −0.05, n = 255 for R(r, t) > 0.05, and a linear transformation for residuals −0.05 < R(r, t) < 0.05. This transformation results in neutral gray (n = 128) for a perfect fit with R(r, t) = 0, brighter pixels for positive residuals, and darker pixels for negative residuals. In the bitmap, the pixels were ordered in rows that correspond to the scan number and columns that correspond to the radius values. This representation results in a uniformly gray picture without any structure for a good fit with randomly distributed residuals. A visible structure corresponds to systematic residuals. In this way, systematic residuals from vibrations of the camera, which produce vertical patterns, can be distinguished from those of an imperfect boundary model, which result in diagonal structures. Also, the presence of isolated bad scans can be diagnosed from horizontal lines, and such scans can be identified from a separate Sedfit output file that lists the local rms error of each scan file.
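A minimal NumPy sketch of the gray-scale mapping described above, assuming the residuals are stored as an array with one row per scan and one column per radius; the function name and the synthetic example are illustrative only.

```python
import numpy as np

def residual_bitmap(residuals, limit=0.05):
    """Map residuals R(r, t) (rows = scans, columns = radii) to 8-bit gray values.

    R <= -limit maps to 0 (black), R >= +limit to 255 (white), linearly in between,
    so that a perfect fit (R = 0) maps to neutral gray (128).
    """
    R = np.clip(np.asarray(residuals, dtype=float), -limit, limit)
    gray = np.rint(255.0 * (R + limit) / (2.0 * limit))
    return gray.astype(np.uint8)

# Synthetic example: random noise plus a radial pattern that is identical in every
# scan (as from a time-invariant optical artifact), which shows up as vertical stripes.
rng = np.random.default_rng(1)
n_scans, n_radii = 40, 300
random_part = rng.normal(0.0, 0.005, (n_scans, n_radii))
stripes = 0.02 * np.sin(np.linspace(0.0, 30.0, n_radii))
img = residual_bitmap(random_part + stripes[None, :])
```

The resulting array can be written to any image format; a structureless gray image indicates randomly distributed residuals.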
Systematic noise analysis
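Components of systematic time-invariant and radial-invariant noise were calculated using the algebraic approach described previously (Schuck and Demeler, 1999). In brief, the time-invariant baseline signal b_i at each radius r_i was determined, together with the parameters {p} of the sedimentation model m(r_i, t_j, {p}), by least-squares minimization according to

$$\sum_{i}\sum_{j}\left[a(r_i,t_j) - b_i - m(r_i,t_j,\{p\})\right]^2 \rightarrow \min \qquad (4)$$

For given {p}, the minimum with respect to each b_i is reached at

$$b_i = \frac{1}{N_t}\sum_{j=1}^{N_t}\left[a(r_i,t_j) - m(r_i,t_j,\{p\})\right] = \bar{a}_i - \bar{m}_i(\{p\}) \qquad (5)$$

where the overbar denotes the average over all N_t scans. Inserting Eq. 5 into Eq. 4, and defining the differences to the average scan

$$\Delta a(r_i,t_j) = a(r_i,t_j) - \bar{a}_i, \qquad \Delta m(r_i,t_j,\{p\}) = m(r_i,t_j,\{p\}) - \bar{m}_i(\{p\}) \qquad (6)$$

leads to the reduced problem

$$\sum_{i}\sum_{j}\left[\Delta a(r_i,t_j) - \Delta m(r_i,t_j,\{p\})\right]^2 \rightarrow \min_{\{p\}} \qquad (7)$$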
Inspection of Eq. 7 shows that the only information that can be extracted from the data is that of a time-difference (here with an average scan as a reference). It should be noted that no explicit estimate of the time-invariant noise is used in Eq. 7. Nevertheless, for fundamental reasons, modeling the time-difference introduces new degrees of freedom into the data analysis, which is a consequence of the unknown radial-dependent baseline offsets. This can lead to slight correlation with parameters of the boundary model, in particular with those describing very slow sedimentation processes (Schuck and Demeler, 1999; Kar et al., 2000). Such correlation can be minimized by using a large data set that includes large boundary displacement (Kar et al., 2000). The calculation of an explicit estimate of the baseline parameters with Eq. 5 follows after the nonlinear regression in Eq. 7 and allows comparison of the sedimentation model with the data in the original data space (direct boundary modeling). It follows from Eq. 5 that the best estimate of the time-invariant signal is an average over all scans of the residual profiles. Therefore, it depends on the parameters of the sedimentation model, and realistic estimates of the time-invariant baseline signal are obtained only if the sedimentation model fits the data well. However, this step does not introduce any additional correlation in the estimates of the sedimentation parameters {p}.
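The algebra of Eqs. 5 and 7 can be checked numerically with a short Python sketch. It assumes that the data and a fixed model are sampled on the same (scan × radius) grid and treats only time-invariant noise; the model array is an arbitrary stand-in rather than a Lamm equation solution, and the function names are hypothetical. The check confirms that the time-difference residuals of Eq. 7 equal the residuals left after explicitly subtracting the baseline estimate of Eq. 5.

```python
import numpy as np

def time_invariant_noise(data, model):
    """Eq. 5: best-fit time-invariant baseline = residuals averaged over all scans.

    data, model: arrays of shape (n_scans, n_radii).
    """
    return (data - model).mean(axis=0)

def time_difference_residuals(data, model):
    """Eq. 7 residuals: data and model each referenced to their own average scan."""
    return (data - data.mean(axis=0)) - (model - model.mean(axis=0))

rng = np.random.default_rng(2)
n_scans, n_radii = 30, 250
# Arbitrary stand-in for a sedimentation model (monotonically rising profiles)
model = np.cumsum(rng.random((n_scans, n_radii)), axis=1) / n_radii
b_true = 0.02 * np.sin(np.linspace(0.0, 40.0, n_radii))      # time-invariant baseline
data = model + b_true[None, :] + rng.normal(0.0, 0.005, (n_scans, n_radii))

b_est = time_invariant_noise(data, model)
res = time_difference_residuals(data, model)
```

Because the baseline estimate averages the residuals of all scans, its statistical error shrinks with the number of scans, but any misfit of the model is transferred into it, as noted above.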
Calculation of the sedimentation coefficient distributions g*(s) and G(s)
For obtaining the apparent sedimentation coefficient distribution g*(s), the direct boundary model for a distribution of nondiffusing particles, ls-g*(s) (Schuck and Rossmanith, 2000), was used, as implemented in the software Sedfit. In this method, ls-g*(s) is calculated using the same concepts and numerical framework as the distribution c(s) described above, but by replacing the Lamm equation solution χ(s, D(s), r, t) in Eq. 1 with the theoretical sedimentation profiles of nondiffusing species, i.e., step-functions U(s, r, t),

$$a(r,t) \cong \int_{s_{\min}}^{s_{\max}} \text{ls-}g^{*}(s)\,U(s,r,t)\,ds + b(r) + \beta(t) \qquad (8)$$

$$U(s,r,t) = \begin{cases} 0, & r < r_m\,e^{s\omega^2 t} \\ e^{-2s\omega^2 t}, & r \ge r_m\,e^{s\omega^2 t} \end{cases} \qquad (9)$$

with the meniscus position r_m.
Eq. 8 can be combined with Eqs. 7 and 5 for systematic noise analysis. As described above, as a consequence of modeling the time difference in Eq. 7, some additional correlation and an increased error of the distribution at small s-values can occur. Both are best minimized in the ls-g*(s) analysis by using data sets with large boundary displacement and scans in which the boundary has cleared the meniscus. It should be noted that, if used for the analysis of small molecules for which diffusion is not negligible, Eq. 9 is not a good approximation, and, depending on the size of the data set, large residuals and relatively poor estimates of the time-invariant signal may be obtained.
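The step-function kernel of the ls-g*(s) model can be illustrated with the following Python sketch, which builds one column per s-value from a sharp boundary at r_m·exp(sω²t) with plateau exp(−2sω²t) and solves the unregularized linear least-squares problem. Rotor speed, radii, scan times, and the two-species test distribution are assumed values, and noise, regularization, and noise decomposition are left out (time-invariant noise could be handled by referencing both the data and every kernel column to their average scan, as in Eq. 7). With noise-free data on the grid, the fit recovers the input, and the summed amplitudes reproduce the total loading concentration.

```python
import numpy as np

def step_kernel(s_grid, r, times, omega, r_m):
    """Columns: signal of a nondiffusing species with unit loading concentration,
    evaluated for every (scan, radius) pair and stacked scan by scan."""
    cols = []
    for s in s_grid:
        blocks = []
        for t in times:
            boundary = r_m * np.exp(s * omega**2 * t)     # sharp boundary position
            plateau = np.exp(-2.0 * s * omega**2 * t)      # radial dilution
            blocks.append(np.where(r >= boundary, plateau, 0.0))
        cols.append(np.concatenate(blocks))
    return np.column_stack(cols)

SVEDBERG = 1e-13                      # s
omega = 2 * np.pi * 50000 / 60        # 50,000 rpm in rad/s
r_m = 6.0                             # meniscus, cm
r = np.linspace(6.0, 7.0, 201)        # radii, cm
times = np.linspace(1000, 4000, 10)   # scan times, s
s_grid = np.arange(2, 13) * SVEDBERG  # 2, 3, ..., 12 S

U = step_kernel(s_grid, r, times, omega, r_m)
g_true = np.zeros(s_grid.size)
g_true[2], g_true[5] = 0.6, 0.4       # species at 4 S and 7 S
data = U @ g_true
g_fit, *_ = np.linalg.lstsq(U, data, rcond=None)
```

With real data, the same kernel would be combined with regularization and the systematic noise terms of Eq. 8.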
The differential sedimentation coefficient distribution defined by Eqs. 8 and 9 is termed ls-g*(s) to indicate its basis in least-squares data modeling (ls) and the neglect of diffusion (g*). A similar differential sedimentation coefficient distribution can be calculated using the time-derivative dc/dt, as approximated by the time difference Δc/Δt (Philo, 2000; Stafford, 1992a). This is termed g(s*) to indicate the use of a transformation of the radial variable r into an apparent sedimentation coefficient s* in the course of its calculation. In this method, the pairwise time difference between scans is used to eliminate systematic time-invariant noise. The final apparent sedimentation coefficient distribution g(s*) from dc/dt can be transformed back into a boundary model (with the right-hand side of Eqs. 8 and 9) for comparison of model and data, and explicit estimates of the time-invariant noise can be calculated via Eq. 5 (data not shown). Because both methods, g(s*) and ls-g*(s), are based on equivalent definitions of the apparent sedimentation coefficient distribution, they lead to equivalent results when applied to the same data sets, both for the g*(s) distribution (Schuck and Rossmanith, 2000) and for the time-invariant noise estimates (data not shown). However, the absence of a differentiation step in ls-g*(s) allows larger boundary displacements between the scans, and avoids artificial broadening effects that can be introduced by the approximation of dc/dt by Δc/Δt (Schuck and Rossmanith, 2000; Philo, 2000). For comparison, where possible, g(s*) analysis of time-difference sedimentation data was performed with the program dcdt+ (J. S. Philo, 3329 Heatherglow Ct., Thousand Oaks, CA 91360).
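The integral sedimentation coefficient distribution G(s) was calculated as described by van Holde and Weischet (1978). This method is based on the Faxén-type approximate solution of the Lamm equation, which can be written as

$$c(r,t) \approx \frac{c_0}{2}\,e^{-2s\omega^2 t}\left[1 + \operatorname{erf}\!\left(\frac{r - r_m e^{s\omega^2 t}}{2\sqrt{Dt}}\right)\right] \qquad (10)$$

(van Holde and Weischet, 1978), with the meniscus position r_m, the loading concentration c_0, and the plateau concentration c_p = c_0 e^{−2sω²t}, so that the boundary fraction p_i = c(r_i, t)/c_p reached at radius r_i obeys

$$2p_i - 1 = \operatorname{erf}\!\left(\frac{r_i - r_m e^{s\omega^2 t}}{2\sqrt{Dt}}\right) \qquad (11)$$

With the inverse error function erf⁻¹ applied to both sides of Eq. 11, and with the apparent sedimentation coefficient of boundary fraction i defined as s*_app,i = ln(r_i/r_m)/(ω²t), we arrive at (to first order in the boundary width, approximating the boundary position by r_m in the slope)

$$s^{*}_{\mathrm{app},i} \approx s + \frac{2\sqrt{D}\,\operatorname{erf}^{-1}(2p_i - 1)}{r_m\,\omega^2}\;t^{-1/2} \qquad (12)$$

Linear extrapolation of s*_app,i on a t^−0.5 scale to infinite time allows for determination of s, and deconvolution of diffusion effects on the sedimentation boundary. The resulting s-values from the different boundary fractions form the integral sedimentation coefficient distribution G(s) (van Holde and Weischet, 1978).

We have used the implementation of the vHW approach outlined earlier (Schuck, 2000). In brief, after determination of the plateau signals for each curve, the boundary was divided into N (usually 20 to 50) fractions of equal concentration increments dh. The radial position of boundary fraction i is calculated as R_i = mean{r : dh × (i − 0.5) < c(r) < dh × (i + 0.5)}, i.e., as the average of the radial values of all data points with signal values within the limits of the boundary fraction. This method is designed for a high number of fractions, where the boundary increment dh of each fraction is comparable in size to the noise of the data, and it extracts the boundary positions in a least-squares sense, without requiring smoothing of the data. The algorithm implemented in Sedfit ensures that all boundary fractions in all scans contain at least one data point; otherwise, the number of boundary fractions N is automatically reduced.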
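The boundary-fraction extrapolation can be sketched in Python under simplifying assumptions: a single species is simulated with an erf-shaped approximate boundary centered at r_m·exp(sω²t) (noise-free, CGS units, 50,000 rpm, s = 5 S, D = 6 × 10⁻⁷ cm²/s, all assumed values), each scan is divided into 20 equal fractions of its plateau, and the apparent s of each fraction is extrapolated linearly versus t^−1/2. The radii of the fractions are found by interpolation rather than by the averaging over data points used in Sedfit, and the function names are hypothetical.

```python
import math
import numpy as np

SVEDBERG = 1e-13  # s

def faxen_boundary(r, t, s, D, omega, r_m, c0=1.0):
    """Erf-shaped approximate boundary of a single diffusing species (CGS units)."""
    r_b = r_m * math.exp(s * omega**2 * t)          # position of the sharp boundary
    c_p = c0 * math.exp(-2.0 * s * omega**2 * t)    # plateau after radial dilution
    x = (r - r_b) / (2.0 * math.sqrt(D * t))
    return 0.5 * c_p * (1.0 + np.array([math.erf(v) for v in x]))

def vhw_extrapolation(r, scans, times, omega, r_m, n_fractions=20):
    """Return boundary-fraction s-values extrapolated to infinite time (sorted)
    and the apparent s-values of every fraction in every scan."""
    times = np.asarray(times, dtype=float)
    p = (np.arange(n_fractions) + 0.5) / n_fractions      # fraction midpoints
    s_app = np.empty((times.size, n_fractions))
    for k, (c, t) in enumerate(zip(scans, times)):
        c_p = c[-1]                        # plateau value at the last radius
        r_i = np.interp(p * c_p, c, r)     # radius where each fraction is reached
        s_app[k] = np.log(r_i / r_m) / (omega**2 * t)
    x = times ** -0.5
    # Straight line of s_app versus t^(-1/2) per fraction; the intercept is the t -> inf value
    s_inf = np.array([np.polyfit(x, s_app[:, j], 1)[1] for j in range(n_fractions)])
    return np.sort(s_inf), s_app

omega = 2 * math.pi * 50000 / 60     # rad/s
r_m = 6.0                            # meniscus, cm
r = np.linspace(r_m, 7.2, 4000)      # radii, cm
times = np.linspace(1500, 6000, 10)  # scan times, s
scans = [faxen_boundary(r, t, 5.0 * SVEDBERG, 6e-7, omega, r_m) for t in times]
G_s, s_app = vhw_extrapolation(r, scans, times, omega, r_m)
```

In this example, the apparent s-values of the first scan spread over several Svedberg, whereas the extrapolated values cluster within a few tenths of a Svedberg around the input s; the remaining spread reflects the first-order approximation behind the linear extrapolation.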
As an alternative strategy for the calculation of G(s), we have implemented the following extrapolation of ls-g*(s) to infinite time. The total set of scans used for analysis was subdivided into sequential sets of scans, each taken in a time interval centered at t_i. (For example, sets of 10 scans were used for the analysis of interference optical data.) For each set, a differential sedimentation coefficient distribution ls-g*(s)_i was calculated and divided into N equal area fractions A_j. Because the area under the ls-g*(s) curves corresponds to the loading concentration (Eq. 8), these fractional areas are equivalent to boundary fractions, and the average sedimentation coefficient s_ij(A_j) in a given area fraction A_j at time t_i directly corresponds to the s-values s*_app,i calculated for each boundary fraction in the vHW method. As a consequence, the same extrapolation procedure, Eq. 12, can be applied to generate a distribution G(s). (It should be noted that this method, like vHW, requires the existence of solution plateaus to define consistent area fractions, and that it requires some depletion at the meniscus to avoid correlation of ls-g*(s) with baseline and systematic noise parameters.) Each of the ls-g*(s)_i curves can be calculated taking into account time-invariant and radial-invariant systematic noise (Eq. 7), but the best results were obtained if only time-invariant noise was considered (vertical alignment of the scans, e.g., close to the meniscus, may be achieved separately). After the linear regression with Eq. 12, the best-fit values of s_ij(A_j) can be transformed back into equivalent boundary positions, and a step-function model of the boundary in the original data space can be generated (Eq. 9). This allows calculating overall best-fit estimates of the systematic noise contributions via Eq. 5 (see above).
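Dividing a differential distribution into equal-area fractions, as required for extrapolating ls-g*(s) to infinite time, can be sketched as follows (Python, with a hypothetical helper name). The average s of each fraction is computed as the mean of the inverse cumulative area over the fraction, which equals the area-weighted mean of s within that fraction. The sketch assumes g(s) > 0 across the grid, so that the trapezoidal cumulative area is strictly increasing.

```python
import numpy as np

def area_fraction_means(s, g, n_fractions, samples=200):
    """Split g(s) into n_fractions equal-area fractions; return the mean s of each."""
    s = np.asarray(s, dtype=float)
    g = np.asarray(g, dtype=float)
    # Normalized cumulative area (trapezoidal rule), running from 0 to 1
    cum = np.concatenate(([0.0], np.cumsum(0.5 * (g[1:] + g[:-1]) * np.diff(s))))
    cum /= cum[-1]
    means = np.empty(n_fractions)
    for j in range(n_fractions):
        # Area levels spread evenly (midpoint sampling) inside fraction j
        q = (j + (np.arange(samples) + 0.5) / samples) / n_fractions
        means[j] = np.interp(q, cum, s).mean()   # inverse cumulative area, averaged
    return means

# Example: a flat distribution between 4 and 6 S split into four fractions
s_grid = np.linspace(4.0, 6.0, 101)
means = area_fraction_means(s_grid, np.ones_like(s_grid), 4)   # 4.25, 4.75, 5.25, 5.75
```

The fraction means obtained from the scan sets centered at the different t_i would then be regressed against t_i^−1/2, fraction by fraction, exactly as in the vHW extrapolation.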
There is a subtle difference between the two procedures. The division of the boundary into equal fractions of the plateau signal introduces small errors in the linear approximation of s_app,i versus t^−0.5. However, by substituting Eq. 10 with an improved approximate solution of the Lamm equation, a more complex relationship of s_app,i versus t^−0.5 can be derived (Eq. 17 of van Holde and Weischet, 1978). In the division of the ls-g*(s) area, the division is made in units of equivalent loading concentrations, and the fractions are propagated in time according to Eq. 9, such that they generate boundary fractions that have experienced different radial dilutions. This can be expected to slightly alter the precision of the linear approximation. However, in studies with synthetic data, we found both procedures to produce similar accuracy of the linear extrapolation of s_app,i versus t^−0.5 (both have errors < 1%, data not shown). An independent theoretical justification of the ls-g*(s) approach can be derived from the observation that the apparent sedimentation coefficient distributions become sharper with time, and by interpreting Eq. 10 as describing diffusional spread in a space of "apparent sedimentation coefficients." In this view, the division of ls-g*(s) into area fractions and linear extrapolation on a t^−0.5 scale can be understood solely as a rational method for extrapolating ls-g*(s) to infinite time (with the transformation to the integral G(s) distribution providing increased numerical stability of the extrapolation). However, because of the close relationship of the methods, we interpret the extrapolation of area fractions of the ls-g*(s) distributions as an extension of the vHW method, in the sense that ls-g*(s) can be calculated easily in the presence of systematic time-invariant noise of interference optical data by applying algebraic noise decomposition (Schuck and Demeler, 1999).
Sedimentation equilibrium analysis
Sedimentation equilibrium data were analyzed by global modeling of 3-6 data sets obtained at different loading concentrations and rotor speeds using the commercial mathematical modeling software Mlab (Civilized Software, Silver Spring, MD). Least-squares fits of the measured absorbance profiles a(r) were calculated using models based on the exponential equilibrium distribution of ideally sedimenting oligomeric species (Svedberg and Pedersen, 1940)

$$a(r) = \sum_{i} c_i(r_0)\,\varepsilon_i^{\lambda}\,d\,\exp\!\left[\frac{M_i\,(1-\bar{v}\rho)\,\omega^2}{2RT}\left(r^2 - r_0^2\right)\right] \qquad (13)$$

with the reference radius r_0, the molar mass M_i of the i-mer, the partial specific volume v̄, the solvent density ρ, the rotor angular velocity ω, the absolute temperature T, the gas constant R, the molar extinction coefficient ε_i^λ at wavelength λ, and the thickness of the centerpiece d (1.2 cm). Depending on the particular model used, the molar concentrations of the i-mers, c_i(r_0), were coupled by the mass action law. In all models, the absence of partial specific volume changes upon oligomerization was assumed.
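A Python sketch of Eq. 13 for one specific case, a monomer/n-mer self-association in which the n-mer concentration is coupled to the monomer by mass action, c_n = K c_1^n. The units, the extinction coefficient of the n-mer (taken as n times that of the monomer), the function name, and the example parameters are assumptions made for illustration; a global fit would wrap such a function in a nonlinear least-squares routine.

```python
import numpy as np

R_GAS = 8.314462618  # gas constant, J/(mol K)

def monomer_nmer_absorbance(r, r0, c1_r0, K, M1, vbar, rho, omega, T,
                            eps1, d=1.2, n=2):
    """Equilibrium absorbance profile for a monomer / n-mer self-association.

    r, r0 in m; M1 in kg/mol; vbar in m^3/kg; rho in kg/m^3; omega in rad/s;
    c1_r0 in mol/L; eps1 in L/(mol cm); d in cm.
    """
    sigma1 = M1 * (1.0 - vbar * rho) * omega**2 / (R_GAS * T)   # 1/m^2
    xi = 0.5 * (np.asarray(r, dtype=float) ** 2 - r0**2)
    c1 = c1_r0 * np.exp(sigma1 * xi)                   # monomer
    cn = K * c1_r0**n * np.exp(n * sigma1 * xi)        # equals K * c1**n at every radius
    # Assumed: the n-mer absorbs like n monomers (eps_n = n * eps1)
    return d * eps1 * (c1 + n * cn)

omega = 2 * np.pi * 13000 / 60        # 13,000 rpm
r = np.linspace(0.068, 0.072, 50)     # radii, m
a = monomer_nmer_absorbance(r, 0.070, 5e-6, 1e5, 50.0, 0.73e-3, 1000.0,
                            omega, 277.15, 4e4)
```

For K = 0 the model reduces to a single exponential, whose logarithm is linear in r² with slope M1(1 − v̄ρ)ω²/(2RT).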
A transformation of the absorbance distribution (of a single scan) into a continuous molar mass distribution c(M), combined with maximum entropy regularization, was achieved by replacing the kernel in Eq. 3 with the sedimentation equilibrium exponentials (Eq. 13) of a single species. To allow for a rational comparison of the concentrations of the different species, average loading concentrations, obtained by integration of each species from meniscus to bottom, were used as concentration units. This method is similar to the Laplace transform with regularization described by Wiff and Gehatia (1976). Because the shape of c(M) is highly sensitive to the location of the bottom of the solution column, the meniscus and bottom were predetermined for this analysis using intensity scans.
Dynamic light scattering
Dynamic light scattering experiments were conducted using a Protein Solutions DynaPro 99 instrument with a DynaPro-MSTC200 microsampler (Protein Solutions, Charlottesville, VA). Protein samples were centrifuged for 5 min in a microcentrifuge to remove dust particles, and a 20-µl sample was inserted in the cuvette with the temperature control set to 20°C. The light-scattering signal was collected at 90°, and autocorrelation coefficients were exported for analysis with the software Sedfit, adapted for dynamic light-scattering analysis by replacing the Lamm equation solutions with the following model for the field autocorrelation function:

$$g^{(1)}(\tau) = \int c(D)\,e^{-q^2 D \tau}\,dD \qquad (14)$$

where τ is the decay time and q = (4πn/λ) sin(θ/2), with the solvent refractive index n, the wavelength of the incident light λ, and the scattering angle θ (Murphy, 1997).
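A minimal Python sketch of the discretized autocorrelation model and of the scattering vector q. The wavelength of the instrument is not stated here, so 633 nm is an assumed value; the normalization g^(1)(0) = 1, the function names, and the diffusion coefficients of the example are likewise choices made for this sketch.

```python
import math
import numpy as np

def scattering_vector(n, wavelength, theta_deg):
    """q = (4 pi n / lambda) sin(theta / 2); wavelength in m gives q in 1/m."""
    return 4.0 * math.pi * n / wavelength * math.sin(math.radians(theta_deg) / 2.0)

def field_autocorrelation(tau, D_grid, c_D, q):
    """Discretized model g1(tau) = sum_k c_k exp(-q^2 D_k tau), normalized to g1(0) = 1."""
    tau = np.asarray(tau, dtype=float)
    D_grid = np.asarray(D_grid, dtype=float)
    c_D = np.asarray(c_D, dtype=float)
    decay = np.exp(-q**2 * np.outer(tau, D_grid))   # shape (n_tau, n_D)
    return decay @ c_D / c_D.sum()

q = scattering_vector(n=1.333, wavelength=633e-9, theta_deg=90.0)   # assumed wavelength
tau = np.logspace(-7, -2, 200)                                      # s
g1_single = field_autocorrelation(tau, [6e-11], [1.0], q)
g1_mix = field_autocorrelation(tau, [6e-11, 3e-11], [0.7, 0.3], q)
```

Fitting a distribution c(D) to measured correlation data then has the same linear structure as Eq. 1, which is what allows the regularized least-squares machinery to be reused.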
RESULTS
Comparison of the resolution using synthetic data
To examine the potential of Lamm equation modeling for resolving species in sedimentation velocity experiments when no clear sedimentation boundary is visible, we simulated data for the model system proposed earlier by Stafford (1992b) (Fig. 1). It consists of four species with sedimentation coefficients of 6, 7, 8, and 9 S. For each, theoretical sedimentation profiles were calculated with a starting concentration of 0.25 (arbitrary signal units), and normally distributed noise of magnitude 0.01 was added to the sum of their signals. From inspection of the resulting broad sedimentation profiles, the presence of a heterogeneous mixture is obvious, but no distinct boundaries can be identified (Fig. 1 A).
The distributions obtained differ greatly among the methods (Fig. 1 B). As can be expected, the apparent sedimentation coefficient distributions (both ls-g*(s) and g(s*)) only reveal the range of s-values, without finer structure in the size distribution. The results obtained for the integral sedimentation coefficient distribution G(s) with the vHW method (van Holde and Weischet, 1978) and by extrapolation of ls-g*(s) to infinite time are only slightly better, in that they represent the range of s-values more accurately. Although these methods can, in principle, unravel the effects of diffusion, the sedimentation coefficients are clearly spaced too closely for their resolution. However, the four species can be discriminated with the c(s) method, if combined with the optimization of the weight-average frictional ratio (see below).
The deconvolution of diffusion by G(s) and c(s) merits more detailed consideration. The extrapolation of the boundary fractions of both G(s) methods is shown in Fig. 2. In comparison, the extrapolated ls-g*(s) distribution has fewer time points for the extrapolation (due to the need for a whole set of curves for calculating a single ls-g*(s) distribution), but it can be subdivided into more boundary fractions (Fig. 2 B). However, as expected, the results are very similar in both methods. Interestingly, although the linear regression of the boundary fractions appears to be of high quality, it is clear that they do not contain the information required for resolving the species underlying the model system.
The application of the distribution of Lamm equation solutions c(s) is based on one piece of additional information that links s and D, which can be obtained most conveniently by estimating a frictional ratio f/f0 of the species under study. Because, in most cases, the data will not contain enough information to define f/f0 as a function of the sedimentation coefficient (or even a distribution of f/f0 for all species with the same sedimentation coefficient), the c(s) method is restricted to a single value, corresponding to the weight-average frictional ratio of all species. An initial estimate may be based, for example, on the expected hydrodynamic behavior of the type of macromolecule under study (e.g., a globular protein or a random coil). In the application to the data shown in Fig. 1 A, we started with an initial guess of f/f0 = 1.2, which resulted in two peaks at 6.5 and 8.5 S, but with a poor fit (rms deviation of 0.015) and clearly systematic residuals (data not shown). The model functions calculated from c(s) were too broad, particularly in the earlier scans, indicating that the diffusion coefficients predicted from the prior estimate of f/f0 = 1.2 were too large, i.e., that this estimate was too small. Therefore, we increased f/f0 to a value of 2 (resulting in the distribution with three peaks shown as dotted lines in Fig. 1 B), and then floated this parameter in a nonlinear regression. This resulted in a c(s) distribution that exhibits peaks at the correct positions of the species underlying the simulated data. Furthermore, the parameter f/f0 converged to a value of 2.9, which is close to the weight-average frictional ratio of 2.7 for the simulated, highly elongated species. This indicates that nonlinear regression of f/f0 can be a useful technique to obtain good estimates of this parameter.
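The dependence of the fit quality on the frictional ratio can be illustrated with the following Python sketch, which uses a grid search rather than the nonlinear regression described above. Noise-free data of a single 6.5 S species with f/f0 = 1.6 are simulated, and for each trial f/f0 the diffusion coefficients of all grid species are recomputed from s (Svedberg and Stokes relations, SI units, water at 20°C assumed) before an unregularized linear least-squares fit of the amplitudes. An erf-shaped approximate boundary stands in for finite-element Lamm equation solutions; all names and parameter values are hypothetical.

```python
import math
import numpy as np

K_B = 1.380649e-23   # J/K
SVEDBERG = 1e-13     # s

def diffusion_from_s(s, f_ratio, vbar=0.73e-3, rho=998.2, eta=1.002e-3, T=293.15):
    """D (m^2/s) from s (s) and f/f0 via the Svedberg and Stokes relations."""
    r0 = math.sqrt(9.0 * vbar * eta * f_ratio * s / (2.0 * (1.0 - vbar * rho)))
    return K_B * T / (f_ratio * 6.0 * math.pi * eta * r0)

def boundary(r, t, s, D, omega, r_m):
    """Erf-shaped approximate boundary with unit loading concentration."""
    r_b = r_m * math.exp(s * omega**2 * t)
    x = (r - r_b) / (2.0 * math.sqrt(D * t))
    erf = np.array([math.erf(v) for v in x])
    return 0.5 * math.exp(-2.0 * s * omega**2 * t) * (1.0 + erf)

def kernel(s_grid, f_ratio, r, times, omega, r_m):
    """One column per s-value: its boundaries at all scan times, stacked scan by scan."""
    return np.column_stack([
        np.concatenate([boundary(r, t, s, diffusion_from_s(s, f_ratio), omega, r_m)
                        for t in times])
        for s in s_grid])

def rms_for_ratio(f_ratio, data, s_grid, r, times, omega, r_m):
    """Linear least-squares fit of the distribution at fixed f/f0; returns the rms error."""
    A = kernel(s_grid, f_ratio, r, times, omega, r_m)
    c, *_ = np.linalg.lstsq(A, data, rcond=None)
    return math.sqrt(np.mean((A @ c - data) ** 2))

omega = 2 * math.pi * 50000 / 60
r_m = 0.060                                     # meniscus, m
r = np.linspace(0.060, 0.070, 200)              # radii, m
times = np.linspace(1200, 4800, 8)              # scan times, s
s_grid = np.arange(2.0, 12.01, 0.5) * SVEDBERG  # 2.0, 2.5, ..., 12.0 S
true_index, true_ratio = 9, 1.6                 # 6.5 S species with f/f0 = 1.6
data = kernel(s_grid, true_ratio, r, times, omega, r_m)[:, true_index]

ratios = [1.2, 1.4, 1.6, 1.8, 2.0]
rms = [rms_for_ratio(f, data, s_grid, r, times, omega, r_m) for f in ratios]
best_ratio = ratios[int(np.argmin(rms))]
```

In this idealized setting, the rms error drops to round-off level only at the true frictional ratio; with real, noisy data the minimum is shallower, and a one-dimensional optimizer would replace the coarse grid.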
Nonlinear regression was also found to be a good technique for determining the exact meniscus position, as well as the sedimentation and diffusion coefficients of co-sedimenting small molecules (such as buffer components that are not matched in the reference and sample side of the ultracentrifugal cell; data not shown).
When inspecting the details of the resulting best-fit c(s) distribution, it should be considered that the maximum entropy regularization prevents the peaks from being as sharp as one could expect for an ideal measurement. By design of the Bayesian procedure, it restricts the structure in the c(s) distribution to the features that are essential for describing the raw data within the limits of the noise. According to this result, sharper distributions would not lead to significantly better fits (at a confidence limit of 0.68). Therefore, the c(s) distribution in Fig. 1 B depicts the information that can be extracted reliably from the data in Fig. 1 A and, by design, does not represent the best fit, which would, in most cases, be too unstable. However, it should be noted that, even in the presence of noise, all four species can be unraveled with the c(s) analysis, although their boundaries are not separated. Interestingly, the four species cannot be resolved if only the data subsets suitable for vHW analysis are taken into consideration. We have shown earlier that, for broad continuous distributions, the regularization can produce artificial oscillations (Schuck, 2000). The analysis here also highlights another well-known property of maximum entropy regularization: an inherent tendency to merge closely spaced peaks, in particular in nonoptimal fits (where the predefined F-ratio results in a larger increase in χ², and consequently stronger regularization). This is shown by the dotted line in Fig. 1 B. To achieve optimal resolution, therefore, it appears important to balance the distribution parameters (the confidence level and the prior estimate of f/f0) against inspection of the quality of the fit. As indicated above, this can include optimization of f/f0 by least-squares regression.
Study on the sensitivity of the methods using experimental data from an immunoglobulin G sample
Next, we compared the performance of the methods using data from a
nearly homogeneous immunoglobulin G (IgG) sample. The experimental fringe-shift data are shown in Fig.
3 A. A fit with discrete
solutions of the Lamm equation reveals one main species with 6.60 S,
but with a (statistically highly significant) 3% contamination of a
dimeric species with 9.58 (± 0.2) S. Similarly, a fit of
dc/dt with the Fujita-MacCosham-Philo function
(Philo, 1997, 2000) converges to a 5% fraction of a faster-sedimenting
species (data not shown). These data may therefore serve as a test of the sensitivity
of each method in detecting trace species.
Figure 4 shows the differential sedimentation coefficient distributions that are not corrected for the effects of diffusion, namely g(s*) by dc/dt (solid line) and ls-g*(s) (circles). When applied to the same data subset, both distributions are very similar, and both clearly reveal the presence of the larger species. Figure 3, B and C, presents the integral sedimentation coefficient distribution G(s) obtained by extrapolation of ls-g*(s) to infinite time (circles in Fig. 3, B and C; the calculated systematic noise is shown as the dotted line in Fig. 3 A) and by conventional analysis after subtraction of systematic noise components (open squares in Fig. 3 B; for comparison only, these components were taken from the best fit with the discrete Lamm equation model). The s-value of the main species is ~6.7 S, and the highest fraction indicates the presence of a faster component. However, quantitation is difficult because of the relatively large noise in the highest boundary (or area) fractions. Finally, the differential sedimentation coefficient distribution c(s), with diffusion deconvoluted assuming an average frictional ratio of 1.58, is shown as a dotted line in Fig. 4.
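The extrapolation to infinite time that underlies G(s) depends on the different time dependence of sedimentation and diffusion. Boundary displacement grows with t, whereas diffusional spreading grows with √t. The apparent s-value of a given boundary (or area) fraction therefore varies linearly with t^(-1/2), and its intercept at t^(-1/2) = 0 estimates the diffusion-free s. The Python sketch below illustrates this for a single fraction. It assumes, hypothetically, that the radial position of that fraction has already been located in each scan. The helper names and all numerical values are illustrative only.

```python
import numpy as np

def apparent_s(r, meniscus, omega, t):
    """Apparent sedimentation coefficient s* = ln(r/m) / (omega^2 t), in seconds."""
    return np.log(r / meniscus) / (omega ** 2 * t)

def extrapolate_fraction(times, s_apparent):
    """Fit s*(t) = s_inf + k * t**-0.5 by linear least squares; return s_inf."""
    x = 1.0 / np.sqrt(np.asarray(times, dtype=float))
    _, intercept = np.polyfit(x, np.asarray(s_apparent, dtype=float), 1)
    return intercept

# Hypothetical example: an upper boundary fraction of a 6.7 S species at 50,000 rpm.
omega = 2.0 * np.pi * 50000 / 60.0                         # rad/s
times = np.array([1200.0, 1800.0, 2400.0, 3000.0, 3600.0])   # s
s_app = 6.7e-13 + 4.0e-12 / np.sqrt(times)                  # synthetic s*(t)
meniscus = 6.0                                               # cm
radii = meniscus * np.exp(s_app * omega ** 2 * times)        # radial positions implying s*
s_inf = extrapolate_fraction(times, apparent_s(radii, meniscus, omega, times))
print(f"s(t -> infinity) = {s_inf / 1e-13:.2f} S")
```

The full van Holde-Weischet analysis repeats this fit for every boundary (or area) fraction. The sorted intercepts then form the integral distribution G(s).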
As in the first example, the increase in resolution of the
c(s) method arises in part from the
larger number of scans that can be included in the analysis, but mainly
from the estimate of the extent of diffusion, which is examined in the following. Figure 5 shows the
dependence of the quality of fit obtained at different values for the
frictional ratio. It can be seen that the rms error has a clearly
defined minimum. With nonoptimal values, the boundary shape is not well
described, as illustrated by the diagonal pattern in the residual
bitmaps. It should be noted that this occurs to a similar extent at both too low and too high values of
f/f0, and that the
assumption of a spherical average shape
(f/f0 = 1) is as poor as the limit of nondiffusing particles. (This limit of no
diffusion is identical to the ls-g*(s) and
g(s*) distributions.) In contrast, the residual
bitmap shows very few systematic patterns at the optimal value of
1.58. As a consequence, as in the first example, we can extract
an average frictional ratio from the data itself by using the
quality of fit as the criterion. How the different values of
f/f0 affect the calculated
c(s) distribution is shown in Fig.
6. Consistent with previous observations
(Schuck, 2000), the position of the main peak remains essentially
constant, whereas smaller values of
f/f0 lead to sharper peaks.
However, the location of the peak of the trace component was found to
be correlated with f/f0. If
the diffusion is overcorrected, information on the contaminating
faster-sedimenting species is lost, and the smaller peak appears
reduced in area and at higher s-values (inset in
Fig. 6). However, this is accompanied by a sharp decrease in the
quality of the fit (Fig. 5), which helps in determining the
s-value of the faster species. In the inset of Fig. 6, the two solid lines indicate two distributions that are indistinguishable at a confidence level of 0.9, suggesting an error estimate on the order
of 0.3 S (best fit at 9.38 S).
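One common way to decide whether two fits are statistically indistinguishable is the F-statistics criterion χ²/χ²_min ≤ 1 + (p/(N − p))·F(p, N − p, P), where N is the number of data points, p the number of parameters, and P the confidence level. This applies, for example, to the neighboring distributions in the inset of Fig. 6 or to adjacent points of the f/f0 scan in Fig. 5. The Python sketch below applies the criterion to an rms-versus-f/f0 scan. It assumes this standard form of the test (the exact criterion behind the figures may differ), uses SciPy's `scipy.stats.f.ppf` for the F quantile, and uses a synthetic rms curve with a single minimum as the example.

```python
import numpy as np
from scipy.stats import f as f_dist

def chi2_ratio_threshold(n_points, n_params, confidence):
    """Critical chi^2 / chi^2_min ratio from the F-distribution."""
    dof = n_points - n_params
    return 1.0 + n_params / dof * f_dist.ppf(confidence, n_params, dof)

def acceptable_range(param_values, rms_values, n_points, confidence=0.68, n_params=1):
    """Best parameter value and the (assumed contiguous) range of values whose
    fits are statistically indistinguishable from the best fit. For a fixed
    number of data points, chi^2 is proportional to rms^2."""
    param_values = np.asarray(param_values, dtype=float)
    chi2 = np.asarray(rms_values, dtype=float) ** 2
    best = int(np.argmin(chi2))
    ok = chi2 <= chi2[best] * chi2_ratio_threshold(n_points, n_params, confidence)
    return param_values[best], param_values[ok].min(), param_values[ok].max()

# Synthetic rms scan over f/f0 (hypothetical numbers, minimum placed at 1.58).
ff0 = np.linspace(1.0, 2.0, 101)
rms = 0.005 * np.sqrt(1.0 + 0.5 * (ff0 - 1.58) ** 2)
print(acceptable_range(ff0, rms, n_points=20000, confidence=0.9))
```

When N is large, the threshold lies very close to 1. Only a sharp minimum, such as the one in Fig. 5, therefore yields a narrow range of acceptable parameter values.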
Application to the study of preparations of the herpes simplex capsid protein VP5
A more complex problem is the analysis of a protein with extended,
slow oligomerization. This is illustrated by experiments with
preparations of the herpes simplex capsid protein VP5 (the biological
implications will be discussed elsewhere). Previous reports of sucrose
gradient centrifugation suggested that the 149 kDa protein is monomeric
(Newcomb et al., 1999). However, Fig. 7
A shows typical sedimentation profiles with a sloping
plateau region, indicating the existence of large aggregates, together with two separate major boundaries from discrete smaller species. Experiments at different loading concentrations and rotor speeds led to
similar distributions, but with slightly different peak areas,
consistent with a slow and at least partially reversible oligomerization.
With the data of Fig. 7 A, neither version of
G(s) is applicable because there is no
solution plateau: no consistent boundary fractions (or area
fractions, respectively) can be defined. Therefore, only the
differential sedimentation coefficient distributions can be compared
(Fig. 7, C and D). All distributions clearly show two maxima corresponding to the two visible separating boundaries, with
the peaks in c(s) being the best
resolved. To avoid broadening from large time intervals, only a small
subset of absorbance scans can be used in the calculation of
g(s*) by dc/dt (solid line in Fig. 7 C), which limits the s-range of
g(s*) under conditions where the peaks are well
resolved. Because the ls-g*(s) method does not
require the approximation of dc/dt by
Δc/Δt (Schuck and Rossmanith, 2000), a much
larger number of scans can be incorporated in the analysis. This reveals
the presence of many larger aggregates with
s-values up to 25 S, consistent with the results from the
c(s) analysis. Although the
c(s) distribution may suggest separate peaks for
the larger species, a Monte Carlo statistical analysis reveals that the
apparent peaks at s-values larger than ~15 S may be
induced by oscillations from the regularization procedure and are not
significant at the given level of noise in the data. This caveat, however,
only refers to the exact positions of the c(s)
peaks, not to the existence of material at large s-values, which is highly significant.
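The Monte Carlo test can be outlined in three steps. Synthetic data sets are generated by adding fresh noise of the observed magnitude to the best-fit model, each data set is re-analyzed, and features that do not recur consistently are regarded as not significant. The Python sketch below makes two simplifying assumptions. It uses a generic linear kernel `A` (in a real c(s) analysis its columns would be Lamm equation solutions), and for brevity it refits each data set by unregularized non-negative least squares (`scipy.optimize.nnls`). Neither choice is the procedure actually used for Fig. 7, and the toy kernel and distribution are hypothetical.

```python
import numpy as np
from scipy.optimize import nnls

def monte_carlo_refits(A, x_best, noise_sd, n_iter=200, seed=0):
    """Re-analyze n_iter synthetic data sets: best-fit model plus Gaussian noise."""
    rng = np.random.default_rng(seed)
    model = A @ x_best
    fits = np.empty((n_iter, A.shape[1]))
    for k in range(n_iter):
        synthetic = model + rng.normal(0.0, noise_sd, size=model.shape)
        fits[k], _ = nnls(A, synthetic)
    return fits

def peak_frequency(fits, index):
    """Fraction of refits that show a local maximum at grid point `index`."""
    if not 0 < index < fits.shape[1] - 1:
        raise ValueError("index must be an interior grid point")
    center = fits[:, index]
    is_peak = (center > fits[:, index - 1]) & (center > fits[:, index + 1])
    return float(np.mean(is_peak))

# Toy example: Gaussian kernel standing in for Lamm equation solutions (hypothetical).
s_grid = np.linspace(1.0, 25.0, 25)
positions = np.linspace(0.0, 26.0, 120)
A = np.exp(-0.5 * ((positions[:, None] - s_grid[None, :]) / 1.5) ** 2)
x_best = np.exp(-0.5 * ((s_grid - 7.0) / 0.8) ** 2)
fits = monte_carlo_refits(A, x_best, noise_sd=0.01)
print("fraction of refits with a peak at 7 S:", peak_frequency(fits, 6))
```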
Because distinct boundaries form, at least for the two slower-sedimenting species, the oligomerization is slow on the time scale of the sedimentation, and it seems possible to assign oligomeric states to the individual peaks. This, however, requires additional information, which we have sought in sedimentation equilibrium and dynamic light-scattering experiments. Global modeling of sedimentation equilibrium data at multiple rotor speeds and concentrations shows that the majority of the protein is monomeric, but with significant contributions of small oligomers (data not shown; an example of a sedimentation equilibrium profile modeled with monomer, dimer, and tetramer is shown in the inset of Fig. 8 A). Although the self-association scheme could not be identified, the data were consistent with an isodesmic association with a contamination by incompetent monomer. These sedimentation equilibrium results show that the main peak of c(s) corresponds to the monomer, and suggest that the second peak is a dimer (or possibly a trimer). With a monomer sedimentation coefficient of 6.8 S, we can calculate a Stokes radius (RS) of 5.2 nm and a frictional ratio of 1.5 (equivalent to a prolate ellipsoid with 2a = 25.4 nm and 2b = 4.5 nm). This value was applied in the c(s) analysis of Fig. 7 D for diffusional deconvolution. Nonlinear regression of the weight-average frictional ratio f/f0 (with a starting value of 1.0) converged to a best-fit value of 1.25, but the final rms error of the fit was not statistically different from that obtained with f/f0 = 1.5. Further, the c(s) curves calculated using both values virtually superimpose (data not shown). This confirms that nonlinear regression of f/f0 yields values sufficiently precise for the deconvolution of diffusion in the sedimentation coefficient distributions, although the resulting average frictional ratio itself is not suitable for transforming c(s) into a precise molar mass distribution c(M).
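The Stokes radius and frictional ratio quoted for the VP5 monomer follow from two relations. The Svedberg relation gives RS = M(1 − v̄ρ)/(N_A·6πη·s), and the radius of the equivalent anhydrous sphere is R0 = (3Mv̄/(4πN_A))^(1/3). The text does not state the partial specific volume or the solvent properties. The Python sketch below assumes v̄ = 0.73 ml/g and water at 20°C. With M = 149 kDa and s = 6.8 S, these assumptions give values close to those quoted.

```python
import math

N_A = 6.02214076e23  # Avogadro constant, 1/mol

def stokes_radius_cm(molar_mass, s, vbar, rho, eta):
    """Stokes radius (cm) from M (g/mol), s (seconds), vbar (ml/g),
    solvent density rho (g/ml), and viscosity eta (poise)."""
    friction = molar_mass * (1.0 - vbar * rho) / (N_A * s)  # frictional coefficient, g/s
    return friction / (6.0 * math.pi * eta)

def anhydrous_sphere_radius_cm(molar_mass, vbar):
    """Radius R0 (cm) of a compact sphere with the same mass and vbar."""
    return (3.0 * molar_mass * vbar / (4.0 * math.pi * N_A)) ** (1.0 / 3.0)

# Assumed parameters: vbar = 0.73 ml/g, water at 20 degC.
M, s = 149e3, 6.8e-13
rs = stokes_radius_cm(M, s, 0.73, 0.99823, 0.01002)
r0 = anhydrous_sphere_radius_cm(M, 0.73)
print(f"R_S = {rs * 1e7:.2f} nm, f/f0 = {rs / r0:.2f}")
```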
Interesting from the methodological point of view is a transformation
of the sedimentation equilibrium data into a "model-free" molar
mass distribution c(M), as suggested earlier by
Wiff and Gehatia (1976)
(Fig. 8 A). This
c(M) transform does not take advantage of our
knowledge of the molar mass of the different oligomers, and it is
mathematically equivalent to the continuous size-distribution analysis
of the sedimentation velocity data. It shows a main peak at a molar
mass ~200 kDa, distinctly higher than the molar mass of the monomer
(149 kDa), clearly indicating the presence of oligomeric species.
Unfortunately, the data at molar mass >600 kDa are not reliable in
this transformation because they are mainly governed by assumptions of
sedimentation in the region of optical artifacts close to the bottom of
the cell. In contrast to sedimentation velocity, the
c(M) transform does not have sufficient
information to resolve the different species. A similar situation is
encountered in the interpretation of the dynamic light-scattering data,
which are commonly transformed to distributions of Stokes radii,
RS (Fig. 8 B). The
scattering intensity has a peak at ~5 nm, but also extends to larger
species. To better compare the relative abundance of the different
species, the distribution was rescaled into relative weight
concentrations as shown by the dotted line in Fig. 8 B.
From these data, it appears that particles with
RS > 8 nm are in very low
abundance, despite their significant contribution to the
scattered intensity. As in the c(M) transform
of the sedimentation equilibrium data, no resolution of the oligomers is achieved. Nevertheless, the virtual absence of species with RS > 8 nm indicates that the 10.3-S
peak seen in sedimentation velocity may be a dimer
(RS = 6.9 nm), and less likely a
trimer (RS = 10.3 nm) or even larger
oligomers. Similarly, the 13.2-S peak may be a trimer
(RS = 8.3 nm) but less likely a
tetramer (RS = 10.6 nm) or even a
pentamer or a hexamer.
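The argument that rules out larger oligomers can be made explicit. If a peak at sedimentation coefficient s were an n-mer of the 149 kDa monomer, its Stokes radius would be RS = nM(1 − v̄ρ)/(N_A·6πη·s). This value can then be compared with the ~8 nm upper limit suggested by light scattering. The Python sketch below computes these radii. Its default v̄ (0.73 ml/g) and solvent values (water at 20°C) are assumptions, so its results may differ slightly from the values quoted in the text.

```python
import math

N_A = 6.02214076e23  # Avogadro constant, 1/mol

def oligomer_stokes_radius_nm(n, monomer_mass, s, vbar=0.73, rho=0.99823, eta=0.01002):
    """Stokes radius (nm) an n-mer would need in order to sediment at s (seconds).
    The default vbar and solvent values (water, 20 degC) are assumptions."""
    molar_mass = n * monomer_mass
    friction = molar_mass * (1.0 - vbar * rho) / (N_A * s)  # g/s
    return friction / (6.0 * math.pi * eta) * 1e7           # cm -> nm

for s_peak in (10.3e-13, 13.2e-13):
    for n in range(2, 7):
        rs = oligomer_stokes_radius_nm(n, 149e3, s_peak)
        print(f"{s_peak / 1e-13:.1f} S as {n}-mer: R_S = {rs:.1f} nm")
```

At a fixed s, the required Stokes radius grows in proportion to n. Each additional subunit therefore moves a candidate assignment further above the size range seen by light scattering.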
This example illustrates the current potential and limitations of the sedimentation coefficient distributions from complex oligomeric mixtures. It demonstrates that complementary information from sedimentation equilibrium and dynamic light scattering can be used (and is required) for the detailed interpretation. This is despite the much lower resolution of these methods due to the significantly more ill-conditioned analysis of exponentials as compared to the Lamm equation solutions. Correspondingly, the additional information from the c(s) distribution on the number and approximate size of species can be very important for the correct interpretation of the sedimentation equilibrium data.
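The ill-conditioning of exponential analyses can be shown numerically. At sedimentation equilibrium, each ideal species contributes a profile proportional to exp[M(1 − v̄ρ)ω²(r² − r0²)/(2RT)], so recovering a molar mass distribution c(M) requires inverting a kernel made of exponentials. The Python sketch below builds such a kernel on a grid of molar masses and reports its condition number. The rotor speed, radial range, mass grid, v̄, and solvent density are hypothetical choices and do not correspond to the conditions of Fig. 8.

```python
import numpy as np

R_GAS = 8.314462618e7  # gas constant, erg/(mol K)

def equilibrium_kernel(radii, masses, vbar, rho, rpm, temp, r_ref):
    """Matrix whose columns are ideal single-species equilibrium profiles
    (one per molar mass), each normalized to unit length."""
    omega2 = (2.0 * np.pi * rpm / 60.0) ** 2
    sigma = masses * (1.0 - vbar * rho) * omega2 / (R_GAS * temp)  # 1/cm^2
    kernel = np.exp(0.5 * np.outer(radii ** 2 - r_ref ** 2, sigma))
    return kernel / np.linalg.norm(kernel, axis=0)  # remove trivial scale differences

radii = np.linspace(6.8, 7.2, 200)      # cm, hypothetical solution column
masses = np.linspace(50e3, 1000e3, 40)  # g/mol, hypothetical c(M) grid
K = equilibrium_kernel(radii, masses, 0.73, 0.99823, 10000, 293.15, 6.8)
print(f"condition number: {np.linalg.cond(K):.1e}")
```

The columns are normalized, so the large condition number does not come from differences in scale. It reflects how nearly collinear neighboring exponentials are, and this is why closely spaced species cannot be separated in such transforms.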
Analysis of continuous size distributions of emulsions
As a last example, we analyzed a truly continuous size
distribution of lipid emulsion particles. General physical
characteristics of such particles and their use for the study of
apolipoproteins have been described by MacPhee et al. (1977)
and by M. A. Perugini, P. Schuck, and G. J. Howlett (submitted). In the current context, to illustrate the
behavior of the size distribution, we considered the data from a
mixture of two different elution fractions after sucrose-gradient
ultracentrifugation. Figure
9 A shows the experimental flotation data exhibiting a bimodal boundary. For both of the fractions, we have measured the average diffusion coefficient by
dynamic light scattering (with hydrodynamic radii of 34 and 62 nm,
respectively). The dashed lines in Fig. 9 A are the
calculated best-fit distributions based on two discrete species with
the predetermined diffusion coefficients. The comparison with the boundary spread of the experimental data shows that the distributions of the fractions are broad, and that boundary broadening by diffusion is relatively small, but cannot be neglected.
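For spherical particles (f/f0 = 1), the hydrodynamic radius from light scattering fixes two quantities. The Stokes-Einstein relation gives the diffusion coefficient, D = kT/(6πηR). The sedimentation coefficient is s = (4πR³/3v̄)(1 − v̄ρ)/(6πηR), which is negative (flotation) when v̄ρ > 1. The Python sketch below evaluates both for the two emulsion fractions, using the partial specific volume of 1.055 ml/g given in the text. It assumes a water-like solvent at 20°C and treats the light-scattering radius as the radius of a compact sphere, so the numbers are rough illustrations rather than results of this work.

```python
import math

K_B = 1.380649e-16  # Boltzmann constant, erg/K

def sphere_d_and_s(radius_cm, vbar, rho, eta, temp):
    """Diffusion coefficient (cm^2/s) and sedimentation coefficient (seconds)
    of a compact sphere; s < 0 means flotation."""
    friction = 6.0 * math.pi * eta * radius_cm           # g/s
    d = K_B * temp / friction
    mass = 4.0 / 3.0 * math.pi * radius_cm ** 3 / vbar  # g per particle
    s = mass * (1.0 - vbar * rho) / friction
    return d, s

for r_nm in (34.0, 62.0):
    d, s = sphere_d_and_s(r_nm * 1e-7, 1.055, 0.99823, 0.01002, 293.15)
    print(f"R = {r_nm:.0f} nm: D = {d:.2e} cm^2/s, s = {s / 1e-13:.0f} S")
```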
Here, the analysis with the c(s) method can be
based on the knowledge that the emulsion particles are spherical, i.e.,
that f/f0 = 1.0. (For simplicity, we have
used the mean partial specific volume of 1.055 ml/g of the components
of the emulsion mixture; a refinement taking into account the full
size-dependence of the partial-specific volume of the particles is
included in [M. A. Perugini, P. Schuck, G. J. Howlett,
submitted]). The resulting size distributions are shown in
Fig. 9 B. When using maximum entropy regularization, we
obtained very noisy c(s) distributions with several artificial spikes (dotted line). This is consistent
with previous findings that, for broad continuous distributions, the maximum entropy method can cause artificial oscillations (Provencher, 1992). However, this difficulty can be circumvented effectively by the
use of Tikhonov-Phillips regularization (solid line).
Because this procedure minimizes the second derivative of the distribution, it makes use of the expected breadth and smoothness of the distribution as prior knowledge.
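Second-derivative (Tikhonov-Phillips) regularization can be sketched as a penalized non-negative least-squares problem: minimize ||Ax − b||² + α||Lx||² subject to x ≥ 0, where L is a second-difference operator. The scaling parameter α is increased until χ² has risen by a predefined ratio over the unregularized fit, in line with the F-ratio criterion described earlier. The Python sketch below makes several assumptions. The Gaussian kernel is a hypothetical stand-in for the matrix of Lamm equation solutions, the target χ² ratio is supplied by the user, α is found by bisection on a logarithmic scale, and each augmented problem is solved with `scipy.optimize.nnls`.

```python
import numpy as np
from scipy.optimize import nnls

def second_difference_matrix(n):
    """(n - 2) x n operator approximating the second derivative on a uniform grid."""
    L = np.zeros((n - 2, n))
    for i in range(n - 2):
        L[i, i:i + 3] = (1.0, -2.0, 1.0)
    return L

def tikhonov_nnls(A, b, alpha):
    """Minimize ||A x - b||^2 + alpha ||L x||^2 subject to x >= 0 by
    stacking sqrt(alpha) * L under A and zeros under b."""
    L = second_difference_matrix(A.shape[1])
    A_aug = np.vstack([A, np.sqrt(alpha) * L])
    b_aug = np.concatenate([b, np.zeros(L.shape[0])])
    x, _ = nnls(A_aug, b_aug)
    return x

def chi2(A, x, b):
    residual = A @ x - b
    return float(residual @ residual)

def regularize_to_ratio(A, b, target_ratio, alpha_lo=1e-8, alpha_hi=1e4, n_iter=40):
    """Approximately the largest alpha whose chi^2 stays below target_ratio
    times the chi^2 of the unregularized fit (log-scale bisection)."""
    chi2_0 = chi2(A, tikhonov_nnls(A, b, 0.0), b)
    for _ in range(n_iter):
        alpha_mid = np.sqrt(alpha_lo * alpha_hi)
        if chi2(A, tikhonov_nnls(A, b, alpha_mid), b) > target_ratio * chi2_0:
            alpha_hi = alpha_mid
        else:
            alpha_lo = alpha_mid
    return alpha_lo, tikhonov_nnls(A, b, alpha_lo)

# Toy problem: Gaussian kernel standing in for Lamm equation solutions (hypothetical).
rng = np.random.default_rng(1)
grid = np.linspace(0.0, 1.0, 30)
t = np.linspace(0.0, 1.0, 80)
A = np.exp(-0.5 * ((t[:, None] - grid[None, :]) / 0.05) ** 2)
x_true = np.exp(-0.5 * ((grid - 0.5) / 0.15) ** 2)
b = A @ x_true + rng.normal(0.0, 0.01, t.size)
alpha, x = regularize_to_ratio(A, b, target_ratio=1.1)
```

The second-difference penalty leaves linear trends in x unpenalized and suppresses curvature. This favors broad, smooth distributions, whereas the maximum entropy penalty favors a flat distribution.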
The ls-g*(s) method leads to a similar
distribution, which is only slightly broader because of the limited
extent of diffusion (Fig. 9 B, dashed line).
Further artificial broadening would be expected from the approximation
of dc/dt by Δc/Δt in
the g(s*) analysis, due to the large boundary
displacement between the absorbance scans (Schuck and Rossmanith, 2000;
Philo, 2000). (This results in a convolution of the
g*(s) distribution with a hyperbola segment of
width Δs = sΔt/t
[Schuck and Rossmanith, 2000]; when restricting the analysis to scans
8-13, the broadening for a single nondiffusing species would be ~100
S for the first peak and ~200 S for the second peak.) The vHW
analysis applied to a suitable data subset yields information similar to
that from ls-g*(s) or
c(s) (Fig. 9 B, circles).
However, less information on the faster-floating particles is obtained,
and the two peaks from the two lipid emulsion fractions are not as
well resolved as in the c(s) analysis.
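The size of the Δc/Δt artifact follows directly from the hyperbola-segment width Δs = sΔt/t. The Python sketch below evaluates this width for a pair of scans. Taking t as the midpoint of the scan interval is our assumption, and the scan times and s-value in the example are hypothetical.

```python
def dcdt_broadening(s, t_first, t_last):
    """Width |s| * dt / t of the hyperbola segment for a nondiffusing species
    (same units as s), with t taken as the midpoint of the scan interval."""
    if not 0 < t_first < t_last:
        raise ValueError("require 0 < t_first < t_last")
    delta_t = t_last - t_first
    t_mid = 0.5 * (t_first + t_last)
    return abs(s) * delta_t / t_mid

# Hypothetical floating species at -300 S, scans recorded from 800 s to 1300 s.
print(f"{dcdt_broadening(-300.0, 800.0, 1300.0):.0f} S")
```

Because the width scales with |s|, fast-floating particles suffer proportionally the largest broadening. This explains why the second emulsion peak would be broadened about twice as much as the first.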
DISCUSSION