Biophys J, March 2000, p. 1606-1619, Vol. 78, No. 3
Molecular Interactions Resource, Bioengineering and Physical Science Program, ORS, National Institutes of Health, Bethesda, Maryland 20892 USA
ABSTRACT
A new method for the size-distribution analysis of polymers by sedimentation velocity analytical ultracentrifugation is described. It exploits the ability of Lamm equation modeling to discriminate between the spreading of the sedimentation boundary arising from sample heterogeneity and from diffusion. Finite element solutions of the Lamm equation for a large number of discrete noninteracting species are combined with maximum entropy regularization to represent a continuous size-distribution. As in the program CONTIN, the parameter governing the regularization constraint is adjusted by variance analysis to a predefined confidence level. Estimates of the partial specific volume and the frictional ratio of the macromolecules are used to calculate the diffusion coefficients, resulting in relatively high-resolution sedimentation coefficient distributions c(s) or molar mass distributions c(M). It can be applied to interference optical data that exhibit systematic noise components, and it does not require solution or solvent plateaus to be established. More details on the size-distribution can be obtained than from van Holde-Weischet analysis. The sensitivity to the values of the regularization parameter and to the shape parameters is explored with the help of simulated sedimentation data of discrete and continuous model size distributions, and by applications to experimental data of continuous and discrete protein mixtures.
INTRODUCTION
The characterization of the size distribution of
polymers is one of the principal problems in the study of biological
macromolecules and of synthetic polymers. Numerous techniques based on
a variety of different principles have been developed for this task,
ranging, for example, from high-resolution mass spectrometry, dynamic
light scattering, analytical ultracentrifugation, size-exclusion
chromatography, field-flow fractionation, to gel electrophoresis.
Analytical ultracentrifugation is the oldest of these techniques and
has been surpassed by others with respect to precision and rapidity.
However, for several reasons, a considerable interest in the use of
ultracentrifugation for the characterization of size distributions
still remains. First, it is attractive for its theoretical simplicity
and firm basis on first principles. Hydrodynamic theory (and
thermodynamics in sedimentation equilibrium) can be directly applied,
and, for the separation of subpopulations of different size, no
interaction with matrices, surfaces, or a bulk flow is required.
Second, it is experimentally powerful and very versatile: the
macromolecules are characterized in solution, and can be studied over a
large range of concentrations, made accessible by fluorescence (Laue et
al., 1997; Schmidt and Riesner, 1992), interference (Laue, 1994;
Schachman, 1959; Yphantis et al., 1994), absorbance (Giebeler, 1992;
Hanlon et al., 1962; Schachman et al., 1962), and Schlieren optical
detection systems (Svedberg and Pedersen, 1940). It can be applied to
an extremely large macromolecular size range by adjustment of the rotor
speed. Third, analytical ultracentrifugation experiments generally
provide a large quantity of data with relatively high precision, and a
significant amount of experience in this technique has been accumulated
during the last seven decades.
Both sedimentation equilibrium and sedimentation velocity methods have
been used in the long history of the characterization of the particle
size distributions by analytical ultracentrifugation (Baldwin and
Williams, 1950; Bridgman, 1942; Fujita, 1962; Lechner and Mächtle,
1992; Mächtle, 1999; Scholte, 1968; Signer and Gross, 1934; Stafford,
1992; Svedberg and Pedersen, 1940; van Holde and Weischet, 1978;
Vinograd and Bruner, 1966). Sedimentation equilibrium
analysis (Lechner and Mächtle, 1992; Scholte, 1968) seems
intrinsically more problematic because of the difficulty involved in
unraveling the sedimentation equilibrium exponentials, and, in some
cases, the analysis has been constrained to parameterized model
distributions (Lechner and Mächtle, 1992). Sedimentation velocity
experiments provide a richer database, because they observe the
strongly size-dependent time course of migration, although here the
size-distribution information is convoluted by the hydrodynamic properties of the particles.
Several different sedimentation velocity methods have been developed.
For very large particles where separation is achieved during the time
of the experiment, a well-conditioned high-resolution analysis can be
performed based on the spatial derivative of the sedimentation
profiles, dc/dr (Baldwin and Williams, 1950; Bridgman, 1942; Fujita,
1962; Signer and Gross, 1934; Svedberg and Pedersen, 1940), or by the
related method of observing the time course of sedimentation at a
single radial position (Mächtle, 1999). For smaller particles, however,
diffusion broadens the sedimentation boundary, which makes it more
difficult to resolve subpopulations of
the distribution. In this regime, an established and very useful method
for analyzing size distributions is the apparent sedimentation
coefficient distribution g*(s) (Frigon and Timasheff, 1975; Rivas et
al., 1999; Schachman, 1959; Schuster and Toedt, 1996; Stafford, 1997),
using dc/dr, or the more recently introduced time derivative dc/dt of
the sedimentation profiles (Stafford, 1992). However, the apparent
sedimentation coefficient distribution obtained is convoluted by a
Gaussian due to diffusional broadening. An elegant and powerful method
to overcome diffusional broadening has been described by van Holde and
Weischet (Demeler et al., 1997; van Holde and Weischet, 1978). Here, by
extrapolation of the apparent sedimentation coefficients of
sedimentation boundary fractions to infinite time on a
$t^{-0.5}$ scale, diffusion-free integral
sedimentation coefficient distributions G(s) are obtained.
All the established sedimentation velocity methods for
size-distribution analysis are similar in that they use different
transformations of the sedimentation data that have been analytically
shown to reveal, under the condition of long solution columns, the
sedimentation coefficient distribution. This approach has the virtue of
a model-free analysis. In general, however, if a model for the
sedimentation behavior of macromolecules is available, it is widely
accepted that an analysis by directly fitting the model to the raw data
can be superior in information and precision of the derived parameters,
although this is frequently computationally more difficult. For
example, more information can be obtained from long-column
sedimentation equilibrium experiments of mixtures of ideal species by
multiexponential decomposition of the raw data in global analyses, as
now commonly in use, when compared to the more traditional
ln(c) versus $r^2$ transformations of a single data set. The present
study is concerned with the problem of formulating and exploring the
properties of an
explicit boundary model for the size-distribution analysis in
sedimentation velocity experiments, based on numerical solutions to the
equations that govern sedimentation and diffusion, the Lamm equations
(Lamm, 1929). This allows larger data sets in the analysis of a single
experiment and in global analyses of multiple experiments, and the
incorporation of prior knowledge on the distribution, which, as will be
demonstrated, can lead to a better resolution of size distributions.
Numerical solutions to the Lamm equations and their use for direct
fitting of ultracentrifuge data have been developed previously in
several laboratories (among them, Cann and Kegeles, 1974; Claverie et
al., 1975; Cox and Dale, 1981; and Dishon et al., 1966). More recently,
enabled by the increased computational resources, this became an
efficient and readily available tool for sedimentation-velocity data
analysis (Demeler and Saber, 1998; Schuck, 1998; Schuck et al., 1998;
Stafford, 1998). Lamm equation analysis can take into account all
boundary conditions of the finite length of the centrifugal cell and of
the effects of diffusion, but, at present, can only be applied to a few
discrete species. This paper describes an extension of the Lamm
equation analysis for the characterization of continuous size
distributions of macromolecules. The problem is stated as an integral
equation, and regularization is used for its numerical inversion. The
properties of the method in the application to discrete distributions,
and to broad, continuous size distributions are explored.
THEORY
In the absence of interactions between the macromolecules (or
particles), the experimentally observed sedimentation profiles of a
continuous size distribution can be described as a superposition of the
contributions of each subpopulation c(M) of
particles with sizes between M and M + dM. If
L(M, r, t) denotes the
sedimentation profile of a monodisperse species of size M at
radius r and time t, the problem is described by
a Fredholm integral equation of the first kind,

$$a(r, t) \cong \int c(M)\, L(M, r, t)\, dM, \qquad (1)$$

where a(r, t) denotes the measured sedimentation data. This
equation is encountered in similar form in problems of polymer
characterization in many other techniques. In the following, first the
calculation of the kernel L(M, r, t) will be
outlined, and then a detailed description of the method used for
inverting Eq. 1 by regularization will be given. This will closely
follow the method applied by Provencher (1982a).

Solution of the Lamm equation for a monodisperse subpopulation
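In the case of sedimentation velocity ultracentrifugation of
dilute solutions of a polymer, the kernel
L(M, r, t) of Eq. 1 is
the solution of the Lamm equation (Lamm, 1929),

$$\frac{\partial \chi}{\partial t} = \frac{1}{r} \frac{\partial}{\partial r} \left[ r D(M) \frac{\partial \chi}{\partial r} - s(M)\, \omega^2 r^2 \chi \right], \qquad (2)$$

which describes the evolution of the concentration distribution
$\chi$(r, t) in a sector-shaped cell
under the influence of the centrifugal field generated at a rotor
angular velocity $\omega$. s(M) and
D(M) are the sedimentation and diffusion
coefficients of the particle, respectively. They are both strongly
dependent on the molar mass, and are related by the Svedberg
equation,

$$\frac{s(M)}{D(M)} = \frac{M\, (1 - \bar{v}_M \rho)}{RT}, \qquad (3)$$

where $\bar{v}_M$ denotes the partial specific volume of the solute,
$\rho$ denotes the solvent density, R denotes the
gas constant, and T denotes the rotor temperature (Svedberg
and Pedersen, 1940). The partial specific volume
$\bar{v}_M$ of the solute may also be
dependent on the macromolecular size, but, in most cases, only weakly
or even negligibly (such a weak dependence will be indicated by the
subscript M).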
It can be seen at this point that the sedimentation velocity analysis
of particles with continuous size distributions is complicated by the
fact that it requires knowledge of at least two functional dependencies
on size: in addition to c(M), it requires either the sedimentation coefficient s(M), or,
equivalently, the diffusion coefficient D(M).
Because the problem of Eq. 1 is ill-posed, even if it is known how the
sedimentation coefficient changes with size, it seems impossible to
calculate both distributions c(M) and
s(M) from noisy experimental data. As will be
described in the following, this problem is addressed by assuming prior
knowledge of the partial specific volume
$\bar{v}_M$ and the frictional ratio
(f/f0)M (i.e., prior
knowledge of the hydrodynamic shape) of the macromolecules, which will
allow calculation of s(M) and D(M).
(Only in favorable cases of very narrow monomodal distributions or
negligible diffusion does it seem feasible to treat either
$\bar{v}_M$ or
(f/f0)M as a fitting parameter to be
determined through the data analysis.)
Although $\bar{v}_M$ and
(f/f0)M, in general,
will also depend on the macromolecular size, in many cases either
reasonable estimates or measurements can be made. In some cases, it may
be a reasonable approximation that
$\bar{v}_M$
and/or (f/f0)M does
not change with size; this may hold approximately true, for example,
for particles such as random coils of polymers, lipid vesicles,
emulsions, or, in a first approximation, even for mixtures of globular
proteins. Alternatively, a parameterized model for
(f/f0)M could be
used, such as the model of rodlike particles at a length-to-radius
ratio that increases linearly with M. Similarly, if the
particles can be approximated by multisubunit assemblies with regular
geometry, values of
(f/f0)M could be
derived with the help of hydrodynamic bead modeling (Bloomfield et al.,
1967; de la Torre, 1992). In some cases, D may be constant,
allowing the direct use of Eq. 3 to derive s as a function
of the buoyant molar mass (an example of this, ferritin, is shown
below). Finally, the values for
$\bar{v}_M$ and (f/f0)M may be
measured in additional experiments for several fractionated
subpopulations of the particles, which then can be combined with
polynomial interpolation of the obtained values to approximate
(f/f0)M at any size.
How possible errors in
$\bar{v}_M$ and
(f/f0)M affect the
calculated distributions c(M) and
c(s) will be examined below.
Given
$\bar{v}_M$, one can calculate the
radius $R_0 = (3 M \bar{v}_M / 4\pi N_A)^{1/3}$ of an equivalent sphere
with the same volume as the
particle by simple geometrical relationships (Laue et al., 1992).
This leads to the minimum hydrodynamic frictional coefficient of an
equivalent sphere. With the shape information of the particle expressed
through the frictional ratio (f/f0)M, the
diffusion coefficient of the particle then follows from the
Stokes-Einstein relationship as

$$D(M) = \frac{kT}{6\pi \eta_0 \eta_r\, (f/f_0)_M} \left( \frac{4\pi N_A}{3 M \bar{v}_M} \right)^{1/3}, \qquad (4)$$

where k denotes the Boltzmann constant, $N_A$ denotes Avogadro's
number, and $\eta_0$ and
$\eta_r$ denote the
standard and relative viscosity of the solution, respectively. This
result can then be inserted into the Svedberg equation (Eq. 3) to
obtain s(M). Given s(M) and
D(M) and their inverses
M(s) and M(D), the size
distribution c(M) can then easily be transformed into a sedimentation coefficient distribution
c(s) := c(M(s)) and a diffusion coefficient distribution c(D) :=
c(M(D)). These are basically
equivalent descriptions of the distribution, although they represent
different aspects of the particle size distribution.
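
The chain of relationships above (equivalent sphere, Stokes-Einstein, Svedberg) can be illustrated with a short numerical sketch. It is not taken from SEDFIT: the function names are hypothetical, CGS units are assumed throughout, and the default viscosity and density are those of water at 20°C, which is an assumption because the solvent conditions are not specified at this point. The parameter `eta` stands for the product of standard and relative viscosity.

```python
import math

K_B = 1.380649e-16      # Boltzmann constant in erg/K (CGS units, assumed)
N_A = 6.02214076e23     # Avogadro's number in 1/mol


def diffusion_coefficient(M, vbar, f_ratio=1.0, T=293.15, eta=0.01002):
    """Stokes-Einstein D in cm^2/s for molar mass M (g/mol), vbar in cm^3/g."""
    # Radius (cm) of the sphere that has the same volume as the particle.
    r0 = (3.0 * M * vbar / (4.0 * math.pi * N_A)) ** (1.0 / 3.0)
    # The frictional ratio scales the friction of the equivalent sphere.
    return K_B * T / (6.0 * math.pi * eta * f_ratio * r0)


def sedimentation_coefficient(M, vbar, f_ratio=1.0, T=293.15, eta=0.01002,
                              rho=0.99823):
    """Svedberg equation solved for s (in seconds): s = D M (1 - vbar rho) / (R T)."""
    D = diffusion_coefficient(M, vbar, f_ratio, T, eta)
    return D * M * (1.0 - vbar * rho) / (K_B * N_A * T)


for M in (30000.0, 50000.0):
    s = sedimentation_coefficient(M, vbar=0.73)
    print(f"M = {M:.0f}: s = {s / 1e-13:.2f} S, "
          f"D = {diffusion_coefficient(M, 0.73):.2e} cm^2/s")
```

For spherical particles with a partial specific volume of 0.73 cm³/g, this gives sedimentation coefficients of roughly 3.5 S and 4.9 S for molar masses of 30,000 and 50,000. These are close to, but not identical with, the 3.4 and 4.78 S used for the simulated mixture in the Results, as would be expected if slightly different solvent parameters were used there. At fixed shape, s scales with M to the power 2/3, and a larger frictional ratio lowers both s and D by the same factor.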
After calculating s and D for a particle of size
M, the numerical integration of the Lamm equation was
started with the initial condition of a uniform concentration
$\chi$(r, 0) = 1, and with graphically predetermined
positions of the meniscus and bottom of the solution column (these can
also be treated as floating parameters to be optimized in the nonlinear
regression). Lamm equation solutions were calculated on a grid of
between 200 and 500 radial points. For low values of
$\omega^2 s$, the finite element method
developed by Claverie et al. (1975)
was used, combined with a
Crank-Nicolson scheme (Crank and Nicolson, 1947) and an algorithm
for adaptive step sizes in time (Schuck et al., 1998). For higher
values of
$\omega^2 s$, the moving grid
finite element method (Schuck, 1998) was used. The latter method is
particularly well suited for the simulation of sedimentation of large
particles with low diffusion coefficient, because it remains both
numerically stable and relatively efficient for very small values
of D.
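
To make the structure of such a calculation concrete, the following is a minimal sketch of a Lamm equation solver. It deliberately swaps in a much simpler technique than the finite element methods cited above: an explicit, conservative finite-volume scheme with upwind sedimentation flux on a fixed grid. The function name, the cell geometry (6.0 to 7.0 cm), the rotor speed, and the step size are all illustrative assumptions, and the scheme is only stable for sufficiently small time steps.

```python
import math


def solve_lamm(s, D, omega, r_m=6.0, r_b=7.0, n=200, t_end=3600.0, dt=5.0):
    """Explicit finite-volume integration of the Lamm equation (CGS units).

    Starts from a uniform concentration of 1 and applies zero flux at the
    meniscus r_m and the bottom r_b. Returns cell-center radii and the
    concentrations at time t_end.
    """
    dr = (r_b - r_m) / n
    edges = [r_m + i * dr for i in range(n + 1)]
    centers = [0.5 * (edges[i] + edges[i + 1]) for i in range(n)]
    # Sector-shaped cell: each shell has the weight (r_out^2 - r_in^2) / 2.
    weights = [0.5 * (edges[i + 1] ** 2 - edges[i] ** 2) for i in range(n)]
    chi = [1.0] * n
    for _ in range(int(round(t_end / dt))):
        flux = [0.0] * (n + 1)   # r * J at the cell edges, zero at both ends
        for j in range(1, n):
            # Sedimentation transports material outward (upwind value).
            sedimentation = s * omega ** 2 * edges[j] * chi[j - 1]
            diffusion = D * (chi[j] - chi[j - 1]) / dr
            flux[j] = edges[j] * (sedimentation - diffusion)
        chi = [chi[i] - dt * (flux[i + 1] - flux[i]) / weights[i]
               for i in range(n)]
    return centers, chi


omega = 50000 * 2 * math.pi / 60          # 50,000 rpm in rad/s (assumed)
radii, profile = solve_lamm(s=3.4e-13, D=1.0e-6, omega=omega)
```

Because the fluxes telescope, the total amount of material in the cell is conserved, and in the plateau region the scheme reproduces the radial dilution factor exp(-2ω²st). The upwind flux adds some numerical diffusion, which is one reason why higher-order methods are preferable for quantitative fitting.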
Analysis of the size distribution c(M)
For very large particles, the influence of diffusion flux on the
particle distribution during the time of the sedimentation experiment
is negligible compared to the sedimentation flux. As a consequence,
L(M, r, t) can be
approximated by a step function
$L(M, r, t) = \exp(-2\omega^2 s_M t) \times H(r - r^*(M, t))$
at a position
$r^*(M, t) = r_m \exp(\omega^2 s_M t)$
(with the meniscus position $r_m$)
(Fujita, 1962). In this limiting case,
r*(M, t) can be used to change the
integration variable in Eq. 1, and differentiation with respect to the
radius r directly solves the integral. Therefore, the
derivative of the measured concentration profiles at any time can be
directly related to the particle size distribution (Baldwin and
Williams, 1950; Bridgman, 1942; Fujita, 1962; Signer and Gross, 1934;
Svedberg and Pedersen, 1940). Unfortunately, this approximation
holds well only for larger macromolecules and is not suitable for many biopolymers.
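
As a small illustration of this limiting case, the diffusion-free kernel can be written directly as a function. The names below are hypothetical and the numbers in the example are arbitrary; the sketch only encodes the step function and boundary position given above.

```python
import math


def boundary_position(s, t, omega, r_m):
    """Boundary position r* = r_m exp(omega^2 s t) of a non-diffusing species."""
    return r_m * math.exp(omega ** 2 * s * t)


def step_kernel(s, r, t, omega, r_m):
    """Diffusion-free profile: radially diluted plateau behind a sharp step."""
    plateau = math.exp(-2.0 * omega ** 2 * s * t)
    return plateau if r > boundary_position(s, t, omega, r_m) else 0.0


# Arbitrary example values: 100 S particle, 1000 rad/s, 1000 s, meniscus at 6 cm.
print(boundary_position(1e-11, 1000.0, 1000.0, 6.0))
print(step_kernel(1e-11, 6.10, 1000.0, 1000.0, 6.0))
```

Since each species contributes a step at its own r*, a radial derivative of the summed profile picks out the amount of material sedimenting with the corresponding s-value, which is the basis of the dc/dr methods mentioned above.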
The consideration of diffusion increases the complexity of Eq. 1, and
the smoothness of the sedimentation boundaries of single species
L(M, r, t) makes Eq. 1 an
ill-posed problem. As is characteristic for such problems, a large set
of different c(M) distributions may fit the data
equally well, and a straightforward discretization and inversion
usually leads to large, artificial high-frequency oscillations in
c(M).1 It was
observed that, for the present problem (in particular, in the case of
narrow distributions), the condition of non-negativity imposed on
c(M) suppresses most of these oscillations. For
further stabilization, regularization was used. Following the maximum
entropy method, a term can be added to the inverse problem of Eq. 1,

$$\min_{c(M)} \left\{ \sum_{i,j} \left[ a(r_i, t_j) - \int c(M)\, L(M, r_i, t_j)\, dM \right]^2 + \alpha \int c(M) \ln c(M)\, dM \right\}, \qquad (5a)$$

where a(r_i, t_j) denotes the data point at radius r_i and time t_j.
Depending on the regularization parameter $\alpha$, this penalty term
increases the
rms error of the fit as compared to the optimal fit in the absence of
regularization ($\alpha$ = 0), and the increase of the ratio of the
variance, $\sigma^2(\alpha)/\sigma^2(\alpha = 0)$, can be correlated
with a probability P via F-statistics (Johnson and Straume, 1994).
The value of $\alpha$ can thus be adjusted such that the
quality of the fit still remains statistically indistinguishable from
the unconstrained fit, based on a given confidence level P
and on the level of the noise of the data (Bevington and Robinson, 1992).

Alternatively, Tikhonov-Phillips regularization with the term
(Phillips, 1962)

$$\alpha \int \left( \frac{d^2 c(M)}{dM^2} \right)^2 dM \qquad (5b)$$

can be used. If $\alpha$ is adjusted by the variance ratio
$\sigma^2(\alpha)/\sigma^2(\alpha = 0)$, it also selects from the set
of all distributions
c(M) that lead to a statistically
indistinguishable fit to the raw data the one distribution that
exhibits the highest parsimony. As has been pointed out by Provencher
(1982a) [...]

For the numerical calculations, first the continuous molar mass
distribution c(M) of Eq. 1 is approximated by
considering the concentrations ck on a
grid of N molar mass values
Mk,

$$a(r_i, t_j) \cong \sum_{k=1}^{N} c_k\, L(M_k, r_i, t_j), \qquad (6)$$

where a(r_i, t_j) denotes the data point at radius r_i and time t_j
[...] as described in detail in Schuck
and Demeler (1999)

(7)

[...] denotes the algebraic transformations required for
the calculation of the systematic noise parameters (Schuck, 1999)

(8a)

(8b)
As described above, the regularization parameter
$\alpha$ was adjusted to
reach the predetermined variance ratio calculated by F-statistics. Because of the large number of data points involved in the analysis, the influence of the constraints on the degrees of freedom is neglected. Unless noted otherwise, for any given data set, the variance
ratio was calculated corresponding to a probability p = 0.95. With the usual number of experimental data points in the order of
$10^4$-$10^5$, the variance
increase due to regularization is typically in the order of 1%.
Finally, the distribution ck was
rescaled by trapezoidal integration such that the integral over
c(M) equals the total loading concentration.
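
The discretized, regularized inversion can be sketched in a few lines. This is an illustration under stated assumptions, not the SEDFIT implementation: it uses a second-difference (Tikhonov-Phillips type) penalty, a hand-picked value of the regularization parameter instead of the F-statistics adjustment, plain Gaussian elimination, and a simplified active-set loop for non-negativity that pins the most negative concentration at zero (this is not the full Lawson-Hanson algorithm). All function names are hypothetical.

```python
def solve_linear(P, q):
    """Gaussian elimination with partial pivoting for P x = q."""
    n = len(q)
    aug = [row[:] + [q[i]] for i, row in enumerate(P)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(aug[i][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for i in range(col + 1, n):
            factor = aug[i][col] / aug[col][col]
            for j in range(col, n + 1):
                aug[i][j] -= factor * aug[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def regularized_distribution(kernel, data, alpha):
    """Least-squares c_k >= 0 with a penalty on second differences of c.

    kernel[k][i] is the model profile of grid species k at data point i.
    """
    n = len(kernel)
    # Normal equations: A_kl = sum_i L_k(i) L_l(i), b_k = sum_i a_i L_k(i).
    A = [[sum(x * y for x, y in zip(kernel[k], kernel[l])) for l in range(n)]
         for k in range(n)]
    b = [sum(a * x for a, x in zip(data, kernel[k])) for k in range(n)]
    # Add alpha * D^T D, where D takes second differences along the grid.
    for k in range(1, n - 1):
        stencil = {k - 1: 1.0, k: -2.0, k + 1: 1.0}
        for i, u in stencil.items():
            for j, v in stencil.items():
                A[i][j] += alpha * u * v
    active, sol = list(range(n)), []
    while active:
        sub = [[A[i][j] for j in active] for i in active]
        sol = solve_linear(sub, [b[i] for i in active])
        worst = min(range(len(active)), key=lambda i: sol[i])
        if sol[worst] >= 0.0:
            break
        del active[worst]      # pin the most negative concentration at zero
    c = [0.0] * n
    for idx, value in zip(active, sol):
        c[idx] = value
    return c
```

With noise-free data and alpha = 0, the loop returns the underlying concentrations; with alpha > 0, sharp peaks are traded for smoother distributions. In practice alpha would be increased until the variance ratio reaches the value given by the chosen confidence level, as described above.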
The computational cost of the method is an important factor for
practical use. It is determined mainly by two procedures: the
N solutions of the Lamm equation and the N × N summations over all data points of the pairwise
products of L involved in the calculation of the elements
Akl of the normal equations. The latter increases
quadratically in N, and therefore determines the computation
time for large N and large data sets. (The importance of
this can be seen by considering a typical set of interference data:
with 100 scans, 1000 data points per scan, and N = 100, $10^9$ summation and multiplication operations are
required to build the entire matrix
Akl.) For a relatively low number of
data points or a lower resolution in c(M), the
solutions of the Lamm equations determine the computation time. During
Monte Carlo simulations, these two steps need only to be calculated
once. The inversion of Eq. 8 and the adjustment of
$\alpha$ can be
accomplished comparatively rapidly. In the current implementation of
the program SEDFIT, when using moderate amounts
of data (e.g., 1-2 × $10^4$ data points,
N = 100), the distribution can be calculated with a
fast PC, typically, in significantly less than one minute, and one
Monte Carlo iteration in a few seconds.
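
The operation count quoted above follows from simple arithmetic, shown here as a small check (the function name is hypothetical, and each multiply-add pair is counted as one operation):

```python
def normal_matrix_operations(n_scans, points_per_scan, n_grid):
    """Multiply-add operations needed to fill all N x N elements A_kl."""
    data_points = n_scans * points_per_scan
    return data_points * n_grid * n_grid


print(normal_matrix_operations(100, 1000, 100))   # 1000000000
```

Because the matrix is symmetric, roughly half of these operations could be saved by computing only one triangle, but the quadratic growth with N remains.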
The sedimentation coefficient distribution analysis in the absence of
diffusion was performed by replacing the Lamm equation distributions in
Eq. 1 by the well-known step functions
$U(r, t) = \exp(-2\omega^2 s_M t) \times H(r - r^*(M, t))$
at a position
$r^*(M, t) = r_m \exp(\omega^2 s_M t)$
(Fujita, 1962; Stafford, 1992). This is closely related to the
conventional g*(s) approximation of the
sedimentation coefficient distribution in the absence of diffusion
(Stafford, 1992), and, if applied to a data set from a small time
interval, the numerical results are equivalent to those derived from
dc/dt analysis (P. Schuck and P. Rossmanith, submitted).
EXPERIMENTAL
Sedimentation velocity experiments were performed with a Beckman
Optima XL-A analytical ultracentrifuge equipped with absorbance optics.
Horse spleen apoferritin (Sigma A3641) and horse spleen ferritin
(Boehringer 197742) were diluted into PBS, and epon double-sector centerpieces were filled with 300 µl of the protein sample and PBS,
respectively. Using an An50-Ti rotor, the samples were centrifuged at a
rotor speed of 15,000 rpm at a temperature of 24°C. Scans were
acquired at a wavelength of 230 nm in time intervals of 210 sec. The
partial specific volume of 0.73 ml/g for apoferritin monomers was
calculated based on the amino acid composition using the program
SEDNTERP (Laue et al., 1992). g*(s)
analysis was performed with the program
DCDT+ (J. S. Philo, 3329 Heatherglow Ct.,
Thousand Oaks, CA 91360). Dynamic light scattering experiments were
conducted with a DynaPro-MSTC200 (Protein Solutions, Charlottesville,
VA), with the temperature control adjusted to 24°C.
Van Holde-Weischet analyses were performed according to methods
described in detail in Demeler et al. (1997) and van Holde and Weischet
(1978). Briefly, the sedimentation boundaries were divided into
Nf fractions of the plateau signal
c0, and the best least-squares radial
positions of the boundary fractions were calculated by averaging the
radii of all data points af with
absorbance values within the limits of each fraction (i.e.,
$(f - 1)\, c_0/N_f < a_f < f\, c_0/N_f$
for fraction f). The first and last fractions were omitted in
the further analysis because of the larger noise in their calculated
radial positions. Nf was chosen such
that all fractions had at least one data point in each scan. Apparent
s-values were calculated, and s-values at
infinite time were determined by least-squares extrapolation on a
$t^{-0.5}$ scale, as described in van
Holde and Weischet (1978), defining an integral sedimentation
coefficient distribution G(s).
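
The boundary-fraction procedure just described can be outlined as follows. This is a minimal sketch with hypothetical function names, not the code used in SEDFIT; it assumes that the apparent s-value of a fraction is obtained from its radial position via s* = ln(r/r_m)/(ω²t), and it assigns data points lying exactly on a fraction limit to the upper fraction.

```python
import math


def boundary_fraction_radii(radii, signal, c0, n_frac):
    """Mean radius of the data points in each of n_frac boundary fractions.

    Fraction index f (0-based) collects points with
    f * c0 / n_frac <= a < (f + 1) * c0 / n_frac.
    """
    sums = [0.0] * n_frac
    counts = [0] * n_frac
    for r, a in zip(radii, signal):
        if 0.0 < a < c0:
            f = min(int(a * n_frac / c0), n_frac - 1)
            sums[f] += r
            counts[f] += 1
    return [sums[f] / counts[f] if counts[f] else None for f in range(n_frac)]


def apparent_s(r_boundary, t, omega, r_m):
    """Apparent sedimentation coefficient of a boundary fraction at time t."""
    return math.log(r_boundary / r_m) / (omega ** 2 * t)


def extrapolate_to_infinite_time(times, s_apparent):
    """Straight-line fit of s* versus t^-0.5; the intercept is s at infinite time."""
    x = [t ** -0.5 for t in times]
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(s_apparent) / n
    slope = (sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, s_apparent))
             / sum((xi - x_mean) ** 2 for xi in x))
    return y_mean - slope * x_mean
```

In use, `boundary_fraction_radii` would be applied to every scan, the first and last fractions dropped, and `extrapolate_to_infinite_time` called once per remaining fraction; the sorted intercepts then define G(s).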
All computational methods were implemented into the Windows-based ultracentrifugal analysis program SEDFIT, which is available on request, or can be downloaded from http://www.biochem.uthscsa.edu/auc/software, and from the RASMB network at ftp://rasmb.bbri.org/rasmb/spin/ms_dos/.
RESULTS
The resolution of the method will be examined first for the case of relatively small molecules, where the influence of diffusion is comparatively large and no visual separation during the sedimentation process is achieved. Figure 1 A shows simulated sedimentation profiles of a discrete mixture of two spherical molecules of molar masses 30,000 and 50,000, and sedimentation coefficients of 3.4 and 4.78 S, respectively, at loading concentrations of 0.5 for each species, superimposed by a normally distributed error of 0.01. Also shown in Fig. 1 A are the best-fit single-component sedimentation profiles (dashed lines), which resulted in an apparent molar mass of 25,700 and a sedimentation coefficient of 4.1 S. The unphysical combination of such a low value for the apparent molar mass (or high value for the apparent diffusion coefficient, respectively) and this relatively high value for the sedimentation coefficient is a result of the broadening of the sedimentation boundary due to the underlying heterogeneity. It should be noted that the fit is not of acceptable quality (rms error = 0.0155), because a single-component model cannot describe the initially sharp but rapidly broadening sedimentation boundary well. This distinct difference between the diffusion broadening of the sedimentation boundary of a single sedimenting component, and the boundary shape of a heterogeneous mixture, provides the potential for gaining information on the size distribution. This difference will be larger when a larger relative separation of the size of the species is present, and in cases of larger particles with smaller diffusion coefficients (see Fig. 4 B below).
The calculation of the molar mass distribution is performed using Eq.
8, on a grid of N = 100 molar-mass values between
20,000 and 70,000. For the calculation of both sedimentation and
diffusion coefficients s(M) and
D(M) according to Eqs. 3 and 4, first spherical particles (f/f0 = 1) with a
partial-specific volume
$\bar{v}$ = 0.73 cm³/g were assumed (the values identical to those
used for generating the data). In the absence of regularization
($\alpha$ = 0) this results in sharp peaks at the correct molar masses
underlying the simulation (Fig. 1 B). However, the location
of these peaks depends strongly on the details of the simulated data
and of the model. This is illustrated by the effect of using slightly
incorrect frictional ratios, which leads to shifts of the location of
these sharp peaks, or to fragmentation into two groups of 2-3 peaks,
without significantly changing the rms error of the fit (<0.0101)
(Fig. 1 B). This clearly demonstrates that the direct
solution of Eq. 6 without regularization results in an unreliable level
of detail. When the parameter
$\alpha$ for the maximum entropy
regularization is adjusted to a probability of p = 0.68, significantly smoother curves are obtained, which are much more
robust against small errors in the model. The two components can still
be clearly resolved (Fig. 1 B). Because the rms error of
the fit is not significantly worse (<0.0104) than the fits without
regularization, these curves reflect much better the information that
can be extracted from the distribution analysis of the sedimentation
data. Similar results were found when studying discrete distributions
in a larger size range, or when using the Tikhonov-Phillips
regularization (data not shown). Under comparable conditions with
simulated noisy data, two discrete species with a 30% relative
difference of the molar mass in the range of 100,000 and a 20%
relative difference in the range of 1,000,000 could be resolved (data
not shown).
If the assumptions on the shape of the particles implied by
f/f0 = 1 and the assumed value of
$\bar{v}$ do not lead to a reasonable approximation of
s(M) and D(M), however, the
regularized distributions c(M) are significantly
broader, limiting the resolution of the two species (Fig.
1 B, offset data). It should be noted that this is accompanied by an increase of the rms error of the fit (9% increase
for the data shown with f/f0 = 1.2 in Fig.
1 B). In cases where at least the assumption of shape
similarity among the species is correct, this increase of the rms error
can be used as the basis for nonlinear regression and fitting for the
parameter f/f0. (In the implementation used
in SEDFIT with a simplex routine, N = 50 and p = 0.68, this converged
rapidly to a best-fit value for f/f0
of 1.005.)
An alternative representation of the distribution is the transformation
into a sedimentation coefficient distribution
c(s) using the s(M)
relationship from the Svedberg equation (Eq. 3) (Fig. 1 C).
The distributions c(s) are much more robust than
c(M) against poor assumptions for the shape of
the molecules: whereas errors in the frictional ratios lead to overall
translations of c(M), these errors only affect
the resolution in c(s), but not the location of
the peaks. If Eq. 1 is used in the limit of no diffusion, a broad
apparent sedimentation coefficient distribution is obtained. With
estimates of the hydrodynamic parameters that lead to a good
approximation of the diffusion coefficient, c(s) results in two distinct peaks. Errors in the frictional ratios that
produce too low diffusion coefficients in Eq. 4 led to broadening of
c(s), whereas errors that produce too large
diffusion coefficients (such as the case of the too small value of
$\bar{v}$ = 0.70 cm³/g shown in
Fig. 1 C) led to artificially sharp distributions c(s). A comparison with the established methods
shows that the results at D = 0 are very similar to
those from the time-derivative g*(s) analysis (Stafford,
1992), which produces apparent sedimentation coefficient distributions
in the approximation of no diffusion, whereas even moderately precise
estimates of the frictional ratios (or diffusion coefficient,
respectively) lead to peak sedimentation coefficients consistent with
those obtained by the van Holde-Weischet method, which corrects for
diffusion broadening of the boundary (Fig. 1 D) (van Holde
and Weischet, 1978).
The influence of the regularization parameter on the calculated
c(M) in the case of discrete distributions
($\delta$-functions) is that of a broadening of the peaks in
c(M) (in case of second-derivative regularization
approximately Gaussian shaped), with a half-width that increases with
$\alpha$ (Figs. 1 B and 2 C). For the discrete
distribution of Fig. 1, increasing the regularization from
p = 0.68 to 0.95 still allows clear distinction of the
two peaks (with a ratio of c(s) height at the
peak to the enclosed minimum of ~4:1, data not shown).
To study the effect of regularization for broader, continuous
distributions, noisy sedimentation data based on model distributions in
different size ranges were simulated. Figure
2, A and B, shows the analysis of a step-function model for a homogeneous size
distribution in the molar mass range between 30,000 and 70,000 (Fig.
2 A) and at 1000-fold higher molar masses (Fig.
2 B). Without regularization ($\alpha$ = 0), as can be
expected, a series of sharp peaks was obtained, which, in their
location and height, strongly depend on the noise of the data. Already
with a very small degree of regularization (a variance increase of
$\Delta\sigma/\sigma_0$ < 0.1%,
adjusted to p = 0.55), the analysis resulted in
continuous distributions. However, they still can exhibit oscillations
that mimic a structured, apparently multimodal distribution (this was
observed in particular with the second derivative regularization, data
not shown). When the regularization parameter was increased to a level
corresponding to p = 0.68 ($\Delta\sigma/\sigma_0$ ~ 1%) or
p = 0.95, which selects the most parsimonious of all
c(M) distributions that lead to statistically comparable fits of the sedimentation data, in most cases, a relatively unstructured distribution was obtained in which misleading peaks were
absent. Further increase of the regularization parameter to a
significantly larger value of
$\Delta\sigma/\sigma_0$ = 10%
(p > 0.99) only slightly worsened the resemblance of
the calculated and the underlying model distributions (Fig. 2,
dash-dot lines). As illustrated in Fig. 2, A and
B, the resolution increased slowly with increasing size of
the particles. Also, the results were found to improve when studying
model distributions with higher degree of smoothness. This is
illustrated in Fig. 2 C, where the calculated
c(M) distributions are shown for simulated noisy
sedimentation data that are based on a size-distribution model
combining a Gaussian and a $\delta$-function.
Again, if the distributions are transformed into a c(s) distribution, they can be easily compared with the integral sedimentation coefficient distributions G(s) from the van Holde-Weischet analysis (insets of Fig. 2). The results of both methods were found to be very consistent. However, the distributions c(s) appear to have higher information content in the description of the shape of the distributions: whereas the G(s) curves from the van Holde-Weischet analysis of Figs. 1 D and 2 A are qualitatively similar, the corresponding c(s) profiles resolve the difference between a broad continuous and a discrete bimodal distribution better.
As a first application of the method to discrete mixtures of globular
proteins, the interference profiles from sedimentation experiments with
myoglobin and gamma globulin were analyzed (Fig. 3, A-C). These data have been
published before (Schuck and Demeler, 1999) in the context of
demonstrating the validity of the algebraic systematic noise-reduction
procedure developed for the analysis of interference optical data. In
the previous analysis, known partial specific volumes and molar masses
of the proteins had been used as prior knowledge. In the present
context, to evaluate the robustness of the size-distribution analysis
method, the data were reanalyzed without this prior knowledge, but
instead making the assumption of having globular proteins with
approximately spherical shapes (f/f0 = 1),
and with an estimate of the partial specific volume of 0.73 cm³/g. As shown in Fig. 3 D, the
calculated distributions c(M) and
c(s) exhibit well-defined, sharp peaks, as can be
expected for this discrete mixture of proteins. Because these proteins are not truly spherical, the molar mass values at the peak maxima of
c(M) (13,500-15,800 for myoglobin,
87,400-93,700 for gamma globulin monomer, 174,000-190,000 for the
dimer) do not coincide with the true molar masses of these species, but
instead represent their molar masses approximately reduced by the
frictional ratio (f/f0 ~ 1.5 for the IgG
species, based on the earlier results). This problem is absent in the
sedimentation coefficient distribution c(s). Both
distributions c(M) and c(s)
give an excellent fit to the data, and reflect the main features of the
samples, i.e., the presence of a small component, and the presence of
two larger components with a molar mass ratio of 2:1. When using the
Tikhonov-Phillips regularization (Eq. 5b), the resulting distributions
suggest the presence of a small amount of aggregates much larger than
the IgG dimer (data not shown), but this cannot be resolved well, and
is not observed with the maximum entropy regularization. In both
methods, an artifact is visible at very small molar masses in the
distributions where the sedimentation profiles are correlated with the
baseline parameters.
As an example of the application of the method to a continuous mass
distribution, the sedimentation velocity profiles of a ferritin sample
were studied. Ferritin is well known to exhibit a broad distribution in
the iron content (see, e.g., Leapman and Hunt, 1995). Apoferritin and
ferritin do not differ in their sizes, but only in their molar masses
and partial specific volumes, depending on the number of iron molecules
in the core. As a consequence, the diffusion coefficient should remain
constant, and the sedimentation coefficient distribution
s(M) according to Eq. 3 can be directly related
to the buoyant molar mass distribution c(M*).
Dynamic light-scattering experiments with the ferritin and the
apoferritin samples gave autocorrelation functions that were very well
described by that of a single species with nearly identical diffusion
coefficients of 3.37 × 10⁻⁷ and 3.11 × 10⁻⁷ cm²/sec, and
hydrodynamic radii of 6.4 and 6.7 nm, respectively. This is consistent
with the radius of ~6.5 nm measured for murine ferritin by electron
microscopy (Ohkuma et al., 1976). In the analysis of the sedimentation
profiles of apoferritin (Fig. 4 A), when constraining the
diffusion coefficient to a value of 3.37 × 10⁻⁷ cm²/sec, a reasonable
fit was obtained with a sedimentation coefficient sw, 20 of 18.9 S, which corresponds to
a molar mass of ~540,000 (rms error = 0.0113 OD; a slightly
better fit of rms error = 0.0100 OD could be obtained by taking
into account free monomers of ferritin). In contrast, the sedimentation
velocity profiles of the iron-loaded ferritin could not be well
described by the single-species model with the predetermined diffusion
coefficient (rms error = 0.0321, sw = 67.1 S, Fig. 4 B),
because the broadening of the sedimentation boundary is much larger
than that of a species with D = 3.37 × 10⁻⁷ cm²/sec, indicated by
the dashed line in Fig. 4 B. This suggests strong
heterogeneity of the ferritin sample.
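The numbers in this paragraph can be cross-checked with two textbook relations: the Stokes-Einstein equation, which links the hydrodynamic radius to the diffusion coefficient, and the Svedberg equation, which links s and D to the buoyant molar mass. The sketch below is only an illustration under stated assumptions: water at 20 °C with a viscosity of 1.002 mPa·s (neither value is given in the text), and hypothetical function names `stokes_einstein_D` and `buoyant_molar_mass`.

```python
import math

BOLTZMANN = 1.380649e-23     # J/K
GAS_CONSTANT = 8.314462618   # J/(mol K)


def stokes_einstein_D(radius_nm, temperature_K=293.15, viscosity_Pa_s=1.002e-3):
    """Diffusion coefficient in cm^2/sec of a sphere with the given radius.

    The default temperature and viscosity (water at 20 C) are assumptions.
    """
    radius_m = radius_nm * 1e-9
    D_m2_per_s = BOLTZMANN * temperature_K / (
        6.0 * math.pi * viscosity_Pa_s * radius_m)
    return D_m2_per_s * 1e4  # m^2/s -> cm^2/s


def buoyant_molar_mass(s_svedberg, D_cm2_per_s, temperature_K=293.15):
    """Svedberg relation M(1 - vbar*rho) = s R T / D, returned in g/mol."""
    s_seconds = s_svedberg * 1e-13
    D_m2_per_s = D_cm2_per_s * 1e-4
    kg_per_mol = s_seconds * GAS_CONSTANT * temperature_K / D_m2_per_s
    return kg_per_mol * 1e3


D_apoferritin = stokes_einstein_D(6.4)
M_buoyant = buoyant_molar_mass(18.9, 3.37e-7)
print(f"D = {D_apoferritin:.2e} cm^2/sec, buoyant M = {M_buoyant:.3g} g/mol")
```

Under these assumptions, a radius of 6.4 nm corresponds to a diffusion coefficient of roughly 3.3 × 10⁻⁷ cm²/sec, and 18.9 S combined with D = 3.37 × 10⁻⁷ cm²/sec gives a buoyant molar mass on the order of 1.4 × 10⁵.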
The calculated buoyant molar mass distributions c(M*) of apoferritin, ferritin, and a mixture are shown in Fig. 5 A. All result in very good fits of the data, with rms errors of ~ 0.009 OD. For the apoferritin, the majority of the material is in a single peak with a maximum at a buoyant molar mass of 140,000 (Fig. 5 A, dotted line). The presence of a small fraction of material of approximately double the size of the main peak is suggested. The c(M*) distribution of ferritin is characterized by a broad, asymmetric peak with a maximum at a buoyant molar mass of 540,000, but also exhibiting a broad distribution of smaller material, including molecules of the size of apoferritin (Fig. 5 A, dashed line). For the mixture, the clearly bimodal sedimentation profiles of Fig. 4 C translate in the c(M*) distribution into a bimodal mass distribution, with maxima at buoyant molar mass values of 140,000 and 530,000 (Fig. 5 A, solid line). The features of the ferritin distribution seem to be reasonably well reproduced.
It should be noted that the size distribution of the mixture exhibits a small oscillatory fine structure, which does not appear in the ferritin distribution (Fig. 5 A, dashed and solid lines). To study whether these oscillations are essential features of the data, and how sensitive they are to the noise in the raw data, we performed Monte Carlo simulations. Simulated data sets were replicated (n = 103) based on the calculated best-fit sedimentation profiles as shown in Fig. 4 C, with normally distributed noise added in the magnitude of the rms error of the fit. The inset in Fig. 5 A shows the mean distribution c(M*) and the 5% and 95% contours, respectively. In this statistical average, c(M*) appears slightly smoother, which demonstrates that some of the oscillatory fine structure in c(M*) can be governed by noise in the data, and may not be a feature of the true underlying particle size distribution. Nevertheless, comparing the distributions obtained from the Monte Carlo analysis and the results from van Holde-Weischet analysis, although they are qualitatively consistent, it appears that a higher level of detail can be extracted from the Lamm equation model.
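The Monte Carlo procedure described above can be outlined generically: add normally distributed noise of the rms magnitude to the best-fit data, repeat the analysis on each replicate, and collect the mean and the 5% and 95% contours of the result. The sketch below assumes a caller-supplied `analyze` function; here a least-squares line fit (`fit_line`) is used purely as a toy stand-in for the distribution analysis, and all names, the replicate count, and the nearest-rank percentile rule are illustrative assumptions.

```python
import random


def monte_carlo_bands(best_fit, rms_noise, analyze, n_replicates=200, seed=0):
    """Mean and 5%/95% contours of analyze(...) over noisy replicates.

    best_fit  : list of best-fit data values
    rms_noise : standard deviation of the normally distributed noise
    analyze   : function mapping a data list to a list of result values
    """
    rng = random.Random(seed)
    results = []
    for _ in range(n_replicates):
        noisy = [y + rng.gauss(0.0, rms_noise) for y in best_fit]
        results.append(analyze(noisy))

    mean, lower, upper = [], [], []
    for k in range(len(results[0])):
        column = sorted(result[k] for result in results)
        mean.append(sum(column) / len(column))
        # Nearest-rank percentiles; adequate for a sketch.
        lower.append(column[int(0.05 * (len(column) - 1))])
        upper.append(column[int(0.95 * (len(column) - 1))])
    return mean, lower, upper


def fit_line(data):
    """Toy stand-in for the distribution analysis: least-squares line fit."""
    n = len(data)
    xs = range(n)
    x_mean = sum(xs) / n
    y_mean = sum(data) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, data))
    slope = sxy / sxx
    return [slope, y_mean - slope * x_mean]


# Hypothetical best-fit data: a straight line with slope 2 and intercept 1.
best_fit = [2.0 * x + 1.0 for x in range(20)]
mean, lower, upper = monte_carlo_bands(best_fit, 0.1, fit_line)
```

Features of a single analysis that fall outside such contours, or that vanish in the mean, are candidates for noise-driven structure rather than properties of the sample.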
A basic assumption of the distribution analysis is that the observed
sedimentation data are a simple superposition of the sedimentation
profiles of noninteracting macromolecules (Eq. 1). However, because of
the practical importance of interacting systems for the study of proteins, the
results obtained when the method is applied to a system of interacting species were
investigated. The sedimentation process was simulated for a rapid
monomer-dimer and monomer-trimer self-association, using the Lamm
equation methods described in Cox (1969) and Schuck (1998), with 1%
normally distributed noise. The conditions of the sedimentation were
chosen to generate profiles generally similar to those in Fig. 1 A; as can be expected for these systems, no separation of
sedimentation boundaries was achieved (data not shown). The
sedimentation profiles of these self-associating systems could be
fitted very well by the continuous mass distribution (data not shown),
which, in the absence of regularization, resulted in a large number of
discrete peaks in c(M) (see, e.g., the dotted line in Fig. 6 C). But, in
contrast to discrete mass distributions of noninteracting species,
small regularization at a level p = 0.68 already led to
very broad, smooth distributions. This result was qualitatively
independent of the regularization procedure (data not shown). Compared
to the relatively sharp distributions obtained from a superposition of
noninteracting species under identical conditions (Fig. 6, dashed
lines), the spreading of the sedimentation boundary that is caused
by the rapid self-association results in an apparent population of
macromolecules with a broad range of intermediate sizes (Fig. 6,
bold lines), with the positions of the maxima dependent on
the loading concentration and the association constant. This is
analogous to the results from the van Holde-Weischet analysis, where
the case of interacting and noninteracting monomers and dimers can also
be clearly distinguished from the positive slope and the range of
sedimentation coefficients in G(s)
(Fig. 6 B).
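The dependence on loading concentration and association constant can be made concrete with the mass-action law for a rapid monomer-dimer equilibrium. The sketch below computes the weight fraction of monomer and, as a rough guide to where an apparent intermediate species would be located, the weight-average sedimentation coefficient. It is not the Lamm equation simulation used in the paper; the function names and the numerical values (K = 10⁶ L/mol, 4 S and 6 S) are hypothetical.

```python
import math


def monomer_fraction(total_conc, K_assoc):
    """Weight fraction of monomer in a rapid monomer-dimer equilibrium.

    total_conc : total concentration in monomer units (mol/L)
    K_assoc    : association constant [dimer]/[monomer]^2 (L/mol)
    Mass balance: total = c1 + 2*K*c1^2, solved for the free monomer c1.
    """
    c1 = (-1.0 + math.sqrt(1.0 + 8.0 * K_assoc * total_conc)) / (4.0 * K_assoc)
    return c1 / total_conc


def weight_average_s(total_conc, K_assoc, s_monomer, s_dimer):
    """Weight-average sedimentation coefficient of the equilibrium mixture."""
    f1 = monomer_fraction(total_conc, K_assoc)
    return f1 * s_monomer + (1.0 - f1) * s_dimer


# Hypothetical values: K = 1e6 L/mol, monomer 4 S, dimer 6 S.
for conc in (1e-8, 1e-6, 1e-4):
    print(conc, round(weight_average_s(conc, 1e6, 4.0, 6.0), 3))
```

At low loading concentration the average approaches the monomer value, and at high concentration it approaches the dimer value, which is consistent with a concentration-dependent position of the apparent peak.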
| |
DISCUSSION |
|---|
|
|
|---|
The present paper describes a method for direct boundary modeling for the size-distribution analysis in sedimentation velocity analytical ultracentrifugation. Because the continuous size distributions are approximated by a superposition of Lamm equation solutions, the effects of diffusion can be taken into account, and a relatively high resolution can be achieved for small molecules in the size range of proteins.
Although unraveling of diffusion effects in this way was found to be
about as effective as the extrapolation to infinite time in the van
Holde-Weischet method (van Holde and Weischet, 1978), the direct
boundary modeling can offer several advantages. First, because the Lamm
equation method can take into account the end effects of the solution
column, there is no requirement for a solvent and solution plateau to
be established, which allows the analysis of the data from an entire
sedimentation experiment. This ability to make maximal use of the
information of the boundary spreading observed over a large time period
enhances the ability for distinguishing boundary spreading due to size
heterogeneity from simple diffusional spreading. This, combined with
better statistical properties of a direct fit, seems to be the origin of the higher level of detail in the c(M) as
compared to the G(s) curves. The new method can
also be applied to experiments of mixtures that include small and
rapidly diffusing material, or samples with a very high degree of
heterogeneity. Second, as an explicit boundary model, the method can
use the algebraic noise decomposition techniques (Schuck and Demeler, 1999),
and be directly applied to interference optical data where a
significant systematic time-invariant background profile can be
superimposed on the macromolecular sedimentation profiles. Third, the
analysis also lends itself to extension to a global fit of several
experiments, and allows knowledge on the distribution to be incorporated
into the analysis.
The method presented here could be considered intermediate between a
more conventional direct boundary fitting method that uses an explicit
single- (or few-) component Lamm equation model (Demeler and Saber, 1998;
Philo, 1997; Schuck, 1998; Schuck et al., 1998), and a relatively
model-free data transformation, such as the van Holde-Weischet method
to obtain G(s) (Demeler et al., 1997; van Holde and Weischet, 1978),
or the dc/dr (Baldwin and Williams, 1950; Bridgman, 1942; Fujita, 1962;
Signer and Gross, 1934; Svedberg and Pedersen, 1940) and dc/dt (Stafford, 1992)
transformations used to obtain g*(s). The
size-distribution analysis proposed here is model-free in the sense that
it imposes virtually no constraints on the number and size of the
species present. However, in contrast to the data transformations
involved in the van Holde-Weischet method and in the
g*(s) methods, it requires prior knowledge on the
approximate density and shape of the molecules, and the density and
viscosity of the solvent. When available, this knowledge can be used to
enhance the resolution of the sedimentation coefficient distribution
and transform it into a size distribution c(M).
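How this prior knowledge enters can be illustrated with the standard hydrodynamic relations for a particle described by an equivalent sphere and a frictional ratio. The sketch below computes s and D for a given molar mass from the partial specific volume, the frictional ratio, and the solvent density and viscosity. It is a textbook-level illustration rather than the paper's own equations; the function name `s_and_D_from_M` and the default values (0.73 cm³/g, f/f0 = 1.2, water at 20 °C) are assumptions.

```python
import math

AVOGADRO = 6.02214076e23      # 1/mol
GAS_CONSTANT = 8.314462618e7  # erg/(mol K), CGS units


def s_and_D_from_M(M, vbar=0.73, f_ratio=1.2,
                   rho=0.99823, eta=0.01002, T=293.15):
    """Sedimentation coefficient (sec) and diffusion coefficient (cm^2/sec).

    M       : molar mass (g/mol)
    vbar    : partial specific volume (cm^3/g)
    f_ratio : frictional ratio f/f0
    rho, eta: solvent density (g/cm^3) and viscosity (poise)
    All default values are assumptions chosen for illustration.
    """
    # Radius of the anhydrous sphere with the same mass and density.
    r0 = (3.0 * M * vbar / (4.0 * math.pi * AVOGADRO)) ** (1.0 / 3.0)
    f = f_ratio * 6.0 * math.pi * eta * r0  # Stokes friction times f/f0
    s = M * (1.0 - vbar * rho) / (AVOGADRO * f)
    D = GAS_CONSTANT * T / (AVOGADRO * f)
    return s, D


# Hypothetical example: a small globular protein of 17,000 g/mol.
s, D = s_and_D_from_M(17000.0)
print(f"s = {s / 1e-13:.2f} S, D = {D:.2e} cm^2/sec")
```

In this model s scales as M^(2/3) divided by f/f0 at fixed partial specific volume, so an error in the assumed frictional ratio shifts the mass scale of the resulting distribution while leaving the sedimentation coefficient scale unaffected.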
The relationship and the resolution of the different methods can be understood by considering different degrees of diffusion incorporated into the Lamm equation model (Fig. 1, C and D). In the absence of any diffusion, as can be expected, the distribution c(s) resembles an apparent sedimentation coefficient distribution g*(s). Even moderately precise estimates of the hydrodynamic shape and relatively low estimates of the diffusion coefficient lead to a substantial increase in resolution of c(s), which then defines a range of sedimentation coefficients of the sample consistent and comparable with G(s). It is important to note that the van Holde-Weischet method is very powerful in indicating the range of the true sedimentation coefficients of the sample, without further assumptions. If prior knowledge can be used, however, c(s) seems to have a higher resolution. This is indicated by a comparison of the G(s) from the bimodal discrete distribution of Fig. 1 D and the broader distribution of Fig. 2 A, where qualitatively very similar G(s) distributions were obtained, whereas the corresponding c(s) could distinguish the distributions better. Also, the comparison of the c(M*) and the G(s) distributions of the ferritin experiment (Fig. 5) indicates slightly higher information content of the Lamm equation analysis. However, because of the well-known tendency of the inversion of integral equations to produce oscillations, some of the details in c(M) can be deceptive. This is illustrated by the Monte Carlo simulations in Fig. 5, and represents a major technical difficulty with the presented approach.
The underlying problem is that Eq. 1 is an ill-posed problem for smooth
kernels (Phillips, 1962). This has been extensively studied (Amato and
Hughes, 1991; Hansen, 1992; Phillips, 1962), and is well known to occur
in many biophysical techniques, for example, in dynamic light
scattering (Provencher, 1979). Because, in a direct inversion, the
size-distribution analysis in Eq. 1 tends to produce large oscillations
in c(M), the analysis requires regularization and
adjustment to the level of detail that one can reliably extract from
the experimental data. The approach used here closely followed the
technique of adjusting the regularization parameter by controlling the
variance increase of the fit that is introduced by the regularization
constraint, a technique developed by Provencher and implemented in the
program CONTIN (Provencher, 1982b). As regularization methods,
maximum entropy regularization and Tikhonov-Phillips regularization
with a second derivative operator were studied. Maximum entropy
performed slightly better, because it could create sharper peaks for
discrete size distributions and had a somewhat lower tendency to
exhibit oscillations for broader distributions. Overall, the results
are consistent with the previous findings from numerical simulations
(Amato and Hughes, 1991) and from studies of broadly distributed
biopolymers by light scattering (Provencher, 1992). Alternative
numerical methods to avoid artificial peaks, such as those described by
Provencher (1992), could be adapted.
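The trade-off controlled by the regularization parameter can be seen in a small toy problem. The sketch below applies a second-difference (Tikhonov-Phillips type) penalty to a linear inversion and reports how the residual sum of squares and the roughness of the solution change with the parameter `alpha`. It is an illustration only: the kernel consists of simple exponential decays rather than Lamm equation solutions, the non-negativity constraint and maximum entropy functional are omitted, and all names and values are hypothetical.

```python
import math


def solve_linear(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def tikhonov_fit(kernel, data, alpha):
    """Minimize |kernel*c - data|^2 + alpha * |second difference of c|^2."""
    m, n = len(kernel), len(kernel[0])
    # Second-difference operator rows: c[j] - 2*c[j+1] + c[j+2].
    diff = [[0.0] * n for _ in range(n - 2)]
    for j in range(n - 2):
        diff[j][j], diff[j][j + 1], diff[j][j + 2] = 1.0, -2.0, 1.0
    normal = [[sum(kernel[i][a] * kernel[i][b] for i in range(m))
               + alpha * sum(diff[j][a] * diff[j][b] for j in range(n - 2))
               for b in range(n)] for a in range(n)]
    rhs = [sum(kernel[i][a] * data[i] for i in range(m)) for a in range(n)]
    c = solve_linear(normal, rhs)
    residual = sum((sum(kernel[i][a] * c[a] for a in range(n)) - data[i]) ** 2
                   for i in range(m))
    roughness = sum(sum(diff[j][a] * c[a] for a in range(n)) ** 2
                    for j in range(n - 2))
    return c, residual, roughness


# Toy smooth kernel: exponential decays on a logarithmic grid of rates.
rates = [0.2 * 1.6 ** j for j in range(8)]
times = [0.25 * i for i in range(40)]
kernel = [[math.exp(-k * t) for k in rates] for t in times]
true_c = [0.0, 0.1, 0.6, 1.0, 0.6, 0.1, 0.0, 0.0]
# Deterministic pseudo-noise keeps the example reproducible.
data = [sum(row[j] * true_c[j] for j in range(8)) + 0.01 * math.sin(37.0 * i)
        for i, row in enumerate(kernel)]

for alpha in (1e-3, 1e-1, 10.0):
    c, residual, roughness = tikhonov_fit(kernel, data, alpha)
    print(alpha, residual, roughness)
```

Increasing `alpha` raises the residual and lowers the roughness; adjusting the parameter by the variance increase of the fit, as done in the paper following CONTIN, amounts to choosing the point on this trade-off where the increase in the residual is still statistically acceptable.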
If one compares the physical processes observed for particle-size analysis in sedimentation velocity ultracentrifugation with those of dynamic light scattering, centrifugation has a strongly size-dependent directed migration in the centrifugal field in addition to the diffusion. Therefore, it appears that this additional source of information in centrifugal data should make the choice of the regularization procedure less critical. In both the experimental and the simulated data with continuous distributions, it was found that it is advantageous to slightly increase the regularization parameter to suppress artificial oscillations. This may be due to the inactivity of the non-negativity constraints in the case of broa