Department of Physiology and Biophysical Sciences, State University
of New York at Buffalo, Buffalo, New York 14214 USA
Hidden Markov modeling (HMM) can be applied to extract
single channel kinetics at signal-to-noise ratios that are too low for
conventional analysis. There are two general HMM approaches: traditional Baum's reestimation and direct optimization. The
optimization approach has the advantage that it optimizes the rate
constants directly. This allows setting constraints on the rate
constants, fitting multiple data sets across different experimental
conditions, and handling nonstationary channels where the starting
probability of the channel depends on the unknown kinetics. We present
here an extension of this approach that addresses the additional issues of low-pass filtering and correlated noise. The filtering is modeled using a finite impulse response (FIR) filter applied to the underlying signal, and the noise correlation is accounted for using an
autoregressive (AR) process. In addition to correlated background
noise, the algorithm allows for excess open channel noise that can be
white or correlated. To maximize the efficiency of the algorithm, we derive the analytical derivatives of the likelihood function with respect to all unknown model parameters. The search of the likelihood space is performed using a variable metric method. Extension of the
algorithm to data containing multiple channels is described. Examples
are presented that demonstrate the applicability and effectiveness of
the algorithm. Practical issues such as the selection of appropriate
noise AR orders are also discussed through examples.
 |
INTRODUCTION |
Hidden Markov modeling (HMM) provides an
efficient approach for analysis of single channel currents. It is
particularly useful for records where the signal-to-noise ratio is low
or the channel kinetics is rapid. The conventional dwell-time approach
(Colquhoun and Sigworth, 1995
; Magleby and Weiss,
1990a
, 1990b
;
Horn and Lange, 1983
; Chay, 1988
;
Ball and Sansom, 1989
; Qin et al., 1996
, 1997
) fails in these cases
because an appropriate idealization of such data is often difficult.
The HMM approach analyzes the noisy data directly, thereby eliminating
the necessity of idealization. As a result, it has an improved
requirement on signal-to-noise ratio and has the potential to allow for
more rapid channel kinetics. In this approach, each sample point is
treated as a relevant interval and the probability of obtaining the
sequence of the observed data samples is calculated according to a
model. The model contains two terms: an amplitude term that is used to
estimate the probability that a given data point belongs to a given
current level, and a transition probability term that is used to
estimate the probability of a state transition between data points. The
amplitude term is usually described by Gaussian distributions centered
about each current level. The transition probability is described by a
matrix exponential composed of the model rate constants and describes
the probability of switching states between adjacent data points. The
product of the probabilities over all possible paths can be computed
recursively, producing the likelihood of observing the data given the
model. The likelihood is then optimized with respect to the parameters
of the model to produce the best fit. Because the noise is taken into
account explicitly, the HMM approach permits a relatively low
signal-to-noise ratio. It is also less prone to errors due to missed
events because the transition probability takes into account the
undetected transitions between adjacent samples. The penalty for using
the sample-based likelihood paradigm is computation time, as the
probabilities have to be evaluated at every data point.
Baum's reestimation, a precursor to the general
expectation-maximization (EM) approach for maximal likelihood
estimation, is the standard approach for estimating the parameters in a
hidden Markov model. When applied to single channel analysis, the
method, however, has several drawbacks. First, it assumes a perfect
Markov signal with white background noise. For experiments like
patch-clamp recording, this assumption is usually too restrictive. The
data are always low-pass filtered. Furthermore, the instrument noise has a power spectrum that increases quadratically with frequency, and
when the channel is open there is often additional noise arising from a
variety of sources, and usually has unknown spectral characteristics. For channels with slow kinetics, the effects of such distortions may be
negligible, but when the channel activity is busy, they become
significant and need to be addressed explicitly. Another problem is
that Baum's algorithm estimates discrete transition probabilities. The
channel kinetics, however, is a continuous process and the quantities
of interest are the rate constants. Although it is possible to convert
the transition probabilities into rate constants, the model topology
generally cannot be retained. Finally, in Baum's algorithm there is no
explicit control of the rate constants, thus it is difficult to impose
constraints such as detailed balance, and to fit multiple data sets
from different experimental conditions simultaneously.
The issue of correlated noise has recently been addressed by extending
Baum's reestimation formulae. Walsh and Sigworth (1992)
took the approach of preprocessing the data to pre-whiten the noise,
thereby reducing the problem to solving a high-order Markov process in
white noise; this, in turn, was reformulated into a first-order Markov
process with an enlarged state space. More recently,
Venkataramanan et al. (1998a
,
1998b
) extended this
approach to account for additional noise in the open states and to
estimate the noise correlations directly. To enable the direct
optimization of rate constants, Fredkin and Rice (1992b)
have attempted to use standard optimizers to maximize the likelihood
function. Their approach, however, is reported to be very
computationally intensive.
We have recently developed a direct optimization approach to hidden
Markov modeling of single-channel currents. The approach has two
essential features, i.e., the direct optimization of rate constants and
the use of analytical derivatives for optimization of the likelihood
function. These in turn provide the algorithm with several desirable
features, such as the capability for explicit control of model
topology, the allowance for imposition of constraints on rate
constants, and the flexibility of simultaneous fitting of multiple data
sets on different experimental conditions. The availability of the
analytical derivatives of the likelihood function makes it possible to
use the efficient gradient-based optimizers such as the variable metric
method to search the likelihood surface. Compared with Baum's
reestimation, the method has a favorable convergence near the maximum,
especially in the extreme cases of low signal-to-noise ratio and
aggregated kinetics. Under these conditions, Baum's algorithm often
exhibits a slow convergence in its approach to the maximum by taking
very small steps.
In this paper we extend this approach to address the issues of
band-limited signals and correlated noise. Following
Venkataramanan et al. (1998a
,
1998b
), we model the
noise by an autoregressive (AR) process, so that the data can be
reduced to a higher-order Markov process in white noise. However, we
exploit a different strategy to parametrize the noise. Instead of
estimating the AR coefficients, we choose to optimize the
autocorrelations of the noise. Such a choice was motivated by the fact
that the autocorrelations of two additive noises are linear
combinations of the individual ones. Thus the relationship for the
noise at different conductance levels, which is often additive, can be
taken into account explicitly through the use of linear constraints.
This feature is particularly useful for modeling multiple channel
activity where the noise at different conductance levels is highly
related. Other issues, such as the correction for low-pass filtering,
the optimization of initial probabilities, and the handling of multiple
channels, are also addressed. Finally, several prototype examples are
provided to illustrate the features and performance of the algorithm.
 |
THE MODEL |
We consider a channel with N conformation states
partitioned into M conductance classes. Let
Ii, i = 1 ... M denote the
current amplitude at each conductance. Transitions among the states are described by a time-homogeneous Markov process with an infinitesimal generator matrix Q =
[qij]N×N, where the
(i, j)th off-diagonal element qij
represents the transition rate from state i to state
j, and the diagonal elements are defined so that each row
sums to zero. The system can be either in equilibrium or transient following perturbation. Although not necessary, the model is usually irreducible with states reachable from each other.
In practice we observe a discrete sampling of the continuous process. A
sampled Markov process can be considered a Markov chain whose
transitions are described by a transition probability matrix, say
A = [aij], where
aij represents the probability of the channel
being in state j at the next sampling time given that it is
in state i at the current sampling time. For a given sampling interval
t, A is related to Q
by
|
(1)
|
The equation mathematically defines a one-to-one mapping between
Q and A. However, it is worth noting that the
inversion from a given A to Q may not always produce physically meaningful results because some Markov chains cannot
be considered to be sampled from a continuous Markov process (Qin et al., 2000
).
The noise in the data is modeled conductance-wise. That is, all states
with the same mean conductance are assumed to have the same noise
characteristics, but the noise for different conductances is allowed to
be different. For each conductance, the associated noise is modeled
using an autoregressive (AR) process to account for its color. Let
n(i)(t) denote the noise at the
ith conductance, and
i and
aj(i), 1
j
m be
the corresponding AR coefficients. For simplicity, we assume the same
order for all AR models at different conductances. An AR process can be
considered as the output of passing white noise through an all-pole
filter, i.e.,
|
(2)
|
where wt is white noise with unit variance.
From the functional approximation point of view, an AR model is
equivalent to using an all-pole rational function to approximate the
power spectrum of the noise. Because there are no zeros, the AR model
is more restrictive than the more general ARMA model (Kay and
Marple, 1981
), but given sufficiently high order, it can fit a
wide range of functions with either smooth or wavy features.
The effect of filtering is accounted using a finite impulse response
(FIR) filter whose coefficients are denoted by
hj, j = 0 ... n
1. We
assume that the filtering is applied only to the underlying signal. In
practice it affects noise too, but we leave that effect to be taken
care of by the noise model. We also assume in this paper that the
coefficients of the filter are known a priori. They can be determined
from the transfer function of the recording system and those of the
digital filters that may have been used. It should be emphasized that
although the model takes filtering into account explicitly, one should
always preprocess the data before analysis to correct for bandwidth.
This can be done through appropriate inverse filtering and decimation.
The filter introduced here is mainly intended for the leftover effect that cannot be corrected a priori. A filter with as few coefficients as
possible is important because a long filter, as seen later, will cause
the computation time to increase exponentially.
The advantage of using an AR model for noise is that it allows the
noise to be pre-whitened. Let st denote the
state sequence of the underlying signal and xt
be the corresponding conductance class. The observation can be written
algebraically as
|
(3)
|
where H(z) is the transfer function of the FIR filter
and A(i)(z) is the denominator of the
transfer function of the AR filter, i.e.,
The pre-whitening is done by multiplying
A(xt)(z) through Eq. 3, leading to
|
(4)
|
where p
m + n
1 and
cj(i), j = 0 ... p are
the coefficients of the product
H(z)A(xt)(z), or
equivalently, the convolution of {hj} and
{aj(i)}. The significance of such
pre-whitening is that it reduces the noise in the data to be white, as
implied by Eq. 4, thereby satisfying the HMM assumption. This is of
course achieved at the expense of further distorting the underlying
signal. When the AR model is known a priori, the pre-whitening can be
done experimentally. But for the problem here, it can only be done in
theory because not only is the noise model unknown, but the system is
time-variant, i.e., the noise model varies with the channel's conductance.
A potential problem with the AR model is the difficulty of imposing
constraints on the noise between different conductance levels. For
example, one might need to constrain the open noise so that it is the
superimposition of the closed noise with an extra white component. The
ability to allow for such constraints becomes particularly useful for
modeling multi-channel records, as will be shown later. The AR model,
however, is not linearly additive in the sense that the sum of two AR
processes is generally no longer an AR process. We circumvent this by
reparametrizing the noise using autocorrelations instead of AR
coefficients. The autocorrelation function has the desirable feature
that the autocorrelations of two independent random processes are
linear combinations of the autocorrelations of the individual ones.
The AR coefficients of an mth order AR process are fully
determined by its first m autocorrelations. This is stated
by the Yule-Walker equation (Kay and Marple, 1981
)
|
(5)
|
where rj(i), j = 0 ... m are the autocorrelations of the noise at the ith
conductance level. The equation can be solved using either a general
linear equation solver or, more efficiently, the Levinson-Durbin
algorithm (Blahut, 1985
). The latter has a complexity on
the order of m2 as opposed to
m3. For a process that is not strictly AR, the
above equation can still be used to determine the AR parameters. In
this case the resulting AR model fits the first m
autocorrelations exactly, and the remaining ones are extrapolated based
on the maximal entropy criterion.
With the above parametrization, the entire model, denoted by
,
constitutes of the rate constants (qij), the
current amplitudes (Ii), and the noise
autocorrelations (rj(i)). The problem is
then to estimate all these parameters from the given observations. In
the paradigm of HMM, the maximum likelihood approach is used. Let
Y = Y1 ··· YT denote the
observed samples. The likelihood function, denoted by
L(
), is defined as the probability of observing
Y given
, i.e.,
|
(6)
|
The problem is equivalent to finding the maximum point of
L(
). In the next section we describe how to evaluate
L(
) and its derivatives for efficient optimization.
 |
THE LIKELIHOOD FUNCTION AND ITS DERIVATIVES |
Evaluation of the likelihood and its derivatives is a complicated
process. It can be roughly divided into two major parts. The first is
to reformulate the problem in a way that satisfies the conventional HMM
assumption, i.e., a first-order Markov chain in white noise. The
standard approach for doing this is to introduce a metastate Markov
model that combines the current state of the channel with its history
states into a tuple (Fredkin and Rice, 1992a
;
Venkataramanan et al., 1998a
,
1998b
). Once the HMM
assumption is satisfied, the existing theory can be applied. The other
part is to determine the parameters of the metastate Markov model and their derivatives. The parameters that are optimized are the rate constants, current amplitudes, and noise autocorrelations. Thus we need
to evaluate, for example, the initial probabilities of metastates and
their derivatives with respect to the rate constants, the transition
probabilities of metastates and their derivatives with respect to the
rate constants, the AR coefficients and their derivatives with respect
to the noise autocorrelations, and so on. In the following, we describe
each step in detail. At the end, we also discuss an alternative
definition for metastates along with other strategies that can be used
to improve the computational efficiency.
Metastate Markov model
The standard HMM places a restrictive assumption on both the
signal and noise. The hidden signal needs to be first-order Markovian, while the noise must be white. We have shown in the previous section that the correlation of noise can be removed by pre-whitening the data.
The resulting data, however, still don't satisfy the HMM assumption
due to the history of the underlying signal, as implied by Eq. 4. One
solution to the problem is to consider the current state of the channel
along with its history states in the memory as a group. If the system
has a memory of p lags, we simply lump together the current
state st with the previous p
1 states st
1, ... ,
st
p. This group, when considered as a new
process, is memoryless. Mathematically, this is equivalent to defining
a vector process
|
(7)
|
where each component represents a state of the channel. Taking the
vector process st as the underlying signal, the observation Yt becomes dependent only on the
most current st at any time and independent of
its histories, thereby satisfying the assumption of standard HMM.
The vector process st is Markovian because its
transition at each time involves only a single transition of its first component, st, which is Markovian. The states of
st, called metastates, are the
(p + 1)-tuples
|
(8)
|
where each component is a possible state of the channel. If the
channel has N states, st will have
Np+1 metastates. While the number of metastates
may be large, many transitions among them are disallowed. For two
metastates I = (i0 ... ip)
and J = (j0 ... jp), the
transition between them is allowed if and only if
|
(9)
|
i.e., J corresponds to a shift of I toward
the right by one component.
Before proceeding, we introduce some standard notation that will be
used throughout the paper. As already mentioned above, we use capital
I and J to denote metastates. The capital
I is also used for channel current amplitudes, but its exact
meaning should be clear from the context. The component states of a
metastate are represented by the small i's or
j's, as in Eqs. 8 and 9, and the corresponding conductance
class of these states is designated using the Greek letter µ.
Evaluation of the likelihood
Considering the vector process st as the
hidden signal, we have a new first-order Markov process in white noise,
as seen from Eq. 4. The standard forward-backward procedure can then be
applied to evaluate the likelihood function. The forward and backward
variables are defined as the partial likelihood of the observation
samples with a given metastate for the hidden signal, i.e.,
where t = 1 ... T, and I and
J span over all possible metastates. Let
bI(t) be the probability distribution
of the observation at time t given the underlying process
st in metastate I. Strictly speaking,
bI(t) also depends on the previous
observations Yt
1, Yt
2
... Yt
m in addition to the metastate I, but for simplicity of notation we will not exploit the
dependence explicitly. From Eq. 4 it can be formulated as
|
(10)
|
where
I(t) can be considered as the
noise residue given by
|
(11)
|
and µj is the conductance class of the
jth component of metastate I. The forward and
backward variables are calculated recursively by
|
(12)
|
|
(13)
|
where aIJ is the transition probability
between metastate I and metastate J. The forward
recursion in Eq. 12 proceeds from the first sample at t = 1 to the last sample at t = T, and the backward
recursion goes in the opposite direction. Initially, the forward
recursion starts with the initial probability of the metastates, i.e.,
0(I) =
I, and the
backward recursion begins with
T(J) = 1
for all J.
By definition, the likelihood function is equal to the forward
variables summed over all metastates at time T, i.e.,
|
(14)
|
Therefore, the likelihood can be calculated at the end of the
forward recursion. The backward variables, although not required for
the evaluation of the likelihood, will be needed for the calculation of
its derivatives. In practice, the likelihood itself is usually out of
the machine range and only its logarithm can be computed. This is done
through appropriate scaling of the forward and backward variables. For
details about scaling see, for example, Rabiner (1989)
.
Derivatives of the likelihood
The derivatives of the likelihood function are calculated in
multiple steps. First, we derive the derivatives of the likelihood with
respect to the starting probability and transition probability of the
metastate Markov model, and the derivatives with respect to the channel
current amplitudes and noise AR coefficients. The derivation for these
derivatives is similar to that for the case of white noise (Qin
et al., 2000
), so we will not go through the detailed algebra.
The results can be summarized as
|
(15)
|
|
(16)
|
|
(17)
|
where x in the last equation could be any variable that
appears in the distribution function
bI(t). For the problem here, x is either the current amplitude or an AR coefficient, but
it can be other parameters as well. For example, one may include deterministic perturbations such as baseline drift or harmonic interference into the model. In these cases, the distribution function
bI(t), which is the only part of the
algorithm that needs to be modified, contains the additional
perturbation parameters. The same formula can then be applied to derive
the derivatives of the likelihood function with respect to those
perturbation parameters.
Substituting bI(t) by
its definition and letting x be the current amplitude and AR
coefficient, we can further derive from Eq. 17 the derivatives of the
likelihood function with respect to these variables as the following:
|
(18)
|
|
(19)
|
|
(20)
|
where the coefficient cj(i),
j = 0 ... p in the first equation represents the
convolution of the filter impulse response
{hj} with the AR coefficient
{aj(i)}, as defined earlier. Note that
there are a set of these equations for each conductance level, since
each conductance has its own noise model. From the equations we see
that the derivatives for the current amplitude are summed not only over
all metastates but also over the individual components of each
metastate, while the derivatives for the noise are summed only through
the metastates. Such a discrepancy is as expected because the current
amplitude information is contained in all samples in the memory, while
the noise model, by definition, is only dependent on the first
component of the metastate. Also note that setting the AR coefficient
aj(i) to zero and the filter impulse
response to a delta function reduces the above equations to be the same
as those in the case of white noise.
The derivatives given above constitute only a part of the calculation
for the final derivatives of the likelihood function. Except for the
current amplitudes, they are not given with respect to the model
parameters that are expected to be optimized. For example, both the
starting probabilities and transition probabilities of the metastate
Markov model are functions of the rate constants, while the AR
coefficients are functions of the noise autocorrelations. The next step
in the calculation is to evaluate these intermediate parameters and
their derivatives with respect to the true model parameters, i.e., the
rate constants and autocorrelations. The final derivatives of the
likelihood function are then obtained by combining them using the
differentiation chain rule.
Transition probability of metastates
We consider the calculation of the transition probability of the
metastate Markov model and its derivatives with respect to the rate
constants. To this end we need to first calculate the transition
probability of the channel itself. This can be done using the same
procedure that we have developed for the standard HMM (Qin et
al., 2000
). The calculation is based on the spectral expansion
of the rate constant matrix Q
|
(21)
|
where
i is the ith eigenvalue of
Q and Ai is the product of the
corresponding left and right eigenvectors. Making use of the spectral
expansion, the transition probability of the channel, which is a matrix
exponential of the rate constant matrix as given in Eq. 1, can be
represented by
|
(22)
|
The derivatives of the transition probability with respect to the
rate constants can be formulated as
|
(23)
|
where f(
i,
j,
t) is a scalar function defined by
|
(24)
|
The derivative
Q/
qkl in Eq. 23
is simply a constant matrix, in which all elements are equal to zero
except the (k, l)th entry, which is plus one, and the
kth diagonal element, which is minus one. Because most
elements of the matrix are zero, the matrix product in the summation in
Eq. 23 can be calculated efficiently.
Once we have the transition probability of the channel, the transition
probability of the metastate Markov model along with its derivatives is
readily obtainable. In particular, Eq. 9 specifies the condition for an
allowable transition, and when the condition is satisfied, the
corresponding transition probability aIJ is equal to the transition probability of the channel from state i0 to state j0, where
i0 and j0 are the leading
components of the two metastates, respectively. The derivative of the
metastate transition probability aIJ with
respect to any transition probability aij of the
channel is either 0 or 1, depending on whether the transition is
allowed and whether i and j are equal to
i0 and j0, respectively.
Initial probability of metastates
By definition, a metastate defines which state the channel is in
from the current sampling time through the previous histories in the
memory, so the starting probability of a metastate I = [i0, i1 ... ip] is
essentially the probability for the channel being in state
ip initially and then in state
ik at the subsequent sampling times for
t = k
t, k = 1 ... p. As with the calculation of the transition probabilities of metastates, we need to first calculate the starting probability of the channel itself in order to
determine starting probabilities of the metastates.
The starting probability of the channel is chosen as the equilibrium
probability at the holding condition. It is given by
|
(25)
|
where Qh is the rate constant matrix at the
holding condition. In general, the rate constants at the holding
condition are related to the rate constants at the activating
condition, and they can be reformulated in a way to share a common set
of independent parameters. This point is discussed further under the
topic of likelihood maximization. Thus, when Q is optimized, Qh will follow the change, and as a result, the
starting probability of the channel is also optimized.
Equation 25 is homogeneous and needs to be solved by subjecting it to
the probability totality constraint. The singular value decomposition
technique can be applied for finding the solution (Press et al.,
1992
). It is possible that the model may become reducible at
some holding conditions, in which case only the probabilities of the
absorbing states are solved and others are fixed to zero. From Eq. 25
we can derive the derivatives of the initial probability as
|
(26)
|
where x could be any variable of interest. Notice that
Eqs. 25 and 26 share a common coefficient matrix. Therefore, they can be solved together. One difference to note, however, is the constraint. For solving Eq. 26, namely, the sum of the unknowns is constrained to
zero instead of one.
Given the starting probability of the channel and its subsequent
transition probabilities, the starting probability of a metastate I = [i0, i1 ... ip] can be determined by
|
(27)
|
i.e., the probability of the channel entering state
ip at t = 1, multiplied the
probability of a transition from state ip to
state ip
1 at t = 2, and then
multiplied by the probabilities of the subsequent transitions
until t = p + 1.
To derive the derivatives of the starting probabilities of metastates,
we introduce two auxiliary quantities uk,
0
k
p and vl,
0
l
p where
|
(28)
|
|
(29)
|
with u0 =
ip and
vp = 1. That is, uk
represents the forward partial product in Eq. 27 and
vk the backward one. The derivative of
I with respect to any variable x can then be
written as
|
(30)
|
where the derivatives of the channel starting probability and
transition probability are given by Eqs. 26 and 23, respectively. Depending on the size of a metastate, i.e., the length of the filter
plus the AR order, the calculation of uk and
vk above may require appropriate scaling to
prevent underflow. The scaling can be done following the same strategy
used in the calculation of the likelihood function.
AR coefficients
The AR coefficients are determined from the autocorrelation
variables according to the Yule-Waker equation (5). The equation can be
solved efficiently using the Levinson-Durbin algorithm (Blahut,
1985
). We now give a brief description of the algorithm and
then show how to modify the algorithm to also calculate the derivatives
of the AR coefficients. For clarity, we suppress the sub or
superscripts for the conductance class. Let r0,
r1 ... rp be the autocorrelations of
the noise at a certain conductance. The algorithm calculates the
coefficients of all AR models with orders up to p, denoted
by {a11,
12},
{a21, a22,
22} ... {ap1, ap2, ... ,
app,
p2}. The final set at
order p is the desired solution. Starting with
02 = r0, the algorithm
proceeds recursively for k = 1, 2 ... p as below:
|
(31)
|
|
(32)
|
|
(33)
|
that is, it first computes the leading coefficient
akk, then uses it to compute the remaining
akj values for j = 1 ... k
1, and finally the variance
k2. The recursion
terminates when the desired order is reached. The algorithm implicitly
assumes that |akk|
1 for all k
in order to carry out the recursion for
k2. This is
true for a strict AR process or when the autocorrelation matrix is
positive definite. For an arbitrary process, the assumption may fail,
in which case the algorithm should be terminated when
k2 becomes sufficiently small.
To calculate the derivatives of the AR coefficients, we differentiate
the above recursive equations, leading to
|
(34)
|
|
(35)
|
|
(36)
|
where x represents an autocorrelation variable. The
equations suggest that we can calculate the derivatives of the AR
coefficients basically in the same way that the coefficients themselves
are calculated. That is, the calculation is carried out recursively through the order, and the derivatives at the kth order are
calculated from those at (k
1)th order. At each
stage, the same recursions need to be repeated for each variable
x. Because the original Levinson-Durbin recursion has a
complexity on the order of p2, the calculation
of the derivatives for all variables will take on the order of
p3 operations, which is about the same as the
complexity of a general linear equation solver. In practice, the time
needed for the computation of the AR coefficients and their derivatives
is negligible compared to the evaluation of the likelihood, because the
AR order is usually small.
Computational complexity
Although the calculation of the likelihood function and its
derivatives involves many steps, the most time-consuming part is the
calculation of the forward and backward variables and the calculation
of the derivatives of the likelihood with respect to the transition
probabilities. Each of these steps takes on the order of
N2(p+1)T operations, where
N is the number of states of the channel and p is
equal to the order of the noise AR model plus the filter length. The
time for the rest of the calculations is negligible. Therefore, the
overall computational complexity for the calculation of the likelihood
and its derivatives is on the order of
N2(p+1)T. Such a complexity increases
exponentially with the filter length or the AR order, which is the
major limiting factor for the practical applicability of the algorithm.
There are several maneuvers that can be used to cut down the
computation. One is to take advantage of the aggregation property of
the channel in the definition of the metastate. Specifically, we can
use the state only for the first component of a metastate while
defining others to be conductance class. That is
|
(37)
|
where xt represents the conductance class
of the state sequence st. Such a new vector
process remains Markovian because its transition is fully determined by
the first component. The state space of the new process, however, is
much smaller. If the channel has N states and M
conductances, the number of metastates using this definition will be
NMp as opposed to Np+1
with the old definition. Because channels often have only two conductances, such a reduction is significant, especially considering that the computation is quadratic on the number of metastates. In the
implementation of our algorithm we have exploited this definition. The
reason that we have used the other definition in the above is mainly
for clarity of the presentation.
Another operation that may speed up the algorithm is to make use of the
fact that many transitions between metastates are disallowed. As a
consequence, the summation in the forward and backward recursions does
not need to be extended through all possible metastates. Instead, we
can restrict it only to the allowable transitions. As stated above, a
transition between two metastates is allowed if and only if they
satisfy Eq. 9. Therefore, there are only N allowable
transitions for each metastate. The summation in the forward and
backward recursions needs to be performed for only N times
instead of NMp. This is true for every metastate
at each time t, thus leading to a total reduction in the
computation by a factor of Mp, which is
considerable as p becomes large.
 |
MAXIMIZATION OF THE LIKELIHOOD |
We have shown how to evaluate the likelihood and its derivatives
for a given particular set of model parameters. The next step is to
optimize the model to search for the maximum of the likelihood
function. This is essentially an optimization problem and can be solved
in a variety of ways. We have used the same approach that we have
developed for the maximization of the likelihood with the standard HMM
(Qin et al., 2000
). The following is a brief outline of
some major features of the approach.
The search of the likelihood space is performed using a quasi-Newton
method (Fletcher, 1980
,
1981
). It is based on successive approximation of the likelihood surface with parabolas and uses an
approximate inverse Hessian matrix built from the first-order partial
derivatives. The main advantage of the method is the quadratic convergence near the maximum point because the likelihood surface there
can be well-approximated by a parabola. It also avoids the need for
exact line minimization. An approximate line minimization usually takes
one to two function evaluations, but an accurate line minimization may
require repetitive braking, and therefore many function calls. When the
evaluation of the objective function is computationally costly, this
may significantly degrade the overall efficiency of the algorithm.
Another feature of the method is that it results in an estimate of the
inverse Hessian matrix, which can be used to derive the standard errors
of the parameter estimates.
Constraints on the model parameters are allowed. The constraints on
rate constants include holding rates at fixed values, linear scaling
between two rates, detailed balance, and so on. The current amplitudes
can be linearly scaled or fixed. The noise can be constrained too, for
example, to restrict the excess open noise to be white or the noise at
different conductance to have the same amount of correlation. These
constraints are either linear or can be converted to be linear.
Therefore, they can be handled analytically by formulating the
constrained variables into linear combinations of a set of
unconstrained ones. As a result, the originally constrained problem is
reduced to an equivalent unconstrained one, making the optimization
more efficient.
The approach also allows multiple data sets obtained at different
experimental conditions to be fit simultaneously. The feature not only
allows more data to be used for modeling, but also becomes required for
resolving models that degenerate at individual conditions. In our
implementation, the rates in the model are represented explicitly as a
function of the experimental condition such as ligand concentration,
voltage, or force. The generic rate constants that are independent of
experimental conditions are chosen as the variables to optimize. Such a
parametrization of the rates also makes it possible to specify the
initial probability of the channel by holding experimental conditions,
which is particularly useful for channels not under equilibrium.
 |
EXTENSION TO MULTIPLE CHANNELS |
The expression of ion channels in membranes is often high, making
the recording of currents from an individual channel difficult. In many
cases one can only obtain recordings arising from multiple channels.
The algorithm described above can be extended readily for the analysis
of such data.
The kinetics of multi-channel activity remains Markovian provided that
the individual channels act independently. Consider a patch containing
K channels. Let st(i) denote the
state sequence of the ith channel at time t. The
configuration of the system at any time can be characterized by a
K-tuple
|
(38)
|
which can be considered as a new Markov process. Because each
component of the tuple has a finite number of states, the tuple itself
also has a finite number of states. Specifically, if
st(i) has Ni states,
the tuple has N1N2 ... NK states. The first-order transitions between two
states occur if and only if there is one channel in the patch making a
transition and the corresponding rate constant is the same as that of
the underlying channel.
For identical channels, Horn and Lang (1983)
suggest a
more efficient definition, where the tuple has the form
|
(39)
|
where nt(i) is the number of channels
in state i at time t. Such a definition takes the
advantage of the indistinguishability of the composition channels and
therefore results in a smaller number of tuple states. For example, if
the channel has N states, nt has only
instead of
NK states. The condition for an allowable
transition between two tuple states stays the same, i.e., only one
channel in the patch undergoes a transition at the same time. The
transition rate, however, is different and needs to be multiplied by
the number of the channels at the starting state of the transition. For
a detailed description of the two approaches, see Qin et al. (1996)
.
The observations for the tuple process can also be parametrized in
terms of those for the single channels. For illustration, we consider a
patch containing identical channels and each channel has three
conductance classes, namely closed (C), partially open (S), and fully open (O). Suppose there is a tuple
with nC channels closed,
nS channels partially open, and
nO channels fully open. The current amplitude of
the tuple state is given by
|
(40)
|
where IC, IS, and
IO are the current amplitudes of the single
channel. The first term is the current at the baseline, the second is
contributed by the channels at the substates, and the third due to the
channels at the fully open states. Similarly, the noise
autocorrelations corresponding to the tuple state are given
by
|
(41)
|
where {rj(C)},
{rj(S)} and
{rj(O)} are the autocorrelations of the
noise associated with single channels at the three conductance levels.
From this example it is seen that both the current amplitudes and noise
autocorrelations for the tuple states are linear combinations of the
singles. This is generally true and applies to both identical and
nonidentical channels. It should be mentioned that such linear relationships are retained because of the use of autocorrelations to
parametrize the noise. If the AR coefficients were used, the relation
would be more complicated and generally cannot be specified analytically because the AR functions are not closed under the additive operations.
It is worth noting that the definition of a conductance class is
sometimes ambiguous when multiple channels are present. For a single
channel, the states are classified in terms of their observable current
amplitudes, and the states with the same current amplitude are assumed
to have the same noise properties. But for multiple channels, two tuple
states may have the same current amplitude but different noise
characteristics. Therefore, the current amplitude alone is not enough
for classification. There are several solutions to the problem. One is
to treat each state as a single class. The disadvantage of doing so is
that it may result in an excessive number of metastates, because the
redundancy between the states is not taken into account. A more
efficient approach, which we have used in our implementation, is to use the number of channels at each conductance as a criterion. For the
above example, the criterion would be the tuple
(nC, nS,
nO) where nC,
nS, and nO are the number
of channels in the patch at the three conductances, respectively. All
tuple states with the same index are guaranteed to have the same
current amplitude and noise property.
With the tuple process defined in Eq. 38 or 39 as the underlying Markov
process, the multiple channel data can be fitted in the same way as the
singles. The only difference is the constraints, since the parameters
for the tuple process are not all independent. Instead, they are linear
combinations of the singles, as shown above. To minimize the degrees of
freedom, we have chosen to optimize the single channel variables
directly. Fortunately, all constraints, including those for the noise,
are linear, and therefore can be handled easily using the same approach
described in the previous section.
 |
EXPERIMENTAL RESULTS |
We present a few examples to show some basic features of the HMM
approach described in the previous sections. A set of four examples is
given. The first example illustrates the general applicability of the
algorithm to the typical correlated patch-clamp background noise. In
the second example, we introduce excess open noise, both white and
correlated, in addition to the correlated background noise. The third
example tests the algorithm with filtered data. The last example
demonstrates the applicability of the algorithm to data containing
multiple channels.
The algorithm is implemented in a general way to allow for both single
channels and multiple channels, with either white or correlated noise.
The input to the program is a single channel model along with the
number of channels. Internally, the program first builds a
multi-channel Markov model for the tuple process, as defined above. In
the case of a single channel, this multi-channel model reduces to the
single one, but in the case of multiple channels, a set of linear
constraints are set up automatically on the rate constants, current
amplitudes, and noise autocorrelations. Following the multi-channel
Markov model, the algorithm then builds a metastate Markov model based
on the input filtering and noise correlation. The noise at different
conductance levels can be constrained to be either the same or to
differ by an excess variance that can be either white or correlated.
Colored background noise
We first consider an example with simulated patch-clamp noise. The
power spectrum of the background noise in a typical patch-clamp experiment can be described by (Sigworth, 1995
)
|
(42)
|
over the experimentally accessible frequency range from dc to 100 kHz. The frequency-independent component is mainly due to the shot
noise of resistors in the amplifier. The
f2-dependent component arises from
current fluctuations resulting from the amplifier voltage noise imposed
on the pipette and amplifier stray capacitance. In discrete time,
Venkataramanan et al. (1998a)
show that the noise can be
approximated by a first-order moving average (MA) process
|
(43)
|
where w(t) is white, Gaussian noise with zero mean and
variance
w2. Typical values for coefficients are
m0 = 0.8 and m2 =
0.6, obtained by fitting a representative patch-clamp recording
without channel activity.
Although the noise is a simple first-order MA process, it requires a
more complicated AR model for a good approximation. A first-order MA
process has a correlation extending over only one sample point, but a
first-order AR process usually correlates over an infinite number of
samples. Fig. 1 shows the power spectrum of the first-order MA model given in Eq. 43 in comparison with its AR
approximations with various orders. From the figure we see that a
first-order AR doesn't have a sufficient capacity to model the noise.
But as the order increases, the approximation becomes more accurate. In
theory, the approximation can be made into an arbitrary accuracy with a
sufficiently large order. In practice, however, the exponentially
increasing computation burden limits the use of very high-order models.

View larger version (18K):
[in this window]
[in a new window]
|
FIGURE 1
Power spectrum density of a first-order MA process in
comparison with its AR approximations at different orders. The AR model
is determined by solving the Yule-Walker equation. The approximation
improves as the order increases.
|
|
The data were simulated based on a two state model: