help button home button Biophys. J.
HOME HELP FEEDBACK SUBSCRIPTIONS ARCHIVE SEARCH TABLE OF CONTENTS

This Article
Right arrow Abstract Freely available
Right arrow Full Text (PDF)
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Similar articles in this journal
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Download to citation manager
Right arrow reprints & permissions
Citing Articles
Right arrow Citing Articles via HighWire
Right arrow Citing Articles via Google Scholar
Google Scholar
Right arrow Articles by Qin, F.
Right arrow Articles by Sachs, F.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Qin, F.
Right arrow Articles by Sachs, F.

Biophys J, October 2000, p. 1928-1944, Vol. 79, No. 4

Hidden Markov Modeling for Single Channel Kinetics with Filtering and Correlated Noise

Feng Qin, Anthony Auerbach, and Frederick Sachs

Department of Physiology and Biophysical Sciences, State University of New York at Buffalo, Buffalo, New York 14214 USA




    ABSTRACT
TOP
ABSTRACT
INTRODUCTION
THE MODEL
THE LIKELIHOOD FUNCTION AND...
MAXIMIZATION OF THE LIKELIHOOD
EXTENSION TO MULTIPLE CHANNELS
EXPERIMENTAL RESULTS
DISCUSSION
REFERENCES

Hidden Markov modeling (HMM) can be applied to extract single channel kinetics at signal-to-noise ratios that are too low for conventional analysis. There are two general HMM approaches: traditional Baum's reestimation and direct optimization. The optimization approach has the advantage that it optimizes the rate constants directly. This allows setting constraints on the rate constants, fitting multiple data sets across different experimental conditions, and handling nonstationary channels where the starting probability of the channel depends on the unknown kinetics. We present here an extension of this approach that addresses the additional issues of low-pass filtering and correlated noise. The filtering is modeled using a finite impulse response (FIR) filter applied to the underlying signal, and the noise correlation is accounted for using an autoregressive (AR) process. In addition to correlated background noise, the algorithm allows for excess open channel noise that can be white or correlated. To maximize the efficiency of the algorithm, we derive the analytical derivatives of the likelihood function with respect to all unknown model parameters. The search of the likelihood space is performed using a variable metric method. Extension of the algorithm to data containing multiple channels is described. Examples are presented that demonstrate the applicability and effectiveness of the algorithm. Practical issues such as the selection of appropriate noise AR orders are also discussed through examples.



    INTRODUCTION
TOP
ABSTRACT
INTRODUCTION
THE MODEL
THE LIKELIHOOD FUNCTION AND...
MAXIMIZATION OF THE LIKELIHOOD
EXTENSION TO MULTIPLE CHANNELS
EXPERIMENTAL RESULTS
DISCUSSION
REFERENCES

Hidden Markov modeling (HMM) provides an efficient approach for analysis of single channel currents. It is particularly useful for records where the signal-to-noise ratio is low or the channel kinetics is rapid. The conventional dwell-time approach (Colquhoun and Sigworth, 1995; Magleby and Weiss, 1990a, 1990b; Horn and Lange, 1983; Chay, 1988; Ball and Sansom, 1989; Qin et al., 1996, 1997) fails in these cases because an appropriate idealization of such data is often difficult. The HMM approach analyzes the noisy data directly, thereby eliminating the necessity of idealization. As a result, it has an improved requirement on signal-to-noise ratio and has the potential to allow for more rapid channel kinetics. In this approach, each sample point is treated as a relevant interval and the probability of obtaining the sequence of the observed data samples is calculated according to a model. The model contains two terms: an amplitude term that is used to estimate the probability that a given data point belongs to a given current level, and a transition probability term that is used to estimate the probability of a state transition between data points. The amplitude term is usually described by Gaussian distributions centered about each current level. The transition probability is described by a matrix exponential composed of the model rate constants and describes the probability of switching states between adjacent data points. The product of the probabilities over all possible paths can be computed recursively, producing the likelihood of observing the data given the model. The likelihood is then optimized with respect to the parameters of the model to produce the best fit. Because the noise is taken into account explicitly, the HMM approach permits a relatively low signal-to-noise ratio. It is also less prone to errors due to missed events because the transition probability takes into account the undetected transitions between adjacent samples. The penalty for using the sample-based likelihood paradigm is computation time, as the probabilities have to be evaluated at every data point.

Baum's reestimation, a precursor to the general expectation-maximization (EM) approach for maximal likelihood estimation, is the standard approach for estimating the parameters in a hidden Markov model. When applied to single channel analysis, the method, however, has several drawbacks. First, it assumes a perfect Markov signal with white background noise. For experiments like patch-clamp recording, this assumption is usually too restrictive. The data are always low-pass filtered. Furthermore, the instrument noise has a power spectrum that increases quadratically with frequency, and when the channel is open there is often additional noise arising from a variety of sources, and usually has unknown spectral characteristics. For channels with slow kinetics, the effects of such distortions may be negligible, but when the channel activity is busy, they become significant and need to be addressed explicitly. Another problem is that Baum's algorithm estimates discrete transition probabilities. The channel kinetics, however, is a continuous process and the quantities of interest are the rate constants. Although it is possible to convert the transition probabilities into rate constants, the model topology generally cannot be retained. Finally, in Baum's algorithm there is no explicit control of the rate constants, thus it is difficult to impose constraints such as detailed balance, and to fit multiple data sets from different experimental conditions simultaneously.

The issue of correlated noise has recently been addressed by extending Baum's reestimation formulae. Walsh and Sigworth (1992) took the approach of preprocessing the data to pre-whiten the noise, thereby reducing the problem to solving a high-order Markov process in white noise; this, in turn, was reformulated into a first-order Markov process with an enlarged state space. More recently, Venkataramanan et al. (1998a, 1998b) extended this approach to account for additional noise in the open states and to estimate the noise correlations directly. To enable the direct optimization of rate constants, Fredkin and Rice (1992b) have attempted to use standard optimizers to maximize the likelihood function. Their approach, however, is reported to be very computationally intensive.

We have recently developed a direct optimization approach to hidden Markov modeling of single-channel currents. The approach has two essential features, i.e., the direct optimization of rate constants and the use of analytical derivatives for optimization of the likelihood function. These in turn provide the algorithm with several desirable features, such as the capability for explicit control of model topology, the allowance for imposition of constraints on rate constants, and the flexibility of simultaneous fitting of multiple data sets on different experimental conditions. The availability of the analytical derivatives of the likelihood function makes it possible to use the efficient gradient-based optimizers such as the variable metric method to search the likelihood surface. Compared with Baum's reestimation, the method has a favorable convergence near the maximum, especially in the extreme cases of low signal-to-noise ratio and aggregated kinetics. Under these conditions, Baum's algorithm often exhibits a slow convergence in its approach to the maximum by taking very small steps.

In this paper we extend this approach to address the issues of band-limited signals and correlated noise. Following Venkataramanan et al. (1998a, 1998b), we model the noise by an autoregressive (AR) process, so that the data can be reduced to a higher-order Markov process in white noise. However, we exploit a different strategy to parametrize the noise. Instead of estimating the AR coefficients, we choose to optimize the autocorrelations of the noise. Such a choice was motivated by the fact that the autocorrelations of two additive noises are linear combinations of the individual ones. Thus the relationship for the noise at different conductance levels, which is often additive, can be taken into account explicitly through the use of linear constraints. This feature is particularly useful for modeling multiple channel activity where the noise at different conductance levels is highly related. Other issues, such as the correction for low-pass filtering, the optimization of initial probabilities, and the handling of multiple channels, are also addressed. Finally, several prototype examples are provided to illustrate the features and performance of the algorithm.



    THE MODEL
TOP
ABSTRACT
INTRODUCTION
THE MODEL
THE LIKELIHOOD FUNCTION AND...
MAXIMIZATION OF THE LIKELIHOOD
EXTENSION TO MULTIPLE CHANNELS
EXPERIMENTAL RESULTS
DISCUSSION
REFERENCES

We consider a channel with N conformation states partitioned into M conductance classes. Let Ii, i = 1 ... M denote the current amplitude at each conductance. Transitions among the states are described by a time-homogeneous Markov process with an infinitesimal generator matrix Q = [qij]N×N, where the (ij)th off-diagonal element qij represents the transition rate from state i to state j, and the diagonal elements are defined so that each row sums to zero. The system can be either in equilibrium or transient following perturbation. Although not necessary, the model is usually irreducible with states reachable from each other.

In practice we observe a discrete sampling of the continuous process. A sampled Markov process can be considered a Markov chain whose transitions are described by a transition probability matrix, say A = [aij], where aij represents the probability of the channel being in state j at the next sampling time given that it is in state i at the current sampling time. For a given sampling interval Delta t, A is related to Q by
<B><UP>A</UP></B>=<UP>exp</UP>(<B><UP>Q</UP></B>&Dgr;t). (1)
The equation mathematically defines a one-to-one mapping between Q and A. However, it is worth noting that the inversion from a given A to Q may not always produce physically meaningful results because some Markov chains cannot be considered to be sampled from a continuous Markov process (Qin et al., 2000).

The noise in the data is modeled conductance-wise. That is, all states with the same mean conductance are assumed to have the same noise characteristics, but the noise for different conductances is allowed to be different. For each conductance, the associated noise is modeled using an autoregressive (AR) process to account for its color. Let n(i)(t) denote the noise at the ith conductance, and sigma i and aj(i), 1 <=  j <=  m be the corresponding AR coefficients. For simplicity, we assume the same order for all AR models at different conductances. An AR process can be considered as the output of passing white noise through an all-pole filter, i.e.,
n<SUP>(<UP>i</UP>)</SUP><SUB><UP>t</UP></SUB>+a<SUP>(<UP>i</UP>)</SUP><SUB><UP>1</UP></SUB>n<SUP>(<UP>i</UP>)</SUP><SUB><UP>t−1</UP></SUB>+… a<SUP>(<UP>i</UP>)</SUP><SUB><UP>m</UP></SUB>n<SUP>(<UP>i</UP>)</SUP><SUB><UP>t−m</UP></SUB>=&sfgr;<SUB><UP>i</UP></SUB>w<SUB><UP>t</UP></SUB> (2)
where wt is white noise with unit variance. From the functional approximation point of view, an AR model is equivalent to using an all-pole rational function to approximate the power spectrum of the noise. Because there are no zeros, the AR model is more restrictive than the more general ARMA model (Kay and Marple, 1981), but given sufficiently high order, it can fit a wide range of functions with either smooth or wavy features.

The effect of filtering is accounted using a finite impulse response (FIR) filter whose coefficients are denoted by hj, j = 0 ... n - 1. We assume that the filtering is applied only to the underlying signal. In practice it affects noise too, but we leave that effect to be taken care of by the noise model. We also assume in this paper that the coefficients of the filter are known a priori. They can be determined from the transfer function of the recording system and those of the digital filters that may have been used. It should be emphasized that although the model takes filtering into account explicitly, one should always preprocess the data before analysis to correct for bandwidth. This can be done through appropriate inverse filtering and decimation. The filter introduced here is mainly intended for the leftover effect that cannot be corrected a priori. A filter with as few coefficients as possible is important because a long filter, as seen later, will cause the computation time to increase exponentially.

The advantage of using an AR model for noise is that it allows the noise to be pre-whitened. Let st denote the state sequence of the underlying signal and xt be the corresponding conductance class. The observation can be written algebraically as
Y<SUB><UP>t</UP></SUB>=H(z)I<SUB><UP>x</UP><SUB><UP>t</UP></SUB></SUB>+<FR><NU>&sfgr;<SUB><UP>x</UP><SUB><UP>t</UP></SUB></SUB></NU><DE>A<SUP>(<UP>x<SUB>t</SUB></UP>)</SUP>(z)</DE></FR> w<SUB><UP>t</UP></SUB> (3)
where H(z) is the transfer function of the FIR filter and A(i)(z) is the denominator of the transfer function of the AR filter, i.e.,
A<SUP>(<UP>i</UP>)</SUP>(z)≡<LIM><OP>∑</OP><LL><UP>j=0</UP></LL><UL><UP>m</UP></UL></LIM> a<SUP>(<UP>i</UP>)</SUP><SUB><UP>j</UP></SUB>z<SUP><UP>−j</UP></SUP>.
The pre-whitening is done by multiplying A(xt)(z) through Eq. 3, leading to
Y<SUB><UP>t</UP></SUB>=<LIM><OP>∑</OP><LL><UP>j=0</UP></LL><UL><UP>p</UP></UL></LIM> c<SUP>(<UP>x<SUB>t</SUB></UP>)</SUP><SUB><UP>j</UP></SUB>I<SUB><UP>x</UP><SUB><UP>t−j</UP></SUB></SUB>−<LIM><OP>∑</OP><LL><UP>j=1</UP></LL><UL><UP>m</UP></UL></LIM> a<SUP>(<UP>x<SUB>t</SUB></UP>)</SUP><SUB><UP>j</UP></SUB>Y<SUB><UP>t−j</UP></SUB>+&sfgr;<SUB><UP>x</UP><SUB><UP>t</UP></SUB></SUB>w<SUB><UP>t</UP></SUB> (4)
where p triple-bond  m + n - 1 and cj(i), j = 0 ... p are the coefficients of the product H(z)A(xt)(z), or equivalently, the convolution of {hj} and {aj(i)}. The significance of such pre-whitening is that it reduces the noise in the data to be white, as implied by Eq. 4, thereby satisfying the HMM assumption. This is of course achieved at the expense of further distorting the underlying signal. When the AR model is known a priori, the pre-whitening can be done experimentally. But for the problem here, it can only be done in theory because not only is the noise model unknown, but the system is time-variant, i.e., the noise model varies with the channel's conductance.

A potential problem with the AR model is the difficulty of imposing constraints on the noise between different conductance levels. For example, one might need to constrain the open noise so that it is the superimposition of the closed noise with an extra white component. The ability to allow for such constraints becomes particularly useful for modeling multi-channel records, as will be shown later. The AR model, however, is not linearly additive in the sense that the sum of two AR processes is generally no longer an AR process. We circumvent this by reparametrizing the noise using autocorrelations instead of AR coefficients. The autocorrelation function has the desirable feature that the autocorrelations of two independent random processes are linear combinations of the autocorrelations of the individual ones.

The AR coefficients of an mth order AR process are fully determined by its first m autocorrelations. This is stated by the Yule-Walker equation (Kay and Marple, 1981)
<FENCE><AR><R><C>r<SUP>(<UP>i</UP>)</SUP><SUB><UP>0</UP></SUB></C><C>r<SUP>(<UP>i</UP>)</SUP><SUB><UP>1</UP></SUB></C><C>…</C><C>r<SUP>(<UP>i</UP>)</SUP><SUB><UP>m</UP></SUB></C></R><R><C>r<SUP>(<UP>i</UP>)</SUP><SUB><UP>1</UP></SUB></C><C>r<SUP>(<UP>i</UP>)</SUP><SUB><UP>0</UP></SUB></C><C>…</C><C>r<SUP>(<UP>i</UP>)</SUP><SUB><UP>m−1</UP></SUB></C></R><R><C>&vtdot;</C><C>&vtdot;</C><C></C><C>&vtdot;</C></R><R><C>r<SUP>(<UP>i</UP>)</SUP><SUB><UP>m</UP></SUB></C><C>r<SUP>(<UP>i</UP>)</SUP><SUB><UP>m−1</UP></SUB></C><C>&vtdot;</C><C>r<SUP>(<UP>i</UP>)</SUP><SUB><UP>0</UP></SUB></C></R></AR></FENCE><FENCE><AR><R><C>1</C></R><R><C>a<SUP>(<UP>i</UP>)</SUP><SUB><UP>1</UP></SUB></C></R><R><C>&vtdot;</C></R><R><C>a<SUP>(<UP>i</UP>)</SUP><SUB><UP>m</UP></SUB></C></R></AR></FENCE>=<FENCE><AR><R><C>&sfgr;<SUP><UP>2</UP></SUP><SUB><UP>i</UP></SUB></C></R><R><C>0</C></R><R><C>&vtdot;</C></R><R><C>0</C></R></AR></FENCE> (5)
where rj(i), j = 0 ... m are the autocorrelations of the noise at the ith conductance level. The equation can be solved using either a general linear equation solver or, more efficiently, the Levinson-Durbin algorithm (Blahut, 1985). The latter has a complexity on the order of m2 as opposed to m3. For a process that is not strictly AR, the above equation can still be used to determine the AR parameters. In this case the resulting AR model fits the first m autocorrelations exactly, and the remaining ones are extrapolated based on the maximal entropy criterion.

With the above parametrization, the entire model, denoted by Theta , constitutes of the rate constants (qij), the current amplitudes (Ii), and the noise autocorrelations (rj(i)). The problem is then to estimate all these parameters from the given observations. In the paradigm of HMM, the maximum likelihood approach is used. Let Y = Y1 ··· YT denote the observed samples. The likelihood function, denoted by L(Theta ), is defined as the probability of observing Y given Theta , i.e.,
L(&THgr;)=<UP>Pr</UP>(Y<SUB>1</SUB> … Y<SUB><UP>T</UP></SUB>‖&THgr;). (6)
The problem is equivalent to finding the maximum point of L(Theta ). In the next section we describe how to evaluate L(Theta ) and its derivatives for efficient optimization.



    THE LIKELIHOOD FUNCTION AND ITS DERIVATIVES
TOP
ABSTRACT
INTRODUCTION
THE MODEL
THE LIKELIHOOD FUNCTION AND...
MAXIMIZATION OF THE LIKELIHOOD
EXTENSION TO MULTIPLE CHANNELS
EXPERIMENTAL RESULTS
DISCUSSION
REFERENCES

Evaluation of the likelihood and its derivatives is a complicated process. It can be roughly divided into two major parts. The first is to reformulate the problem in a way that satisfies the conventional HMM assumption, i.e., a first-order Markov chain in white noise. The standard approach for doing this is to introduce a metastate Markov model that combines the current state of the channel with its history states into a tuple (Fredkin and Rice, 1992a; Venkataramanan et al., 1998a, 1998b). Once the HMM assumption is satisfied, the existing theory can be applied. The other part is to determine the parameters of the metastate Markov model and their derivatives. The parameters that are optimized are the rate constants, current amplitudes, and noise autocorrelations. Thus we need to evaluate, for example, the initial probabilities of metastates and their derivatives with respect to the rate constants, the transition probabilities of metastates and their derivatives with respect to the rate constants, the AR coefficients and their derivatives with respect to the noise autocorrelations, and so on. In the following, we describe each step in detail. At the end, we also discuss an alternative definition for metastates along with other strategies that can be used to improve the computational efficiency.

Metastate Markov model

The standard HMM places a restrictive assumption on both the signal and noise. The hidden signal needs to be first-order Markovian, while the noise must be white. We have shown in the previous section that the correlation of noise can be removed by pre-whitening the data. The resulting data, however, still don't satisfy the HMM assumption due to the history of the underlying signal, as implied by Eq. 4. One solution to the problem is to consider the current state of the channel along with its history states in the memory as a group. If the system has a memory of p lags, we simply lump together the current state st with the previous p - 1 states st-1, ... , st-p. This group, when considered as a new process, is memoryless. Mathematically, this is equivalent to defining a vector process
<B><UP>s</UP></B><SUB><UP>t</UP></SUB>=(s<SUB><UP>t</UP></SUB>, s<SUB><UP>t−1</UP></SUB> … s<SUB><UP>t−p</UP></SUB>) (7)
where each component represents a state of the channel. Taking the vector process st as the underlying signal, the observation Yt becomes dependent only on the most current st at any time and independent of its histories, thereby satisfying the assumption of standard HMM.

The vector process st is Markovian because its transition at each time involves only a single transition of its first component, st, which is Markovian. The states of st, called metastates, are the (p + 1)-tuples
(i<SUB>0</SUB>,…, i<SUB><UP>p</UP></SUB>) (8)
where each component is a possible state of the channel. If the channel has N states, st will have Np+1 metastates. While the number of metastates may be large, many transitions among them are disallowed. For two metastates I = (i0 ... ip) and J = (j0 ... jp), the transition between them is allowed if and only if
i<SUB><UP>k</UP></SUB>=j<SUB><UP>k+1</UP></SUB>, k=0 … p−1 (9)
i.e., J corresponds to a shift of I toward the right by one component.

Before proceeding, we introduce some standard notation that will be used throughout the paper. As already mentioned above, we use capital I and J to denote metastates. The capital I is also used for channel current amplitudes, but its exact meaning should be clear from the context. The component states of a metastate are represented by the small i's or j's, as in Eqs. 8 and 9, and the corresponding conductance class of these states is designated using the Greek letter µ.

Evaluation of the likelihood

Considering the vector process st as the hidden signal, we have a new first-order Markov process in white noise, as seen from Eq. 4. The standard forward-backward procedure can then be applied to evaluate the likelihood function. The forward and backward variables are defined as the partial likelihood of the observation samples with a given metastate for the hidden signal, i.e.,
&agr;<SUB><UP>t</UP></SUB>(I)=<UP>Pr</UP>[Y<SUB>1</SUB>…Y<SUB><UP>t</UP></SUB>, <B><UP>s</UP></B><SUB><UP>t</UP></SUB>=I]

&bgr;<SUB><UP>t</UP></SUB>(J)=<UP>Pr</UP>[Y<SUB><UP>t+1</UP></SUB>…Y<SUB><UP>T</UP></SUB>, <B><UP>s</UP></B><SUB><UP>t</UP></SUB>=J]
where t = 1 ... T, and I and J span over all possible metastates. Let bI(t) be the probability distribution of the observation at time t given the underlying process st in metastate I. Strictly speaking, bI(t) also depends on the previous observations Yt-1, Yt-2 ... Yt-m in addition to the metastate I, but for simplicity of notation we will not exploit the dependence explicitly. From Eq. 4 it can be formulated as
b<SUB><UP>I</UP></SUB>(t)=<FR><NU>1</NU><DE><RAD><RCD>2&pgr;</RCD></RAD>&sfgr;<SUB>&mgr;<SUB>0</SUB></SUB></DE></FR> <UP>exp</UP><FENCE><UP>−</UP><FR><NU>&ohgr;<SUP><UP>2</UP></SUP><SUB><UP>I</UP></SUB>(t)</NU><DE>2&sfgr;<SUP>2</SUP><SUB>&mgr;<SUB>0</SUB></SUB></DE></FR></FENCE> (10)
where omega I(t) can be considered as the noise residue given by
&ohgr;<SUB><UP>I</UP></SUB>(t)=<LIM><OP>∑</OP><LL><UP>j=1</UP></LL><UL><UP>m</UP></UL></LIM> a<SUP>(<UP>&mgr;<SUB>0</SUB></UP>)</SUP><SUB><UP>j</UP></SUB>Y<SUB><UP>t−j</UP></SUB>−<LIM><OP>∑</OP><LL><UP>j=0</UP></LL><UL><UP>p</UP></UL></LIM> c<SUP>(<UP>&mgr;<SUB>0</SUB></UP>)</SUP><SUB><UP>j</UP></SUB>I<SUB><UP>&mgr;</UP><SUB><UP>j</UP></SUB></SUB> (11)
and µj is the conductance class of the jth component of metastate I. The forward and backward variables are calculated recursively by
&agr;<SUB><UP>t+1</UP></SUB>(J)=<LIM><OP>∑</OP><LL><UP>I</UP></LL></LIM> &agr;<SUB><UP>t</UP></SUB>(I)a<SUB><UP>IJ</UP></SUB>b<SUB><UP>J</UP></SUB>(t+1) (12)

&bgr;<SUB><UP>t</UP></SUB>(I)=<LIM><OP>∑</OP><LL><UP>J</UP></LL></LIM> a<SUB><UP>IJ</UP></SUB>&bgr;<SUB><UP>t+1</UP></SUB>(J)b<SUB><UP>J</UP></SUB>(t+1) (13)
where aIJ is the transition probability between metastate I and metastate J. The forward recursion in Eq. 12 proceeds from the first sample at t = 1 to the last sample at t = T, and the backward recursion goes in the opposite direction. Initially, the forward recursion starts with the initial probability of the metastates, i.e., alpha 0(I) = pi I, and the backward recursion begins with beta T(J) = 1 for all J.

By definition, the likelihood function is equal to the forward variables summed over all metastates at time T, i.e.,
L(&THgr;)=<LIM><OP>∑</OP><LL><UP>I</UP></LL></LIM> &agr;<SUB><UP>t</UP></SUB>(I). (14)
Therefore, the likelihood can be calculated at the end of the forward recursion. The backward variables, although not required for the evaluation of the likelihood, will be needed for the calculation of its derivatives. In practice, the likelihood itself is usually out of the machine range and only its logarithm can be computed. This is done through appropriate scaling of the forward and backward variables. For details about scaling see, for example, Rabiner (1989).

Derivatives of the likelihood

The derivatives of the likelihood function are calculated in multiple steps. First, we derive the derivatives of the likelihood with respect to the starting probability and transition probability of the metastate Markov model, and the derivatives with respect to the channel current amplitudes and noise AR coefficients. The derivation for these derivatives is similar to that for the case of white noise (Qin et al., 2000), so we will not go through the detailed algebra. The results can be summarized as
<FR><NU>∂L</NU><DE>∂&pgr;<SUB><UP>I</UP></SUB></DE></FR>=&bgr;<SUB>1</SUB>(I)b<SUB><UP>I</UP></SUB>(1) (15)

<FR><NU>∂L</NU><DE>∂a<SUB><UP>IJ</UP></SUB></DE></FR>=<LIM><OP>∑</OP><LL><UP>t</UP></LL></LIM> &agr;<SUB><UP>t</UP></SUB>(I)&bgr;<SUB><UP>t+1</UP></SUB>(J)b<SUB><UP>J</UP></SUB>(t+1) (16)

<FR><NU>∂L</NU><DE>∂x</DE></FR>=<LIM><OP>∑</OP><LL><UP>t</UP></LL></LIM> <LIM><OP>∑</OP><LL><UP>I</UP></LL></LIM> &agr;<SUB><UP>t</UP></SUB>(I)&bgr;<SUB><UP>t</UP></SUB>(I) <FR><NU>∂ <UP>ln</UP> b<SUB><UP>I</UP></SUB>(t)</NU><DE>∂x</DE></FR> (17)
where x in the last equation could be any variable that appears in the distribution function bI(t). For the problem here, x is either the current amplitude or an AR coefficient, but it can be other parameters as well. For example, one may include deterministic perturbations such as baseline drift or harmonic interference into the model. In these cases, the distribution function bI(t), which is the only part of the algorithm that needs to be modified, contains the additional perturbation parameters. The same formula can then be applied to derive the derivatives of the likelihood function with respect to those perturbation parameters.

Substituting bI(t) by its definition and letting x be the current amplitude and AR coefficient, we can further derive from Eq. 17 the derivatives of the likelihood function with respect to these variables as the following:
<FR><NU>∂L</NU><DE>∂I<SUB><UP>k</UP></SUB></DE></FR>=<LIM><OP>∑</OP><LL><UP>t</UP></LL></LIM> <LIM><OP>∑</OP><LL><UP>j</UP></LL></LIM> <LIM><OP>∑</OP><LL><AR><R><C><UP>I</UP></C></R><R><C><UP>&mgr;<SUB>j</SUB>=k</UP></C></R></AR></LL></LIM> &agr;<SUB><UP>t</UP></SUB>(I)&bgr;<SUB><UP>t</UP></SUB>(I)&ohgr;<SUB><UP>I</UP></SUB>(t)&sfgr;<SUP><UP>−2</UP></SUP><SUB>&mgr;<SUB>0</SUB></SUB>c<SUP>(<UP>&mgr;<SUB>0</SUB></UP>)</SUP><SUB><UP>j</UP></SUB> (18)

<FR><NU>∂L</NU><DE>∂&sfgr;<SUB><UP>k</UP></SUB></DE></FR>=<UP>−</UP><LIM><OP>∑</OP><LL><UP>t</UP></LL></LIM> <LIM><OP>∑</OP><LL><AR><R><C><UP>I</UP></C></R><R><C><UP>&mgr;<SUB>0</SUB>=k</UP></C></R></AR></LL></LIM> &agr;<SUB><UP>t</UP></SUB>(I)&bgr;<SUB><UP>t</UP></SUB>(I)&sfgr;<SUP><UP>−3</UP></SUP><SUB>&mgr;<SUB>0</SUB></SUB>[&sfgr;<SUP>2</SUP><SUB>&mgr;<SUB>0</SUB></SUB>−&ohgr;<SUP><UP>2</UP></SUP><SUB><UP>I</UP></SUB>(t)] (19)

<FR><NU>∂L</NU><DE>∂a<SUP>(<UP>k</UP>)</SUP><SUB><UP>i</UP></SUB></DE></FR>=<UP>−</UP><LIM><OP>∑</OP><LL><UP>t</UP></LL></LIM> <LIM><OP>∑</OP><LL><AR><R><C><UP>I</UP></C></R><R><C><UP>&mgr;<SUB>0</SUB>=k</UP></C></R></AR></LL></LIM> &agr;<SUB><UP>t</UP></SUB>(I)&bgr;<SUB><UP>t</UP></SUB>(I)&ohgr;<SUB><UP>I</UP></SUB>(t)&sfgr;<SUP><UP>−2</UP></SUP><SUB>&mgr;<SUB>0</SUB></SUB><FENCE>Y<SUB><UP>t−i</UP></SUB>−<LIM><OP>∑</OP><LL><UP>j</UP></LL></LIM> h<SUB><UP>j</UP></SUB>I<SUB><UP>&mgr;</UP><SUB><UP>i+j</UP></SUB></SUB></FENCE> (20)
where the coefficient cj(i), j = 0 ... p in the first equation represents the convolution of the filter impulse response {hj} with the AR coefficient {aj(i)}, as defined earlier. Note that there are a set of these equations for each conductance level, since each conductance has its own noise model. From the equations we see that the derivatives for the current amplitude are summed not only over all metastates but also over the individual components of each metastate, while the derivatives for the noise are summed only through the metastates. Such a discrepancy is as expected because the current amplitude information is contained in all samples in the memory, while the noise model, by definition, is only dependent on the first component of the metastate. Also note that setting the AR coefficient aj(i) to zero and the filter impulse response to a delta function reduces the above equations to be the same as those in the case of white noise.

The derivatives given above constitute only a part of the calculation for the final derivatives of the likelihood function. Except for the current amplitudes, they are not given with respect to the model parameters that are expected to be optimized. For example, both the starting probabilities and transition probabilities of the metastate Markov model are functions of the rate constants, while the AR coefficients are functions of the noise autocorrelations. The next step in the calculation is to evaluate these intermediate parameters and their derivatives with respect to the true model parameters, i.e., the rate constants and autocorrelations. The final derivatives of the likelihood function are then obtained by combining them using the differentiation chain rule.

Transition probability of metastates

We consider the calculation of the transition probability of the metastate Markov model and its derivatives with respect to the rate constants. To this end we need to first calculate the transition probability of the channel itself. This can be done using the same procedure that we have developed for the standard HMM (Qin et al., 2000). The calculation is based on the spectral expansion of the rate constant matrix Q
<B><UP>Q</UP></B>=<LIM><OP>∑</OP><LL><UP>i=1</UP></LL><UL><UP>N</UP></UL></LIM> &lgr;<SUB><UP>i</UP></SUB><B><UP>A</UP></B><SUB><UP>i</UP></SUB> (21)
where lambda i is the ith eigenvalue of Q and Ai is the product of the corresponding left and right eigenvectors. Making use of the spectral expansion, the transition probability of the channel, which is a matrix exponential of the rate constant matrix as given in Eq. 1, can be represented by
<B><UP>A</UP></B>=<LIM><OP>∑</OP><LL><UP>i=1</UP></LL><UL><UP>N</UP></UL></LIM> <B><UP>A</UP></B><SUB><UP>i</UP></SUB>e<SUP><UP>&lgr;<SUB>i</SUB>&Dgr;t</UP></SUP>. (22)
The derivatives of the transition probability with respect to the rate constants can be formulated as
<FR><NU>∂<B><UP>A</UP></B></NU><DE>∂q<SUB><UP>kl</UP></SUB></DE></FR>=<LIM><OP>∑</OP><LL><UP>i=1</UP></LL><UL><UP>N</UP></UL></LIM> <LIM><OP>∑</OP><LL><UP>j=1</UP></LL><UL><UP>N</UP></UL></LIM> <B><UP>A</UP></B><SUB><UP>i</UP></SUB> <FR><NU>∂<B><UP>Q</UP></B></NU><DE>∂q<SUB><UP>kl</UP></SUB></DE></FR> <B><UP>A</UP></B><SUB><UP>j</UP></SUB>f(&lgr;<SUB><UP>i</UP></SUB>, &lgr;<SUB><UP>j</UP></SUB>, &Dgr;t) (23)
where f(lambda i, lambda j, Delta t) is a scalar function defined by
f(&lgr;<SUB><UP>i</UP></SUB>, &lgr;<SUB><UP>j</UP></SUB>, &Dgr;t)=<FENCE><AR><R><C><FR><NU><UP>exp</UP>(&lgr;<SUB><UP>j</UP></SUB>&Dgr;t)−<UP>exp</UP>(&lgr;<SUB><UP>i</UP></SUB>&Dgr;t)</NU><DE>&lgr;<SUB><UP>j</UP></SUB>−&lgr;<SUB><UP>i</UP></SUB></DE></FR></C><C><UP> if</UP> &lgr;<SUB><UP>j</UP></SUB>≠&lgr;<SUB><UP>i</UP></SUB></C></R><R><C><UP>exp</UP>(&lgr;<SUB><UP>i</UP></SUB>&Dgr;t)&Dgr;t</C><C> <UP>otherwise</UP>.</C></R></AR></FENCE> (24)
The derivative partial Q/partial qkl in Eq. 23 is simply a constant matrix, in which all elements are equal to zero except the (k, l)th entry, which is plus one, and the kth diagonal element, which is minus one. Because most elements of the matrix are zero, the matrix product in the summation in Eq. 23 can be calculated efficiently.

Once we have the transition probability of the channel, the transition probability of the metastate Markov model along with its derivatives is readily obtainable. In particular, Eq. 9 specifies the condition for an allowable transition, and when the condition is satisfied, the corresponding transition probability aIJ is equal to the transition probability of the channel from state i0 to state j0, where i0 and j0 are the leading components of the two metastates, respectively. The derivative of the metastate transition probability aIJ with respect to any transition probability aij of the channel is either 0 or 1, depending on whether the transition is allowed and whether i and j are equal to i0 and j0, respectively.

Initial probability of metastates

By definition, a metastate defines which state the channel is in from the current sampling time through the previous histories in the memory, so the starting probability of a metastate I = [i0, i1 ... ip] is essentially the probability for the channel being in state ip initially and then in state ik at the subsequent sampling times for t = kDelta t, k = 1 ... p. As with the calculation of the transition probabilities of metastates, we need to first calculate the starting probability of the channel itself in order to determine starting probabilities of the metastates.

The starting probability of the channel is chosen as the equilibrium probability at the holding condition. It is given by
&pgr;<SUP>&tgr;</SUP><B><UP>Q</UP></B><SUB><UP>h</UP></SUB>=0 (25)
where Qh is the rate constant matrix at the holding condition. In general, the rate constants at the holding condition are related to the rate constants at the activating condition, and they can be reformulated in a way to share a common set of independent parameters. This point is discussed further under the topic of likelihood maximization. Thus, when Q is optimized, Qh will follow the change, and as a result, the starting probability of the channel is also optimized.

Equation 25 is homogeneous and needs to be solved by subjecting it to the probability totality constraint. The singular value decomposition technique can be applied for finding the solution (Press et al., 1992). It is possible that the model may become reducible at some holding conditions, in which case only the probabilities of the absorbing states are solved and others are fixed to zero. From Eq. 25 we can derive the derivatives of the initial probability as
<FR><NU>∂&pgr;<SUP>&tgr;</SUP></NU><DE>∂x</DE></FR> <B><UP>Q</UP></B><SUB><UP>h</UP></SUB>=&pgr; <FR><NU>∂<B><UP>Q</UP></B><SUB><UP>h</UP></SUB></NU><DE>∂x</DE></FR> (26)
where x could be any variable of interest. Notice that Eqs. 25 and 26 share a common coefficient matrix. Therefore, they can be solved together. One difference to note, however, is the constraint. For solving Eq. 26, namely, the sum of the unknowns is constrained to zero instead of one.

Given the starting probability of the channel and its subsequent transition probabilities, the starting probability of a metastate I = [i0, i1 ... ip] can be determined by
&pgr;<SUB><UP>I</UP></SUB>=&pgr;<SUB><UP>i</UP><SUB><UP>p</UP></SUB></SUB>a<SUB><UP>i<SUB>p</SUB>i</UP><SUB><UP>p−1</UP></SUB></SUB>a<SUB><UP>i<SUB>p−1</SUB>i</UP><SUB><UP>p−2</UP></SUB></SUB> … a<SUB><UP>i<SUB>1</SUB>i</UP><SUB><UP>0</UP></SUB></SUB> (27)
i.e., the probability of the channel entering state ip at t = 1, multiplied the probability of a transition from state ip to state ip-1 at t = 2, and then multiplied by the probabilities of the subsequent transitions until t = p + 1.

To derive the derivatives of the starting probabilities of metastates, we introduce two auxiliary quantities uk, 0 <=  k <=  p and vl, 0 <=  l <=  p where
u<SUB><UP>k</UP></SUB>=u<SUB><UP>k−1</UP></SUB>a<SUB><UP>i<SUB>k</SUB>i</UP><SUB><UP>k+1</UP></SUB></SUB> (28)

v<SUB><UP>l</UP></SUB>=a<SUB><UP>i<SUB>l</SUB>i</UP><SUB><UP>l+1</UP></SUB></SUB>v<SUB><UP>l−1</UP></SUB> (29)
with u0 = pi ip and vp = 1. That is, uk represents the forward partial product in Eq. 27 and vk the backward one. The derivative of pi I with respect to any variable x can then be written as
<FR><NU>∂&pgr;<SUB><UP>I</UP></SUB></NU><DE>∂x</DE></FR>=<FR><NU>∂&pgr;<SUB><UP>i</UP><SUB><UP>0</UP></SUB></SUB></NU><DE>∂x</DE></FR> u<SUB>1</SUB>+<LIM><OP>∑</OP><LL><UP>k=1</UP></LL><UL><UP>p</UP></UL></LIM> u<SUB><UP>k−1</UP></SUB> <FR><NU>∂a<SUB><UP>i<SUB>k−1</SUB>i</UP><SUB><UP>k</UP></SUB></SUB></NU><DE>∂x</DE></FR> v<SUB><UP>k</UP></SUB> (30)
where the derivatives of the channel starting probability and transition probability are given by Eqs. 26 and 23, respectively. Depending on the size of a metastate, i.e., the length of the filter plus the AR order, the calculation of uk and vk above may require appropriate scaling to prevent underflow. The scaling can be done following the same strategy used in the calculation of the likelihood function.

AR coefficients

The AR coefficients are determined from the autocorrelation variables according to the Yule-Waker equation (5). The equation can be solved efficiently using the Levinson-Durbin algorithm (Blahut, 1985). We now give a brief description of the algorithm and then show how to modify the algorithm to also calculate the derivatives of the AR coefficients. For clarity, we suppress the sub or superscripts for the conductance class. Let r0, r1 ... rp be the autocorrelations of the noise at a certain conductance. The algorithm calculates the coefficients of all AR models with orders up to p, denoted by {a11, sigma 12}, {a21, a22, sigma 22} ... {ap1, ap2, ... , app, sigma p2}. The final set at order p is the desired solution. Starting with sigma 02 = r0, the algorithm proceeds recursively for k = 1, 2 ... p as below:
a<SUB><UP>kk</UP></SUB>=<UP>−</UP><FR><NU>1</NU><DE>&sfgr;<SUP><UP>2</UP></SUP><SUB><UP>k</UP></SUB></DE></FR> <LIM><OP>∑</OP><LL><UP>j=0</UP></LL><UL><UP>k−1</UP></UL></LIM> a<SUB><UP>k−1,j</UP></SUB>r<SUB><UP>k−j</UP></SUB> (31)

a<SUB><UP>kj</UP></SUB>=a<SUB><UP>k−1,j</UP></SUB>+a<SUB><UP>kk</UP></SUB>a<SUB><UP>k−1,k−j</UP></SUB>, j=1 … k−1 (32)

&sfgr;<SUP><UP>2</UP></SUP><SUB><UP>k</UP></SUB>=(1−a<SUP><UP>2</UP></SUP><SUB><UP>kk</UP></SUB>)&sfgr;<SUP><UP>2</UP></SUP><SUB><UP>k−1</UP></SUB> (33)
that is, it first computes the leading coefficient akk, then uses it to compute the remaining akj values for j = 1 ... k - 1, and finally the variance sigma k2. The recursion terminates when the desired order is reached. The algorithm implicitly assumes that |akk<=  1 for all k in order to carry out the recursion for sigma k2. This is true for a strict AR process or when the autocorrelation matrix is positive definite. For an arbitrary process, the assumption may fail, in which case the algorithm should be terminated when sigma k2 becomes sufficiently small.

To calculate the derivatives of the AR coefficients, we differentiate the above recursive equations, leading to
<FR><NU>∂a<SUB><UP>kk</UP></SUB></NU><DE>∂x</DE></FR>=<UP>−</UP><FR><NU>1</NU><DE>&sfgr;<SUP><UP>2</UP></SUP><SUB><UP>k−1</UP></SUB></DE></FR><FENCE>a<SUB><UP>k,k</UP></SUB> <FR><NU>∂&sfgr;<SUP><UP>2</UP></SUP><SUB><UP>k−1</UP></SUB></NU><DE>∂x</DE></FR></FENCE> (34)

<FENCE>+<LIM><OP>∑</OP><LL><UP>j=0</UP></LL><UL><UP>k−1</UP></UL></LIM><FENCE><FR><NU>∂a<SUB><UP>k−1,j</UP></SUB></NU><DE>∂x</DE></FR> r<SUB><UP>k−j</UP></SUB>+a<SUB><UP>k−1,j</UP></SUB> <FR><NU>∂r<SUB><UP>k−j</UP></SUB></NU><DE>∂x</DE></FR></FENCE></FENCE> 

<FR><NU>∂a<SUB><UP>kj</UP></SUB></NU><DE>∂x</DE></FR>=<FR><NU>∂a<SUB><UP>k−1,j</UP></SUB></NU><DE>∂x</DE></FR>+<FR><NU>∂a<SUB><UP>kk</UP></SUB></NU><DE>∂x</DE></FR> a<SUB><UP>k−1,k−i</UP></SUB>+a<SUB><UP>k,k</UP></SUB> <FR><NU>∂a<SUB><UP>k−1,k−i</UP></SUB></NU><DE>∂x</DE></FR>, (35)

j=1 … k−1

<FR><NU>∂&sfgr;<SUP><UP>2</UP></SUP><SUB><UP>k</UP></SUB></NU><DE>∂x</DE></FR>=<UP>−</UP>2a<SUB><UP>k,k</UP></SUB> <FR><NU>∂a<SUB><UP>kk</UP></SUB></NU><DE>∂x</DE></FR> &sfgr;<SUP><UP>2</UP></SUP><SUB><UP>k−1</UP></SUB>+(1−a<SUP><UP>2</UP></SUP><SUB><UP>kk</UP></SUB>) <FR><NU>∂&sfgr;<SUP><UP>2</UP></SUP><SUB><UP>k−1</UP></SUB></NU><DE>∂x</DE></FR> (36)
where x represents an autocorrelation variable. The equations suggest that we can calculate the derivatives of the AR coefficients basically in the same way that the coefficients themselves are calculated. That is, the calculation is carried out recursively through the order, and the derivatives at the kth order are calculated from those at (k - 1)th order. At each stage, the same recursions need to be repeated for each variable x. Because the original Levinson-Durbin recursion has a complexity on the order of p2, the calculation of the derivatives for all variables will take on the order of p3 operations, which is about the same as the complexity of a general linear equation solver. In practice, the time needed for the computation of the AR coefficients and their derivatives is negligible compared to the evaluation of the likelihood, because the AR order is usually small.

Computational complexity

Although the calculation of the likelihood function and its derivatives involves many steps, the most time-consuming part is the calculation of the forward and backward variables and the calculation of the derivatives of the likelihood with respect to the transition probabilities. Each of these steps takes on the order of N2(p+1)T operations, where N is the number of states of the channel and p is equal to the order of the noise AR model plus the filter length. The time for the rest of the calculations is negligible. Therefore, the overall computational complexity for the calculation of the likelihood and its derivatives is on the order of N2(p+1)T. Such a complexity increases exponentially with the filter length or the AR order, which is the major limiting factor for the practical applicability of the algorithm.

There are several maneuvers that can be used to cut down the computation. One is to take advantage of the aggregation property of the channel in the definition of the metastate. Specifically, we can use the state only for the first component of a metastate while defining others to be conductance class. That is
<B><UP>s</UP></B><SUB><UP>t</UP></SUB>=(s<SUB><UP>t</UP></SUB>, x<SUB><UP>t−1</UP></SUB> … x<SUB><UP>t−p</UP></SUB>) (37)
where xt represents the conductance class of the state sequence st. Such a new vector process remains Markovian because its transition is fully determined by the first component. The state space of the new process, however, is much smaller. If the channel has N states and M conductances, the number of metastates using this definition will be NMp as opposed to Np+1 with the old definition. Because channels often have only two conductances, such a reduction is significant, especially considering that the computation is quadratic on the number of metastates. In the implementation of our algorithm we have exploited this definition. The reason that we have used the other definition in the above is mainly for clarity of the presentation.

Another operation that may speed up the algorithm is to make use of the fact that many transitions between metastates are disallowed. As a consequence, the summation in the forward and backward recursions does not need to be extended through all possible metastates. Instead, we can restrict it only to the allowable transitions. As stated above, a transition between two metastates is allowed if and only if they satisfy Eq. 9. Therefore, there are only N allowable transitions for each metastate. The summation in the forward and backward recursions needs to be performed for only N times instead of NMp. This is true for every metastate at each time t, thus leading to a total reduction in the computation by a factor of Mp, which is considerable as p becomes large.



    MAXIMIZATION OF THE LIKELIHOOD
TOP
ABSTRACT
INTRODUCTION
THE MODEL
THE LIKELIHOOD FUNCTION AND...
MAXIMIZATION OF THE LIKELIHOOD
EXTENSION TO MULTIPLE CHANNELS
EXPERIMENTAL RESULTS
DISCUSSION
REFERENCES

We have shown how to evaluate the likelihood and its derivatives for a given particular set of model parameters. The next step is to optimize the model to search for the maximum of the likelihood function. This is essentially an optimization problem and can be solved in a variety of ways. We have used the same approach that we have developed for the maximization of the likelihood with the standard HMM (Qin et al., 2000). The following is a brief outline of some major features of the approach.

The search of the likelihood space is performed using a quasi-Newton method (Fletcher, 1980, 1981). It is based on successive approximation of the likelihood surface with parabolas and uses an approximate inverse Hessian matrix built from the first-order partial derivatives. The main advantage of the method is the quadratic convergence near the maximum point because the likelihood surface there can be well-approximated by a parabola. It also avoids the need for exact line minimization. An approximate line minimization usually takes one to two function evaluations, but an accurate line minimization may require repetitive braking, and therefore many function calls. When the evaluation of the objective function is computationally costly, this may significantly degrade the overall efficiency of the algorithm. Another feature of the method is that it results in an estimate of the inverse Hessian matrix, which can be used to derive the standard errors of the parameter estimates.

Constraints on the model parameters are allowed. The constraints on rate constants include holding rates at fixed values, linear scaling between two rates, detailed balance, and so on. The current amplitudes can be linearly scaled or fixed. The noise can be constrained too, for example, to restrict the excess open noise to be white or the noise at different conductance to have the same amount of correlation. These constraints are either linear or can be converted to be linear. Therefore, they can be handled analytically by formulating the constrained variables into linear combinations of a set of unconstrained ones. As a result, the originally constrained problem is reduced to an equivalent unconstrained one, making the optimization more efficient.

The approach also allows multiple data sets obtained at different experimental conditions to be fit simultaneously. The feature not only allows more data to be used for modeling, but also becomes required for resolving models that degenerate at individual conditions. In our implementation, the rates in the model are represented explicitly as a function of the experimental condition such as ligand concentration, voltage, or force. The generic rate constants that are independent of experimental conditions are chosen as the variables to optimize. Such a parametrization of the rates also makes it possible to specify the initial probability of the channel by holding experimental conditions, which is particularly useful for channels not under equilibrium.



    EXTENSION TO MULTIPLE CHANNELS
TOP
ABSTRACT
INTRODUCTION
THE MODEL
THE LIKELIHOOD FUNCTION AND...
MAXIMIZATION OF THE LIKELIHOOD
EXTENSION TO MULTIPLE CHANNELS
EXPERIMENTAL RESULTS
DISCUSSION
REFERENCES

The expression of ion channels in membranes is often high, making the recording of currents from an individual channel difficult. In many cases one can only obtain recordings arising from multiple channels. The algorithm described above can be extended readily for the analysis of such data.

The kinetics of multi-channel activity remains Markovian provided that the individual channels act independently. Consider a patch containing K channels. Let st(i) denote the state sequence of the ith channel at time t. The configuration of the system at any time can be characterized by a K-tuple
&ugr;<SUB><UP>t</UP></SUB>≡(s<SUP>(<UP>1</UP>)</SUP><SUB><UP>t</UP></SUB>, s<SUP>(<UP>2</UP>)</SUP><SUB><UP>t</UP></SUB>,…, s<SUP>(<UP>K</UP>)</SUP><SUB><UP>t</UP></SUB>) (38)
which can be considered as a new Markov process. Because each component of the tuple has a finite number of states, the tuple itself also has a finite number of states. Specifically, if st(i) has Ni states, the tuple has N1N2 ... NK states. The first-order transitions between two states occur if and only if there is one channel in the patch making a transition and the corresponding rate constant is the same as that of the underlying channel.

For identical channels, Horn and Lang (1983) suggest a more efficient definition, where the tuple has the form
<B><UP>n</UP></B><SUB><UP>t</UP></SUB>=(n<SUP>(<UP>1</UP>)</SUP><SUB><UP>t</UP></SUB>, n<SUP>(<UP>2</UP>)</SUP><SUB><UP>t</UP></SUB>,…, n<SUP>(<UP>K</UP>)</SUP><SUB><UP>t</UP></SUB>) (39)
where nt(i) is the number of channels in state i at time t. Such a definition takes the advantage of the indistinguishability of the composition channels and therefore results in a smaller number of tuple states. For example, if the channel has N states, nt has only <FENCE><AR><R><C><SUB>N+K−1</SUB></C></R><R><C><SUB>N</SUB></C></R></AR>:</FENCE> instead of NK states. The condition for an allowable transition between two tuple states stays the same, i.e., only one channel in the patch undergoes a transition at the same time. The transition rate, however, is different and needs to be multiplied by the number of the channels at the starting state of the transition. For a detailed description of the two approaches, see Qin et al. (1996).

The observations for the tuple process can also be parametrized in terms of those for the single channels. For illustration, we consider a patch containing identical channels and each channel has three conductance classes, namely closed (C), partially open (S), and fully open (O). Suppose there is a tuple with nC channels closed, nS channels partially open, and nO channels fully open. The current amplitude of the tuple state is given by
nI<SUB><UP>C</UP></SUB>+n<SUB><UP>S</UP></SUB>(I<SUB><UP>S</UP></SUB>−I<SUB><UP>C</UP></SUB>)+n<SUB><UP>O</UP></SUB>(I<SUB><UP>O</UP></SUB>−I<SUB><UP>C</UP></SUB>) (40)
where IC, IS, and IO are the current amplitudes of the single channel. The first term is the current at the baseline, the second is contributed by the channels at the substates, and the third due to the channels at the fully open states. Similarly, the noise autocorrelations corresponding to the tuple state are given by
r<SUP>(<UP>C</UP>)</SUP><SUB><UP>j</UP></SUB>+n<SUB><UP>S</UP></SUB>(r<SUP>(<UP>S</UP>)</SUP><SUB><UP>j</UP></SUB>−r<SUP>(<UP>C</UP>)</SUP><SUB><UP>j</UP></SUB>)+n<SUB><UP>O</UP></SUB>(r<SUP>(<UP>O</UP>)</SUP><SUB><UP>j</UP></SUB>−r<SUP>(<UP>C</UP>)</SUP><SUB><UP>j</UP></SUB>) (41)
where {rj(C)}, {rj(S)} and {rj(O)} are the autocorrelations of the noise associated with single channels at the three conductance levels.

From this example it is seen that both the current amplitudes and noise autocorrelations for the tuple states are linear combinations of the singles. This is generally true and applies to both identical and nonidentical channels. It should be mentioned that such linear relationships are retained because of the use of autocorrelations to parametrize the noise. If the AR coefficients were used, the relation would be more complicated and generally cannot be specified analytically because the AR functions are not closed under the additive operations.

It is worth noting that the definition of a conductance class is sometimes ambiguous when multiple channels are present. For a single channel, the states are classified in terms of their observable current amplitudes, and the states with the same current amplitude are assumed to have the same noise properties. But for multiple channels, two tuple states may have the same current amplitude but different noise characteristics. Therefore, the current amplitude alone is not enough for classification. There are several solutions to the problem. One is to treat each state as a single class. The disadvantage of doing so is that it may result in an excessive number of metastates, because the redundancy between the states is not taken into account. A more efficient approach, which we have used in our implementation, is to use the number of channels at each conductance as a criterion. For the above example, the criterion would be the tuple (nC, nS, nO) where nC, nS, and nO are the number of channels in the patch at the three conductances, respectively. All tuple states with the same index are guaranteed to have the same current amplitude and noise property.

With the tuple process defined in Eq. 38 or 39 as the underlying Markov process, the multiple channel data can be fitted in the same way as the singles. The only difference is the constraints, since the parameters for the tuple process are not all independent. Instead, they are linear combinations of the singles, as shown above. To minimize the degrees of freedom, we have chosen to optimize the single channel variables directly. Fortunately, all constraints, including those for the noise, are linear, and therefore can be handled easily using the same approach described in the previous section.



    EXPERIMENTAL RESULTS
TOP
ABSTRACT
INTRODUCTION
THE MODEL
THE LIKELIHOOD FUNCTION AND...
MAXIMIZATION OF THE LIKELIHOOD
EXTENSION TO MULTIPLE CHANNELS
EXPERIMENTAL RESULTS
DISCUSSION
REFERENCES

We present a few examples to show some basic features of the HMM approach described in the previous sections. A set of four examples is given. The first example illustrates the general applicability of the algorithm to the typical correlated patch-clamp background noise. In the second example, we introduce excess open noise, both white and correlated, in addition to the correlated background noise. The third example tests the algorithm with filtered data. The last example demonstrates the applicability of the algorithm to data containing multiple channels.

The algorithm is implemented in a general way to allow for both single channels and multiple channels, with either white or correlated noise. The input to the program is a single channel model along with the number of channels. Internally, the program first builds a multi-channel Markov model for the tuple process, as defined above. In the case of a single channel, this multi-channel model reduces to the single one, but in the case of multiple channels, a set of linear constraints are set up automatically on the rate constants, current amplitudes, and noise autocorrelations. Following the multi-channel Markov model, the algorithm then builds a metastate Markov model based on the input filtering and noise correlation. The noise at different conductance levels can be constrained to be either the same or to differ by an excess variance that can be either white or correlated.

Colored background noise

We first consider an example with simulated patch-clamp noise. The power spectrum of the background noise in a typical patch-clamp experiment can be described by (Sigworth, 1995)
S(f)=a+bf<SUP>2</SUP> (42)
over the experimentally accessible frequency range from dc to 100 kHz. The frequency-independent component is mainly due to the shot noise of resistors in the amplifier. The f2-dependent component arises from current fluctuations resulting from the amplifier voltage noise imposed on the pipette and amplifier stray capacitance. In discrete time, Venkataramanan et al. (1998a) show that the noise can be approximated by a first-order moving average (MA) process
n(t)=m<SUB>0</SUB>w(t)+m<SUB>1</SUB>w(t−1) (43)
where w(t) is white, Gaussian noise with zero mean and variance sigma w2. Typical values for coefficients are m0 = 0.8 and m2 = -0.6, obtained by fitting a representative patch-clamp recording without channel activity.

Although the noise is a simple first-order MA process, it requires a more complicated AR model for a good approximation. A first-order MA process has a correlation extending over only one sample point, but a first-order AR process usually correlates over an infinite number of samples. Fig. 1 shows the power spectrum of the first-order MA model given in Eq. 43 in comparison with its AR approximations with various orders. From the figure we see that a first-order AR doesn't have a sufficient capacity to model the noise. But as the order increases, the approximation becomes more accurate. In theory, the approximation can be made into an arbitrary accuracy with a sufficiently large order. In practice, however, the exponentially increasing computation burden limits the use of very high-order models.




View larger version (18K):
[in this window]
[in a new window]
 
FIGURE 1   Power spectrum density of a first-order MA process in comparison with its AR approximations at different orders. The AR model is determined by solving the Yule-Walker equation. The approximation improves as the order increases.

The data were simulated based on a two state model: