help button home button Biophys. J.
HOME HELP FEEDBACK SUBSCRIPTIONS ARCHIVE SEARCH TABLE OF CONTENTS

This Article
Right arrow Abstract Freely available
Right arrow Full Text (PDF)
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Similar articles in this journal
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Download to citation manager
Right arrow reprints & permissions
Citing Articles
Right arrow Citing Articles via HighWire
Right arrow Citing Articles via Google Scholar
Google Scholar
Right arrow Articles by Rosales, R.
Right arrow Articles by Hladky, S. B.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Rosales, R.
Right arrow Articles by Hladky, S. B.

Biophys J, March 2001, p. 1088-1103, Vol. 80, No. 3

Bayesian Restoration of Ion Channel Records using Hidden Markov Models

Rafael Rosales,*dagger J. Alex Stark,dagger William J. Fitzgerald,dagger and Stephen B. Hladky*

 *Pharmacology and  dagger Engineering, University of Cambridge, Cambridge CB2 1QJ, United Kingdom


    ABSTRACT
TOP
ABSTRACT
INTRODUCTION
ION CHANNEL MODELING
BAYESIAN APPROACH
RESULTS
DISCUSSION
APPENDIX
APPENDIX B
REFERENCES

Hidden Markov models have been used to restore recorded signals of single ion channels buried in background noise. Parameter estimation and signal restoration are usually carried out through likelihood maximization by using variants of the Baum-Welch forward-backward procedures. This paper presents an alternative approach for dealing with this inferential task. The inferences are made by using a combination of the framework provided by Bayesian statistics and numerical methods based on Markov chain Monte Carlo stochastic simulation. The reliability of this approach is tested by using synthetic signals of known characteristics. The expectations of the model parameters estimated here are close to those calculated using the Baum-Welch algorithm, but the present methods also yield estimates of their errors. Comparisons of the results of the Bayesian Markov Chain Monte Carlo approach with those obtained by filtering and thresholding demonstrate clearly the superiority of the new methods.


    INTRODUCTION
TOP
ABSTRACT
INTRODUCTION
ION CHANNEL MODELING
BAYESIAN APPROACH
RESULTS
DISCUSSION
APPENDIX
APPENDIX B
REFERENCES

The statistical analysis of single channel patch clamp records has been the subject of extensive research for over two decades. The most common scheme uses filtering and thresholding to recover the underlying process that is followed by the channel (see Colquhoun and Sigworth, 1995, for an extensive review). As a consequence, the original signal is reduced to a sequence of dwell times at a finite number of possible conductance levels separated by the thresholds. Empirical density estimates for the time spent at each level are then constructed as histograms, which are usually fitted by exponential mixtures. In the second stage of the analysis, the obtained mixtures characterized by a set of weights and rate constants are interpreted by modeling the channel dynamics as a finite state space aggregated Markov chain which is also homogeneous and has continuous (time) parameter (Colquhoun and Hawkes, 1982; Fredkin et al., 1985; Ball and Sansom, 1989). The inferences are directed toward the structure and specific value of the infinitesimal generator (the transition rate matrix) of the Markov chain. Finally, model selection between various alternatives is handled by using traditional penalized likelihood ratios. Ball and Rice (1992) provide a critical overview of the inferential difficulties of these approaches.

With only a few exceptions (Fredkin and Rice, 1992b), these methods rely heavily on the accuracy of the initial restoration step and are thus complicated when considering signal-to-noise ratios that are at the limit of filtering and thresholding. The pioneering work of Chung et al. (1990) introduced hidden Markov models with the initial aim of extracting information about current amplitudes and channel kinetics at low signal-to-noise ratios. More recently, hidden Markov models have also been considered by Venkataramanan et al. (1998) and Michalek and Timmer (1999) extending the applicability of the initial framework. Parameter estimation and signal restoration are usually carried out in an effective way by using variants of Baum's forward-backward procedures and reestimation formulas (Baum et al., 1970). In this case, estimates are represented by points in the parameter space that correspond to the coordinates of a local maximum of the likelihood.

A radically different approach for the treatment of hidden Markov models is provided by Bayesian statistics. In this paper, the initial restoration problem is addressed by modeling the observations with hidden Markov models. However, inferences are performed by using Bayesian statistics and an extension to a stochastic simulation method first proposed by Robert et al. (1993). Fredkin and Rice (1992a) also considered Bayesian restoration, but their approach is methodologically closer to Baum's maximization procedures. Bayesian inference based on stochastic search methods known as Markov Chain Monte Carlo have also been considered by Ball et al. (1999, 1997), Hodgson (1999), and Hodgson and Green (1999), however, the parameterization and their inferences are directed toward the transition rate matrix associated with a gating mechanism. The methods developed here, based on simpler procedures, are computationally efficient and supported by a solid probabilistic basis.

In this article, it is shown that the Bayesian methodology provides a useful way to deal with hidden Markov models for ion channels. Bayesian theory provides statistical grounds for the assessment of the uncertainty in all the unknowns considered in the modeling process. The next two sections present a brief overview of the problems posed by hidden Markov models and also their treatment from a Bayesian perspective. Results for various synthetic signals are shown in the following section, together with those obtained via standard Baum-Welch maximization. Comparisons of the Bayesian Markov Chain Monte Carlo approach against filtering and thresholding clearly demonstrate the new methods to be superior.


    ION CHANNEL MODELING
TOP
ABSTRACT
INTRODUCTION
ION CHANNEL MODELING
BAYESIAN APPROACH
RESULTS
DISCUSSION
APPENDIX
APPENDIX B
REFERENCES

Hidden Markov model

This section presents an overview of the hidden Markov model formalism that will be adopted and the notation to be used. The parameterization is consistent with the one considered by Chung et al. (1990) or Venkataramanan et al. (1998).

A hidden Markov model is defined by two interrelated stochastic processes. The first of these, representing the ion channel, is a first-order finite-state homogeneous Markov chain, {zl: l is in  R+}, on C = {1, 2, ... , n}, with initial density lambda  and transition rate matrix Q. The continuous process is approximated by a discrete time version, {zk: k is in  T}, where T = {delta , 2delta , ... , delta N}, with N as the total number of sample points and delta  the sampling period of the acquisition system (hereafter we assume delta  = 1). In these terms {zk} is determined by a transition probability matrix, A = exp(Qdelta ), with elements aij = P(zk = j|zk-1 = i), i, j is in  C (where P(X|Y) is the probability of X given Y). The second process is represented by the set y = {yk: k is in  T} of independent and identically distributed random variables that constitute the observations. Each observation yk is assumed to arise as a function of zk, defined by a conditional density
ℙ(y<SUP><UP>k</UP></SUP>‖z<SUP><UP>k</UP></SUP>=i, &thgr;)=d<SUB><UP>i</UP></SUB>(y<SUP><UP>k</UP></SUP>), (1)
where theta  is possibly a vector used to denote the parameters associated with a particular family d. Under standard normality assumptions, the function of the process is
d<SUB><UP>i</UP></SUB>(y<SUP><UP>k</UP></SUP>) ∝ <UP>exp</UP>[<UP>−</UP>(y<SUP><UP>k</UP></SUP>−q<SUB><UP>i</UP></SUB>)<SUP>2</SUP>/2&sfgr;<SUP><UP>2</UP></SUP><SUB><UP>i</UP></SUB>], (2)
with qi and sigma <UP><SUB>i</SUB><SUP>2</SUP></UP> as the mean and the variance of the ith possible outcome associated with zk. Assuming the number of conductance states, n, is known a priori, the model is completely specified by
&thgr;=(&lgr;<SUB><UP>i</UP></SUB>, a<SUB><UP>ij</UP></SUB>, q<SUB><UP>i</UP></SUB>, &sfgr;<SUP><UP>2</UP></SUP><SUB><UP>i</UP></SUB>), i, j∈𝒞. (3)
This suggests an equivalence between states and conductances. However, this correspondence may be relaxed by imposing constraints on q such as qj = xi  for some value xi , and j is in  J subset  C if zk = j, as will be shown in the section Class Dwell Times.

In this setting, each observation arises as the sum of two terms,
y<SUP><UP>k</UP></SUP>=q<SUB><UP>i</UP></SUB>+&sfgr;<SUB><UP>i</UP></SUB>&eegr;<SUP><UP>k</UP></SUP>, &eegr;<SUP><UP>k</UP></SUP> <LIM><OP><UP>∼</UP></OP><UL><UP>iid</UP></UL></LIM> N(0, 1). (4)
The first term at the right-hand side of Eq. 4 represents the amount of current flowing through the channel associated with the ith conductance level, whereas eta k is a normal distributed random variable with mean 0 and variance 1, representing the disturbance of the noise inherent in the recording apparatus. The symbol ~ together with iid is used to denote independent and identically distributed.

The assumption of independent observations, explicitly stated as
ℙ(y<SUP><UP>k</UP></SUP>‖y<SUP><UP>k−1</UP></SUP>,…, y<SUP>1</SUP>, z<SUP><UP>k</UP></SUP>=i, &thgr;)=ℙ(y<SUP><UP>k</UP></SUP>‖z<SUP><UP>k</UP></SUP>=i, &thgr;), (5)
is an important simplification that leads to the analytical treatment of the model presented so far. It should be noted that the observations yk are indirectly dependent on each other through the Markovian structure induced conditionally on di(yk) by the underlying process zk.

Maximum likelihood approach

Let Z = {z: zk = i; k is in  T, i is in  C} be the space of events constituted by the path realizations of the chain {zk}, and z one of its elements, i.e., z constitutes a particular sequence of conductance states and Z the set of all possible sequences. Then, following Eq. 1 and the Markov property on z, the likelihood, which equals the probability of the observations conditional on the parameters, L(theta ) = p(y|theta ), is given by
L(&thgr;)=<LIM><OP>∑</OP><LL><UP>z∈Z</UP></LL></LIM> ℙ(y, z‖&thgr;)=<LIM><OP>∑</OP><LL><UP>z∈Z</UP></LL></LIM> ℙ(y‖&thgr;, z)ℙ(z‖&thgr;) (6)

=<LIM><OP>∑</OP><LL><UP>i<SUB>1</SUB>=1</UP></LL><UL><UP>n</UP></UL></LIM> <LIM><OP>∑</OP><LL><UP>i<SUB>2</SUB>=1</UP></LL><UL><UP>n</UP></UL></LIM> … <LIM><OP>∑</OP><LL><UP>i<SUB>N</SUB>=1</UP></LL><UL><UP>n</UP></UL></LIM>[&lgr;<SUB><UP>i</UP><SUB><UP>1</UP></SUB></SUB>d<SUB><UP>i</UP><SUB><UP>1</UP></SUB></SUB>(y<SUP>1</SUP>)a<SUB><UP>i</UP><SUB><UP>1</UP></SUB>i<SUB>2</SUB></SUB>d<SUB><UP>i</UP><SUB><UP>2</UP></SUB></SUB>(y<SUP>2</SUP>) … a<SUB><UP>i<SUB>N−1</SUB>i</UP><SUB><UP>N</UP></SUB></SUB>d<SUB><UP>i</UP><SUB><UP>N</UP></SUB></SUB>(y<SUP><UP>N</UP></SUP>)],
with i1, i2, ... , iN is in  C. To simplify notation, let
h<SUP><UP>1</UP></SUP><SUB><UP>i</UP></SUB>=&lgr;<SUB><UP>i</UP></SUB>d<SUB><UP>i</UP></SUB>(y<SUP>1</SUP>), h<SUP><UP>k</UP></SUP><SUB><UP>ji</UP></SUB>=a<SUB><UP>ji</UP></SUB>d<SUB><UP>i</UP></SUB>(y<SUP><UP>k</UP></SUP>), (7)
then
L(&thgr;)=<LIM><OP>∑</OP><LL><UP>i<SUB>1</SUB>=1</UP></LL><UL><UP>n</UP></UL></LIM> … <LIM><OP>∑</OP><LL><UP>i<SUB>N−1</SUB>=1</UP></LL><UL><UP>n</UP></UL></LIM> <LIM><OP>∑</OP><LL><UP>i<SUB>N</SUB>=1</UP></LL><UL><UP>n</UP></UL></LIM> h<SUP><UP>1</UP></SUP><SUB><UP>i<SUB>1</SUB></UP></SUB> … h<SUP><UP>N</UP></SUP><SUB><UP>i<SUB>N−1</SUB>i<SUB>N</SUB></UP></SUB>. (8)
Statistical inferences concerning theta  generally take the form of point estimates, theta *, obtained at a local maximum of L(theta ),
&thgr;*=<LIM><OP><UP>argmax</UP></OP><LL>&thgr;<UP>∈</UP>&THgr;</LL></LIM>{L(&thgr;)}, (9)
with Theta  usually a compact subset of some Euclidean space Rd, more precisely Theta  = Rn × R<UP><SUB>+</SUB><SUP>n</SUP></UP> × [0, 1]n×n+1. The notation "argmax{L( )}" refers to the point in Theta  where L( ) attains a (local) maximum. In the particular case when d is normal (Eq. 2) and z is known, the required maximization is analytic. However, in patch clamp recordings, z is unknown and inferences are directed toward both (ztheta ). In this case, the maximization has to be performed numerically by considering nN terms at Eq. 8. Evaluation of L(theta ) becomes a critical issue, especially in the case of patch clamp records suitable for stationary analysis where N > 106. In the general framework, this problem was first considered by Baum et al. (1970), who derived an efficient solution reducing the evaluation to O(n2N). Baum's methods, known as the forward-backward algorithm and reestimation formulas, were first applied to patch clamp records by Chung et al. (1990).


    BAYESIAN APPROACH
TOP
ABSTRACT
INTRODUCTION
ION CHANNEL MODELING
BAYESIAN APPROACH
RESULTS
DISCUSSION
APPENDIX
APPENDIX B
REFERENCES

Bayesian analysis proceeds by considering the parameters theta  as random variables on Theta  subset  Rd. Inferences in this case require the definition of a joint density p(ytheta ), which can be decomposed as L(theta )p(theta ), simply by following the definition of conditional probability. The density p(theta ), known as the prior, represents our previous knowledge or belief about the value of the parameters before the observations have been examined. Once y is available, the uncertainties on theta  are represented by a posterior density, pi (theta |y), which follows from p(ytheta ) and the application of Bayes Theorem as
&pgr;(&thgr;‖y)=L(&thgr;)p(&thgr;)<FENCE><LIM><OP>∫</OP></LIM>L(&thgr;)p(&thgr;) <UP>d</UP>&thgr;.</FENCE> (10)
Let Epi denote mathematical expectation with respect to the posterior and g any (square integrable) function of theta . Formal statistical inferences in this context often take the form of integrals
𝔼<SUB>&pgr;</SUB>[g(&thgr;)]=<LIM><OP>∫</OP></LIM>g(&thgr;)&pgr;(&thgr;‖y) <UP>d</UP>&thgr;, (11)
and are thus based on the support of pi (theta |y) rather than on a single point estimate. This ensures the use of probabilities for events theta is in  C (C subset or equal  Theta ), in contrast to classical confidence procedures. An account of Bayesian theory can be found in Bernardo and Smith (1994).

Although there are clear theoretical advantages of the Bayesian approach, there are difficulties that must be overcome before they can be realized in practice. The integrals in Eqs. 10 and 11 are not analytical for the current model. This fact arises as a consequence of the structure imposed by the likelihood in Eq. 8, regardless of the specific form of g(theta ) and p(theta ).

Markov chain Monte Carlo for hidden Markov models

A solution to the integration task posed by Eqs. 10 and 11 is provided by Monte Carlo approximation together with Markov chain sampling. First, an ergodic Markov chain theta (0), theta (1), ... , is constructed on Theta , such that its invariant distribution corresponds to the hidden Markov model posterior pi (theta |y). It should be made clear that this process represents a different Markov chain than that followed by the channel on Z.

More precisely, let Ktheta m = P(theta (m) is in  B|theta (m-1) = x) denote the step transition probability from theta (m-1) to theta (m) for any m = 1, 2, ... , and pi 0 = P(theta (0) is in  A) an initial density for A, B subset  Theta , and x is in  Theta . Under mild regularity conditions that ensure ergodicity
<LIM><OP><UP>lim</UP></OP><LL><UP>m→∞</UP></LL></LIM>[K<SUP><UP>m</UP></SUP><SUB><UP>&thgr;</UP></SUB>&pgr;<SUP>0</SUP>] <LIM><OP><ARROW>→</ARROW></OP><LL><SUP><UP>d</UP></SUP></LL></LIM> &pgr;(&thgr;‖y), (12)
for almost all initial values of theta (0). Here K<UP><SUB>&thgr;</SUB><SUP>m</SUP></UP> = P(theta (m) is in  B|theta (0) = x) denotes the mth iterate of K<UP><SUB><IT>&thgr;</IT></SUB><SUP><IT>1</IT></SUP></UP>, that is, K<UP><SUB>&thgr;</SUB><SUP>m</SUP></UP> = Ktheta mK<UP><SUB>&thgr;</SUB><SUP>m−1</SUP></UP>, and right-arrowd is used to denote convergence in distribution (see Billingsley, 1968). Thus, after an initial relaxation period b, successive iterations generate a sequence of samples theta (b+1), ... , theta (M) that are approximately distributed according to pi (theta |y). This sequence may then be used to calculate averages that approximate the desired expectations under the posterior target density
𝔼<SUB>&pgr;</SUB>[g(&thgr;)]≈<FR><NU>1</NU><DE>M−b</DE></FR> <LIM><OP>∑</OP><LL><UP>m=b+1</UP></LL><UL><UP>M</UP></UL></LIM> g(&thgr;<SUP>(<UP>m</UP>)</SUP>), (13)
constituting a Monte Carlo estimate of the integrals in Eq. 11.

The combination of these techniques has been used to tackle complex multidimensional problems, proving to be an effective tool where standard frequentist methods have failed. The theoretical background needed for the use of these general state chains is described in Nummelin (1984) or Meyn and Tweedie (1993); whereas their application to Bayesian statistics is presented among others by Tierney (1994), Besag et al. (1995), or Robert and Casella (1999). J. A. Stark, R. Rosales, W. J. Fitzgerald, and S. B. Hladky (manuscript in preparation) also present a different Markov chain Monte Carlo approach for the single channel restoration problem in terms of a change point model. The particular chain that will be used to generate approximate samples from the posterior in Eq. 10 is presented in detail in the next section.

Gibbs sampler

Let p(theta i|theta -i, y) be the (full) conditional densities for each of the components theta i, given values of the other components theta -i = {theta j: j not equal  i}. These densities are uniquely determined by a particular posterior (Besag, 1974) and constitute the building blocks of a Markov chain Monte Carlo method known as the Gibbs sampler (Gelfand and Smith, 1990). This sampler proceeds as follows. Given an arbitrary set of starting values theta (0) = (theta <UP><SUB>1</SUB><SUP>(0)</SUP></UP>, ... , theta <UP><SUB>r</SUB><SUP>(0)</SUP></UP>), draw a sample of theta <UP><SUB>1</SUB><SUP>(1)</SUP></UP> from p(theta 1|theta <UP><SUB><IT>−1</IT></SUB><SUP><IT>(0)</IT></SUP></UP>, y), i.e.,
&thgr;<SUP>(1)</SUP><SUB>1</SUB>∼p(&thgr;<SUB>1</SUB>‖&thgr;<SUP>(0)</SUP><SUB>2</SUB>,…, &thgr;<SUP>(<UP>0</UP>)</SUP><SUB><UP>r</UP></SUB>, y)
then
<AR><R><C><UP>&thgr;</UP><SUP>(<UP>1</UP>)</SUP><SUB><UP>2</UP></SUB></C><C><UP>∼</UP>p(&thgr;<SUB>2</SUB>‖&thgr;<SUP>(1)</SUP><SUB>1</SUB>, &thgr;<SUP>(0)</SUP><SUB>3</SUB>,…, &thgr;<SUP>(<UP>0</UP>)</SUP><SUB><UP>r</UP></SUB>, y)</C></R><R><C></C><C><UP>…</UP></C></R><R><C><UP>&thgr;</UP><SUP>(<UP>1</UP>)</SUP><SUB><UP>r</UP></SUB></C><C><UP>∼</UP>p(&thgr;<SUB><UP>r</UP></SUB>‖&thgr;<SUP>(1)</SUP><SUB>1</SUB>,…, &thgr;<SUP>(<UP>1</UP>)</SUP><SUB><UP>r−1</UP></SUB>, y).</C></R></AR> (14)
This completes one iteration of the sampling scheme theta  ~ {p(theta i|theta -i, y)}, and also a transition from theta (0) to theta (1). The sampling of a high dimensional vector theta  has been replaced by the sampling of lower dimensional components, which is one of the key features of this algorithm.

The decomposition of pi (theta |y) into its conditionals is rather involved because of the nN terms in Eq. 8. Consider instead the joint pi (theta z|y), which leads to the following decomposition and sampling scheme:
(&thgr;, z)∼{p(&thgr;<SUB><UP>i</UP></SUB>‖&thgr;<SUB><UP>−i</UP></SUB>, y, z), p(z‖&thgr;, y)}. (15)
In this case, under the appropriate choice of priors, all the densities p(theta i|theta -i, y, z) are analytical because they only involve one of the possible realizations in Z. This simplification was first considered by Robert et al. (1993), and by Diebolt and Robert (1994) in the general mixture distribution context. The key of this conditional decomposition resides in considering z as a random variable, much in the same way as theta , instead of integrating it out. This scheme leads to rigorous theoretical facts on the convergence of the sampler to be discussed in the section entitled Convergence.

Following Eq. 15, draws have to be produced from the full conditional for the hidden process
p(z‖y, &thgr;)=h<SUP><UP>1</UP></SUP><SUB><UP>i<SUB>1</SUB></UP></SUB> … h<SUP><UP>N</UP></SUP><SUB><UP>i<SUB>N−1</SUB>i<SUB>N</SUB></UP></SUB>/L(&thgr;) (16)
for any i1, ..., iN-1, iN is in  C. This can be achieved in different ways. A first alternative consists of the method proposed by Carter and Kohn (1994), which constitutes a stochastic version of the forward-backward algorithm. This follows by noting that p(z|ytheta ) can be decomposed as
p(z‖y, &thgr;)=p(z<SUP><UP>N</UP></SUP>‖y, &thgr;) <LIM><OP>∏</OP><LL><UP>k=1</UP></LL><UL><UP>N−1</UP></UL></LIM> p(z<SUP><UP>k</UP></SUP>‖z<SUP><UP>k+1</UP></SUP>, y<SUB><UP>k</UP></SUB>, &thgr;), (17)
with yk = y1, ... , yk. Given zk+1, p(zk|zk+1, yk, theta ) is a discrete distribution, which suggests the following sampling strategy. For k = 2, ... , N and i is in  C, compute and store the optimal filter p(zk = i|yktheta ), then sample zN from p(zN|ytheta ), and, for k = N - 1, ... , 1, sample zk from p(zk|zk+1, yk, theta ), where, if zk+1 = l, for i is in  C,
p(z<SUP><UP>k</UP></SUP>=i‖z<SUP><UP>k+1</UP></SUP>=l, y<SUB><UP>k</UP></SUB>, &thgr;)=<FR><NU>a<SUB><UP>il</UP></SUB>p(z<SUP><UP>k</UP></SUP>=i‖y<SUB><UP>k</UP></SUB>, &thgr;)</NU><DE><LIM><OP>∑</OP><LL><UP>j∈𝒞</UP></LL></LIM> a<SUB><UP>jl</UP></SUB>p(z<SUP><UP>k</UP></SUP>=j‖y<SUB><UP>k</UP></SUB>, &thgr;)</DE></FR>. (18)
This strategy updates (theta z) by following the decomposition given by Eq. 15. A second option follows by considering the full conditional densities
p(z<SUP><UP>k</UP></SUP>‖z<SUP><UP>−k</UP></SUP>, y, &thgr;)=p(z<SUP><UP>k</UP></SUP>‖z<SUP><UP>k−1</UP></SUP>, z<SUP><UP>k+1</UP></SUP>, y, &thgr;), (19)
which follow from the Markov property on z. For k = 2, ... , N - 1 and for any i, j, r is in  C, these densities are
p(z<SUP>1</SUP>=i‖y, &thgr;) ∝ h<SUP><UP>1</UP></SUP><SUB><UP>i</UP></SUB>,

p(z<SUP><UP>k</UP></SUP>=i‖z<SUP><UP>k−1</UP></SUP>=j, z<SUP><UP>k+1</UP></SUP>=r, y, &thgr;) ∝ h<SUP><UP>k</UP></SUP><SUB><UP>ji</UP></SUB>a<SUB><UP>ir</UP></SUB>,
and, for k = N,
p(z<SUP><UP>N</UP></SUP>=i‖z<SUP><UP>N−1</UP></SUP>=j, &thgr;, y) ∝ h<SUP><UP>N</UP></SUP><SUB><UP>ji</UP></SUB>. (20)
This sampling scheme leads to the conditional decomposition,
(&thgr;, z)∼{p(&thgr;<SUB><UP>i</UP></SUB>‖&thgr;<SUB><UP>−i</UP></SUB>, y, z), p(z<SUP><UP>k</UP></SUP>‖y, &thgr;, z<SUP><UP>−k</UP></SUP>)}, (21)
where i denotes the index over the components of theta  (Eq. 3). Note that the first option generates a realization of z directly from the conditional p(z|theta y) by using a single Gibbs component. The second uses N different components and presents a more complicated correlation structure, since each zk depends on both zk-1 and zk+1, which leads to a slower mixing chain (see Liu et al., 1995, for the effects of correlations among sampled components). However, by adopting the latter option, it is possible to avoid any part of the forward-backward filtering procedures used in the other two cases. This fact is particularly useful in the treatment of dependent observations (R. Rosales, J. A. Stark, W. J. Fitzgerald, and S. B. Hladky, manuscript in preparation).

Priors

In this paper, we consider proper conjugate priors that attempt to be weakly informative on Theta . A prior p(theta ) that belongs to a parametric family F is conjugate to a given likelihood if the resulting posterior is also from this family. In this case, the required conditionals for theta  also belong to F (see, for example, Bernardo and Smith, 1994). This prior has been chosen for tractability. It should be noted that hidden Markov models do not allow the use of improper (noninformative) priors for lambda  or for each row of A (Diebolt and Robert, 1994).

Let ai denote the ith row of A. Then, by assuming independence between priors over states and conditional independence between priors over parameters,
p(&thgr;)=<LIM><OP>∏</OP><LL><UP>i∈𝒞</UP></LL></LIM> p(&lgr;, a<SUB><UP>i</UP></SUB>, q<SUB><UP>i</UP></SUB>, &sfgr;<SUP><UP>2</UP></SUP><SUB><UP>i</UP></SUB>) (22)

=p(&lgr;) <LIM><OP>∏</OP><LL><UP>i∈𝒞</UP></LL></LIM> p(a<SUB><UP>i</UP></SUB>)p(q<SUB><UP>i</UP></SUB>)p(&sfgr;<SUP><UP>2</UP></SUP><SUB><UP>i</UP></SUB>).
In the case of a univariate normal likelihood for eta k and a multinomial for z, the conjugate families for theta  are given by the following densities: lambda  ~ D(b1, ... , bn), ai ~ D(ei1, ... , ein), qi ~ N(mis<UP><SUB>i</SUB><SUP>2</SUP></UP>) and sigma <UP><SUB>i</SUB><SUP>2</SUP></UP> IG(uiwi). IG is used for an inverted gamma density with shape and scale ui, wi and hence with mean wi/(ui - 1) whenever ui > 1 and variance w<UP><SUB>i</SUB><SUP>2</SUP></UP>/[(ui - 1)2(ui - 2)] if ui > 2. The symbol D is used for a Dirichlet density with parameter vector x with either x = b or x = ei. In this case, xj > 0, and the jth mean and variance for the jth component are xj/x0 and xj(x0 - xj)/[x<UP><SUB><IT>0</IT></SUB><SUP><IT>2</IT></SUP></UP>(x0 + 1)] where x0 = Sigma j xj.

The parameters mi, s<UP><SUB>i</SUB><SUP>2</SUP></UP>, ui and wi are regarded as constants that may be calculated according to the observed data range. Let the range extend from 0 to R, with R measured in physical units of current. Then, setting
m<SUB><UP>i</UP></SUB>=R/2, s<SUP><UP>2</UP></SUP><SUB><UP>i</UP></SUB>=R<SUP>2</SUP> (23)
for all i is in  C, tends to produce a prior for qi that is relatively flat over the interval specified by R. The constants ui and wi may be specified by setting the mean of the inverse gamma density equal to the observed variance sigma <UP><SUB>y</SUB><SUP>2</SUP></UP> (if available), and also the variance equal to some reasonable multiple of the data range &Rtilde; (i.e., &Rtilde; = 4R). Solving the resulting system for ui and wi gives
u<SUB><UP>i</UP></SUB>=[(&sfgr;<SUP><UP>2</UP></SUP><SUB><UP>y</UP></SUB>)<SUP>2</SUP>+2<A><AC>R</AC><AC>˜</AC></A>]/<A><AC>R</AC><AC>˜</AC></A> w<SUB><UP>i</UP></SUB>=[(&sfgr;<SUP><UP>2</UP></SUP><SUB><UP>y</UP></SUB>)<SUP>4</SUP>+&sfgr;<SUP><UP>2</UP></SUP><SUB><UP>y</UP></SUB><A><AC>R</AC><AC>˜</AC></A>]/<A><AC>R</AC><AC>˜</AC></A>. (24)
Noninformative priors for the initial density and for the transition probabilities are obtained at the limit when bi right-arrow 0 and eij right-arrow 0. However, care must be taken because bi = eij = 0 for any i, j is in  C results in an improper prior. Sensible choices representing weak prior information are eij is in  [0.1, 1] and bi = 10-6. In terms of the single channel record, the value for eij constitutes a statement about the frequency of the transition i right-arrow j, whereas the choice for bi represents the prior belief that the channel will start at the ith level. Due to the structure induced by the Dirichlet density for each row, informative priors for A obeying reversibility constraints imposed by Kolmogorov's criteria could also be used as those suggested for Q by Ball et al. (1999).

The full conditionals necessary to implement the Gibbs sampler follow immediately once the priors have been specified. Samples for all these densities where generated by implementing Eqs. 14, 18, and 20, and following the methods described in Fishman (1996) and Gelman et al. (1995). The explicit form for p(theta i|theta -i, y, z) is derived in the Appendix.

Convergence

Successive iterations from the sampling scheme at Eq. 15 (or Eq. 21) produce the sequence (theta (1)z(1), ... , (theta (M)z(M)), which is a Markov chain on Theta  × Z. Let pi (theta |y) be the marginal of pi (theta z|y), that is,
&pgr;(&thgr;‖y)=<LIM><OP>∑</OP><LL><UP>z∈Z</UP></LL></LIM> &pgr;(&thgr;, z‖y), (25)
and pi m(theta |y) the density generated at the mth Gibbs step. Let also f(z|y) and fm(z|y), be the corresponding densities for z. Then,
&pgr;<SUP><UP>m</UP></SUP>(&thgr;‖y)=<LIM><OP>∑</OP><LL><UP>z∈Z</UP></LL></LIM> &pgr;(&thgr;‖z, y)f<SUP><UP>m</UP></SUP>(z‖y) (26)
represents a relation between the marginals at each step that enables transfer of the convergence properties of {z(m)} to {theta (m)}. This fact, first elicited in Diebolt and Robert (1994) and Robert (1995), is known under the term "duality principle." Because {z(m)} is a regular discrete state space Markov chain, then it is stationary and its equilibrium distribution f(z|y) is unique, that is, {z(m)} is ergodic. Although {theta (m)} is not a Markov chain, it is straightforward to show that its stationary distribution is the marginal pi (theta |y). In this case, due to Eq. 26, the following results hold:
a.   For any starting value theta (0), the process {theta (m)} converges uniformly to pi (theta |y) at a geometric rate.

b.  For any real function g, provided Epi [|g(theta )|] < infinity , and for any stating point theta (0), Epi m[g(theta )] converges uniformly at a geometric rate to Epi [g(theta )].

c.  Because {z(m)} is a stationary Markov chain, it is also phi-mixing, and hence, so is {theta (m)}. In this case a Central limit Theorem for any real function g(theta ) holds, provided Epi [|g(theta )|2] < infinity .

Proofs for a and b can be found in Theorem 1, (i)-(ii) in Robert et al. (1993). A proof for c is provided by Theorem 1 (iii) in Robert et al. (1993) and Theorem 20.1 in Billingsley (1968). An expression for the rate of convergence in a is outlined in the Appendix.


    RESULTS
TOP
ABSTRACT
INTRODUCTION
ION CHANNEL MODELING
BAYESIAN APPROACH
RESULTS
DISCUSSION
APPENDIX
APPENDIX B
REFERENCES

This section presents results obtained with the Gibbs sampler on synthetic signals of known characteristics. The following example was designed to present a challenge by providing a combination of brief events together with closely located subconductance levels. A four-state Markov chain with reversible cyclic mechanism



View larger version (12K):
[in this window]
[in a new window]
 
Scheme 1  

was considered. The symbols ki, i = 1, ... , 8 are used to denote the transition rate constants among the states. By setting k1 = 6.183, k2 = 0.454, k3 = 2.697, k4 = 1.665, k5 = 0.446, k6 = 13.163, k7 = 0.182, and k8 = 11.812 and the sampling period to delta  = 0.005, the transition probability matrix, A = exp(Qdelta ), becomes
and the expected life times at each state are spread over two orders of magnitude, i.e., tau c1 = 0.0551, tau c2 = 1.351, tau o1 = 0.0629, and tau o2 = 0.5519. This mechanism generates long sojourns in an "open" state or in a "closed" state, which are then interrupted by brief transitions to a state of the opposite type, that is, C2 right-arrow O1 right-arrow C2 or O2 right-arrow C1 right-arrow O2. The conductances were set to qc1 = 0.07, qc2 = 0.0, qo1 = 0.14, and qo2 = 0.21, and white noise of mean 0.0 and standard deviation 0.1 was added to a million-point realization of z. The allowed set of transitions, together with the labels assumed for the conductances, can be seen in Fig. 1 C. The corresponding noisy segment is presented in Fig. 1 A.



View larger version (37K):
[in this window]
[in a new window]
 
FIGURE 1   (A) A segment of the synthetic data generated by the mechanism in the Results section. (B) An ensemble of few sampled realizations, z(m), for m: 22, 24, 25, 47, 49 (×102), for the range marked in (A). (C) The ideal (noiseless, unfiltered) trace underlying the data segment shown in (A) indicating the labeling of the states, C1, C2, O1, O2. (D) The most frequently visited state during the last 1000 iterations of the Gibbs sampler.

These data were analyzed with the Gibbs sampler specified by Eq. 15, more precisely by using stochastic forward-backward updates (Eq. 18) with four conductance levels and 2000 iterations. The priors where specified as: mi = 0.36, s<UP><SUB>i</SUB><SUP>2</SUP></UP> = 0.25, ui = 2, wi = 1, eij = 0.5, and bi = 1 × 10-6, for all ij is in  C. The sampler was started at arbitrary initial values for theta  and z, namely: q<UP><SUB>i</SUB><SUP>(0)</SUP></UP> = 0.36, sigma <UP><SUB>i</SUB><SUP>2(0)</SUP></UP> = 0.5, a<UP><SUB>i=j</SUB><SUP>(0)</SUP></UP> = 0.99, and a<UP><SUB>i≠j</SUB><SUP>(0)</SUP></UP> = 0.003, and lambda <UP><SUB>i</SUB><SUP>(0)</SUP></UP> = 0.25. The initial realization, z(0), was obtained by sampling from the values of A(0) and lambda (0).

General output for z and theta

The traces at Fig. 1 B show detail of a few of the sampled realizations for z. Long-lived sojourns with high signal-to-noise ratios are usually well located, whereas brief instances with poor resolution present some degree of uncertainty. An example of the latter is given by the short sojourns at C1, arising from transitions from O2 such as the one near the sample 447 in Fig. 1 B. Rather than showing all the samples for z, it is desirable to present a summary of their statistical properties. Figure 1 D presents the most frequently visited state at each time point. The actual location of the levels is assigned by taking the ergodic average for the sampled values of q, i.e., by setting g(theta i) = qi in Eq. 13. Different summaries for z(m), such as the Monte Carlo mean ± standard deviation or their Rao-Blackwellized versions (Robert and Casella, 1999) are also possible. The proportion of misclassified points computed from the most frequently visited state is 0.0315, with a large majority of these accounted for by fitted transitions that slightly precede or follow the corresponding transition in the ideal record.

The plots in Fig. 2 show samples of few components of theta  against the number of iterations. It appears that the sampler reaches a stationary state within approximately 100 iterations. Further iterations (see Fig. 2 D, and further >106, not shown) do not seem to diverge from these values. The ergodic averages for some components of theta  obtained from the last 1000 iterations are (standard deviation in parentheses): qc1 = 0.067 (.001), qc2 = -7.44e - 4 (1.9e - 4), qo1 = 0.138 (8.68e - 4), and qo2 = 0.209 (1.43e - 4), the transition probability matrix,
and the variances of the noise sigma <UP><SUB>c<SUB>2</SUB></SUB><SUP>2</SUP></UP> = 0.01 (2.64e - 5), sigma <UP><SUB>c<SUB>1</SUB></SUB><SUP>2</SUP></UP> = 0.01 (1.45e - 4), and sigma <UP><SUB>o<SUB>1</SUB></SUB><SUP>2</SUP></UP> = 0.01 (9.64e - 5), sigma <UP><SUB>o<SUB>2</SUB></SUB><SUP>2</SUP></UP> = 0.01 (1.69e - 5). In general, the sampled realizations that correspond to states or transitions that are less frequent have higher variances. In this case, the uncertainties associated with the short-lived states C1 and O1 are higher.



View larger version (46K):
[in this window]
[in a new window]
 
FIGURE 2   Samples for some components of theta  against iteration number: (A) samples for the level positions q; (B) samples for the variance sigma <UP><SUB>C<SUB>1</SUB></SUB><SUP>2</SUP></UP>; (C) samples for aC2C1 (center), aO1C1 (top), aC1C2 (bottom); (D) samples for aO2C2 (top), aO1O2 (center), aC1O2 (bottom). The total number of iterations in (A), (B), and (C) were 2000, whereas (D) presents a longer run (104) that suggests stationarity for aC1O2.

For comparison, Baum's likelihood maximization was applied using the same theta (0) values as those for the sampler. In this case, the estimates for the theta  components reported above were: qc1 = 0.067, qc2 = 0.003, qo1 = 0.164, qo2 = 0.207, sigma <UP><SUB>c<SUB>2</SUB></SUB><SUP>2</SUP></UP> = 0.01, and sigma <UP><SUB>o<SUB>1</SUB></SUB><SUP>2</SUP></UP> = 0.01, and, for the diagonal of A: ac1c1 = 0.964, ac2c2 = 0.996, ao1o1 = 0.968, and ao2o2 = 0.998.

Kernel estimates

Some properties of the samples theta <UP><SUB>i</SUB><SUP>(b+1)</SUP></UP>, ... , theta <UP><SUB>i</SUB><SUP>(M)</SUP></UP> can be summarized through a kernel density estimate. Denote by E = M - b the effective number of iterations after burn in. This estimator is given by
<A><AC>k</AC><AC>ˆ</AC></A>(x)=<FR><NU>1</NU><DE>E</DE></FR> <LIM><OP>∑</OP><LL><UP>m=b+1</UP></LL><UL><UP>M</UP></UL></LIM> &phgr;{x−&thgr;<SUP>(<UP>m</UP>)</SUP><SUB><UP>i</UP></SUB>, &ohgr;}, (27)
where phi , known as a kernel function, is itself a density usually chosen to be unimodal and symmetric about zero. Here we consider phi  as a normal density with mean 0.0 and standard deviation (bandwidth) omega  > 0. Following Eq. 27, this estimate is constructed by centering a scaled normal density at each sample of theta <UP><SUB>i</SUB><SUP>(m)</SUP></UP>; the actual value at any point x is the average of the M - (b + 1) ordinates at that value. For the level positions, q, this estimate might be used as an alternative to the all-points conductance histogram commonly used in single channel analysis. The estimates for some components of theta  including the ones for qc1 and qc2 are shown in Fig. 3. Just as the bin width affects the appearance of a histogram, the value of omega  affects the shape of the density estimate, with larger values producing smoother estimates. For all the components of theta , this parameter is obtained as omega  = sigma theta [4/(3E)]1/5, where sigma theta denotes the sample standard deviation (see Bowman and Azzalini, 1997, p. 31). Both figures also reflect the fact of higher uncertainty for the parameters associated to short lived sates. In any case, the posterior mode follows closely the true value.



View larger version (18K):
[in this window]
[in a new window]
 
FIGURE 3   Kernel density estimates for selected posterior marginals. (A) Estimates for qc2 (continuous, bottom axis) and qc1 (dashes, top axis), both calculated from Eq. 27 with omega  = 0.002. (B) Estimates for ac2c2 (continuous, bottom axis) and aC2O2 (dashes, top axis); both obtained with omega  = 0.0005. The kernels were constructed from the last 1000 realizations.

Dwell time estimation

Single-state dwell times

The kernel density estimates for the dwell times t1, t2, ... at any single state are presented in Fig. 4. These are calculated from m sampled realizations of z by following a form similar to the one in Eq. 27. Denote ti,m the ith sojourn of the mth realization, then the kernel is computed as
<A><AC>k</AC><AC>ˆ</AC></A>(x)=<FR><NU>1</NU><DE>E</DE></FR> <LIM><OP>∑</OP><LL><UP>m=b+1</UP></LL><UL><UP>M</UP></UL></LIM> <FR><NU>1</NU><DE>H<SUB><UP>m</UP></SUB></DE></FR> <LIM><OP>∑</OP><LL><UP>i=1</UP></LL><UL><UP>H<SUB>m</SUB></UP></UL></LIM> &phgr;{x−t<SUP>*</SUP><SUB><UP>i,m</UP></SUB>, &ohgr;<SUB><UP>i</UP></SUB>}, (28)
where Hm is the number of dwell times for a particular state in z(m), and t*i,m = ln(ti,m), is the sequence of log-transformed sojourns. It is useful to use a scale-dependent bandwidth,
&ohgr;<SUB><UP>i</UP></SUB>=<UP>max</UP>{<UP>ln</UP>(1+&dgr;/t<SUB><UP>i,m</UP></SUB>), g}, (29)
where g is the bandwidth or standard deviation of the normal density used for long dwell times, and ln[1 + delta /ti,m] ensures that the bandwidth for short dwell times corresponds to a time uncertainty of at least one sample interval. The choices of bandwidth and the spacing of x values for evaluation of phi  are related. So that each dwell time will add equal weight to the density estimate, g must exceed the spacing. The densities in Fig. 4 have been scaled by the number of detected events or counts to allow comparison between the kernel estimates calculated from output of the Gibbs sampler and from the results of filtering and thresholding. For the Gibbs sampler, the density is multiplied by H = Epi [Hm]. The plot in Fig. 4 A corresponds to the estimates for the time spent at each state, calculated from the noiseless sequence. The plot in Fig. 4 B shows the estimate obtained from the last 20 realizations obtained by the Gibbs sampler. These are in good agreement with the ideal results. The plot in Fig. 4 C shows the estimate obtained by low-pass filtering the data to 5 kHz using a digital Gaussian filter and noting the crossings of thresholds at 0.035, 0.105, and 0.175. Many of the extra counts in the large peaks for states C1 and O1 in Fig. 4 C are artifacts that represent the finite time taken for the filtered conductance to pass through the band of conductances corresponding to these states in the transitions between C2 and O1 and between O2 and C1, respectively. These artifacts render the estimates for C1 and O1 virtually useless. If the cut-off frequency for the low-pass filter could be increased, the artefactual counts would occur for shorter dwell times and thus might be separable from the genuine events. However, increasing the cut-off frequency increases the amplitude of the noise, which increases the frequency of another form of artefactual crossing of the thresholds. Indeed the peaks for the long-lived states C2 and O2 in Fig. 4 C contain more counts than for the ideal trace and are shifted to shorter dwell times because, even after filtering to 5 kHz, the noise produces a large number of additional crossings of the thresholds at 0.035 and 0.175. 



View larger version (18K):
[in this window]
[in a new window]
 
FIGURE 4   Kernels for the dwell times for each state obtained: (A) from the ideal trace (see Fig. 1 C); (B) from the posterior estimate produced using the Gibbs sampler; and (C) by low-pass filtering the trace at 5 kHz and thresholding (C). The curves from (A) are echoed in (B) for comparison. Note the different vertical scale in (C). Each trace has been scaled so that it integrates to the number of sojourns in the corresponding state. For the theoretical trace in (A) the counts and time constants were C1: 1622, 0.055; C2: 2858, 1.35; O1: 3675, 0.063; and O2: 2527, 0.552. The bandwidth was calculated from Eq. 29 with g = 0.39. For comparison with the ideal values of tau  given earlier, the mean dwell times in each of the states in (C) are: tau c1 = 0.050, tau c2 = 0.363, tau o1 = 0.052, and tau o2 = 0.438.

Class dwell times

In practice, filter and threshold analysis of the data summarized in Fig. 4 would be attempted using a single threshold leading to two classes of states, O = {O1O2} above the threshold and C = {C1C2} below. To obtain the predictions of the model for these classes, the Q matrix is partitioned into four submatrices Qoo, Qco, Qoc, and Qcc; each with entries equal to the corresponding transition rates (Colquhoun and Hawkes, 1982). Figure 5, A and B, displays the theoretical densities obtained by using the spectral representation of the Q matrix (Colquhoun and Hawkes, 1982, 1995), and least square fits of two log-transformed exponentials, Sigma i (Wi/tau i)exp[x - exp(x)/tau i], i = 1, 2; with x = to or x = tc to the kernel estimates obtained from the Gibbs sampler for the time spent in both classes. These are calculated with Eqs. 28 and 29 after adding together the events corresponding to successive sojourns of each class. The relevant dwell times were taken from the last 20 sampled realizations. In Fig. 5 C the dwell times have been obtained by low-pass filtering the noisy trace with a 5-kHz digital Gaussian filter and noting a change of state whenever the signal crosses a threshold at 0.105, half way between the two middle levels. With this synthetic signal, a large majority of genuine transitions do cross the midline, which makes this an appropriate position for a single threshold. The filter frequency of 5 kHz was chosen to reduce the "false alarms" resulting from noise fluctuations to less than 5% of the total. It is possible to use relatively light filtering and tolerate a relatively high rate of noise transitions when the signal is in a state close to a threshold because these states are occupied for less than 10% of the time and genuine transitions from them are frequent. However, even at 5 kHz, more than a third of the genuine events have been missed. Further low-pass filtering leads t