Hidden Markov models have been used to restore recorded
signals of single ion channels buried in background noise. Parameter estimation and signal restoration are usually carried out through likelihood maximization by using variants of the Baum-Welch
forward-backward procedures. This paper presents an alternative
approach for dealing with this inferential task. The inferences are
made by using a combination of the framework provided by Bayesian
statistics and numerical methods based on Markov chain Monte Carlo
stochastic simulation. The reliability of this approach is tested by
using synthetic signals of known characteristics. The expectations of the model parameters estimated here are close to those calculated using
the Baum-Welch algorithm, but the present methods also yield estimates
of their errors. Comparisons of the results of the Bayesian Markov
Chain Monte Carlo approach with those obtained by filtering and
thresholding demonstrate clearly the superiority of the new methods.
 |
INTRODUCTION |
The statistical analysis of single channel patch
clamp records has been the subject of extensive research for over two
decades. The most common scheme uses filtering and thresholding to
recover the underlying process that is followed by the channel (see
Colquhoun and Sigworth, 1995
, for an extensive review).
As a consequence, the original signal is reduced to a sequence of dwell
times at a finite number of possible conductance levels separated by
the thresholds. Empirical density estimates for the time spent at each
level are then constructed as histograms, which are usually fitted by
exponential mixtures. In the second stage of the analysis, the obtained
mixtures characterized by a set of weights and rate constants are
interpreted by modeling the channel dynamics as a finite state space
aggregated Markov chain which is also homogeneous and has continuous
(time) parameter (Colquhoun and Hawkes, 1982
; Fredkin et al., 1985
; Ball and Sansom,
1989
). The inferences are directed toward the structure and
specific value of the infinitesimal generator (the transition rate
matrix) of the Markov chain. Finally, model selection between various
alternatives is handled by using traditional penalized likelihood
ratios. Ball and Rice (1992)
provide a critical overview
of the inferential difficulties of these approaches.
With only a few exceptions (Fredkin and Rice, 1992b
),
these methods rely heavily on the accuracy of the initial restoration step and are thus complicated when considering signal-to-noise ratios
that are at the limit of filtering and thresholding. The pioneering
work of Chung et al. (1990)
introduced hidden Markov models with the initial aim of extracting information about current amplitudes and channel kinetics at low signal-to-noise ratios. More
recently, hidden Markov models have also been considered by
Venkataramanan et al. (1998)
and Michalek and
Timmer (1999)
extending the applicability of the initial
framework. Parameter estimation and signal restoration are usually
carried out in an effective way by using variants of Baum's
forward-backward procedures and reestimation formulas (Baum et
al., 1970
). In this case, estimates are represented by points
in the parameter space that correspond to the coordinates of a local
maximum of the likelihood.
A radically different approach for the treatment of hidden Markov
models is provided by Bayesian statistics. In this paper, the initial
restoration problem is addressed by modeling the observations with
hidden Markov models. However, inferences are performed by using
Bayesian statistics and an extension to a stochastic simulation method
first proposed by Robert et al. (1993)
. Fredkin
and Rice (1992a)
also considered Bayesian restoration, but
their approach is methodologically closer to Baum's maximization
procedures. Bayesian inference based on stochastic search methods known
as Markov Chain Monte Carlo have also been considered by Ball et al. (1999
, 1997
),
Hodgson (1999)
, and Hodgson and Green
(1999)
, however, the parameterization and their inferences are
directed toward the transition rate matrix associated with a gating
mechanism. The methods developed here, based on simpler procedures, are
computationally efficient and supported by a solid probabilistic basis.
In this article, it is shown that the Bayesian methodology provides a
useful way to deal with hidden Markov models for ion channels. Bayesian
theory provides statistical grounds for the assessment of the
uncertainty in all the unknowns considered in the modeling process. The
next two sections present a brief overview of the problems posed by
hidden Markov models and also their treatment from a Bayesian
perspective. Results for various synthetic signals are shown in the
following section, together with those obtained via standard
Baum-Welch maximization. Comparisons of the Bayesian Markov Chain
Monte Carlo approach against filtering and thresholding clearly
demonstrate the new methods to be superior.
 |
ION CHANNEL MODELING |
Hidden Markov model
This section presents an overview of the hidden Markov model
formalism that will be adopted and the notation to be used. The parameterization is consistent with the one considered by Chung et al. (1990)
or Venkataramanan et al. (1998)
.
A hidden Markov model is defined by two interrelated stochastic
processes. The first of these, representing the ion channel, is a
first-order finite-state homogeneous Markov chain,
{zl: l
+}, on
= {1, 2, ... , n},
with initial density
and transition rate matrix Q. The
continuous process is approximated by a discrete time version,
{zk: k
T}, where
T = {
, 2
, ... ,
N}, with N
as the total number of sample points and
the sampling period of the
acquisition system (hereafter we assume
= 1). In these terms
{zk} is determined by a transition
probability matrix, A = exp(Q
), with
elements aij =
(zk = j|zk
1 = i), i, j
(where
(X|Y) is the probability of
X given Y). The second process is represented by
the set y = {yk: k
T} of independent and identically distributed random variables that constitute the observations. Each observation
yk is assumed to arise as a function of
zk, defined by a conditional density
|
(1)
|
where
is possibly a vector used to denote the parameters
associated with a particular family d. Under standard
normality assumptions, the function of the process is
|
(2)
|
with qi and 
as the
mean and the variance of the ith possible outcome associated
with zk. Assuming the number of conductance
states, n, is known a priori, the model is completely
specified by
|
(3)
|
This suggests an equivalence between states and conductances.
However, this correspondence may be relaxed by imposing constraints on
q such as qj =
for some
value
, and j
J
if
zk = j, as will be shown in the
section Class Dwell Times.
In this setting, each observation arises as the sum of two terms,
|
(4)
|
The first term at the right-hand side of Eq. 4 represents the
amount of current flowing through the channel associated with the
ith conductance level, whereas
k is a normal
distributed random variable with mean 0 and variance 1, representing
the disturbance of the noise inherent in the recording apparatus. The
symbol ~ together with iid is used to denote independent and
identically distributed.
The assumption of independent observations, explicitly stated as
|
(5)
|
is an important simplification that leads to the analytical
treatment of the model presented so far. It should be noted that the
observations yk are indirectly dependent on each
other through the Markovian structure induced conditionally on
di(yk) by the underlying
process zk.
Maximum likelihood approach
Let Z = {z: zk = i;
k
T, i
} be the space of events constituted
by the path realizations of the chain {zk},
and z one of its elements, i.e., z constitutes a
particular sequence of conductance states and Z the set of
all possible sequences. Then, following Eq. 1 and the Markov property
on z, the likelihood, which equals the probability of the
observations conditional on the parameters, L(
) = p(y|
), is given by
|
(6)
|
with i1, i2, ... ,
iN
. To simplify notation, let
|
(7)
|
then
|
(8)
|
Statistical inferences concerning
generally take the form of
point estimates,
*, obtained at a local maximum of
L(
),
|
(9)
|
with
usually a compact subset of some Euclidean space
d, more precisely
=
n × 
× [0, 1]n×n+1. The notation
"argmax{L( )}" refers to the point in
where
L( ) attains a (local) maximum. In the particular case when
d is normal (Eq. 2) and z is known, the required
maximization is analytic. However, in patch clamp recordings,
z is unknown and inferences are directed toward both
(z,
). In this case, the maximization has to be
performed numerically by considering nN terms at
Eq. 8. Evaluation of L(
) becomes a critical issue, especially in the case of patch clamp records suitable for stationary analysis where N > 106. In the general
framework, this problem was first considered by Baum et al.
(1970)
, who derived an efficient solution reducing the
evaluation to O(n2N). Baum's methods, known as
the forward-backward algorithm and reestimation formulas, were first
applied to patch clamp records by Chung et al. (1990)
.
 |
BAYESIAN APPROACH |
Bayesian analysis proceeds by considering the parameters
as
random variables on
d. Inferences in this case
require the definition of a joint density p(y,
), which
can be decomposed as L(
)p(
), simply by following the
definition of conditional probability. The density p(
),
known as the prior, represents our previous knowledge or belief about the value of the parameters before the observations have been examined.
Once y is available, the uncertainties on
are
represented by a posterior density,
(
|y), which
follows from p(y,
) and the application of Bayes Theorem
as
|
(10)
|
Let 
denote mathematical expectation with
respect to the posterior and g any (square integrable)
function of
. Formal statistical inferences in this context often
take the form of integrals
|
(11)
|
and are thus based on the support of
(
|y)
rather than on a single point estimate. This ensures the use of
probabilities for events
C (C
), in
contrast to classical confidence procedures. An account of Bayesian
theory can be found in Bernardo and Smith (1994)
.
Although there are clear theoretical advantages of the Bayesian
approach, there are difficulties that must be overcome before they can
be realized in practice. The integrals in Eqs. 10 and 11 are not
analytical for the current model. This fact arises as a consequence of
the structure imposed by the likelihood in Eq. 8, regardless of the
specific form of g(
) and p(
).
Markov chain Monte Carlo for hidden Markov models
A solution to the integration task posed by Eqs. 10 and 11 is
provided by Monte Carlo approximation together with Markov chain sampling. First, an ergodic Markov chain
(0),
(1), ... , is constructed on
, such that its
invariant distribution corresponds to the hidden Markov model posterior
(
|y). It should be made clear that this process
represents a different Markov chain than that followed by the channel
on Z.
More precisely, let K
m =
(
(m)
B|
(m
1) = x) denote the
step transition probability from
(m
1) to
(m) for any m = 1, 2, ... , and
0 =
(
(0)
A)
an initial density for A, B
, and x
. Under mild regularity conditions that ensure ergodicity
|
(12)
|
for almost all initial values of
(0). Here
K
=
(
(m)
B|
(0) = x) denotes the mth
iterate of K
, that is,
K
= K
mK
, and
d is used to denote convergence in distribution (see
Billingsley, 1968
). Thus, after an initial relaxation
period b, successive iterations generate a sequence of
samples
(b+1), ... ,
(M) that are
approximately distributed according to
(
|y). This sequence may then be used to calculate averages that approximate the
desired expectations under the posterior target density
|
(13)
|
constituting a Monte Carlo estimate of the integrals in Eq. 11.
The combination of these techniques has been used to tackle complex
multidimensional problems, proving to be an effective tool where
standard frequentist methods have failed. The theoretical background
needed for the use of these general state chains is described in
Nummelin (1984)
or Meyn and Tweedie
(1993)
; whereas their application to Bayesian statistics is
presented among others by Tierney (1994)
, Besag
et al. (1995)
, or Robert and Casella (1999)
.
J. A. Stark, R. Rosales, W. J. Fitzgerald, and S. B. Hladky (manuscript in preparation) also present a different Markov
chain Monte Carlo approach for the single channel restoration problem in terms of a change point model. The particular chain that will be
used to generate approximate samples from the posterior in Eq. 10 is
presented in detail in the next section.
Gibbs sampler
Let p(
i|
i,
y) be the (full) conditional densities for each of the
components
i, given values of the other components 
i = {
j: j
i}. These densities are uniquely determined by a particular
posterior (Besag, 1974
) and constitute the building blocks of a Markov chain Monte Carlo method known as the Gibbs sampler
(Gelfand and Smith, 1990
). This sampler proceeds as
follows. Given an arbitrary set of starting values
(0) = (
, ... ,

), draw a sample of 
from
p(
1|
, y), i.e.,
then
|
(14)
|
This completes one iteration of the sampling scheme
~ {p(
i|
i, y)},
and also a transition from
(0) to
(1).
The sampling of a high dimensional vector
has been replaced by the
sampling of lower dimensional components, which is one of the key
features of this algorithm.
The decomposition of
(
|y) into its conditionals is
rather involved because of the nN terms in Eq. 8. Consider instead the joint
(
, z|y), which leads to the following decomposition and sampling scheme:
|
(15)
|
In this case, under the appropriate choice of priors, all the
densities p(
i|
i, y,
z) are analytical because they only involve one of the possible
realizations in Z. This simplification was first considered
by Robert et al. (1993)
, and by Diebolt and Robert (1994)
in the general mixture distribution context. The key of this conditional decomposition resides in considering
z as a random variable, much in the same way as
, instead
of integrating it out. This scheme leads to rigorous theoretical facts
on the convergence of the sampler to be discussed in the section
entitled Convergence.
Following Eq. 15, draws have to be produced from the full conditional
for the hidden process
|
(16)
|
for any i1, ..., iN
1,
iN
. This can be achieved in
different ways. A first alternative consists of the method proposed by
Carter and Kohn (1994)
, which constitutes a stochastic
version of the forward-backward algorithm. This follows by noting that p(z|y,
) can be decomposed as
|
(17)
|
with yk = y1,
... , yk. Given zk+1,
p(zk|zk+1,
yk,
) is a discrete distribution, which
suggests the following sampling strategy. For k = 2, ... , N and i
, compute and store the
optimal filter p(zk = i|yk,
), then sample
zN from
p(zN|y,
), and, for
k = N
1, ... , 1, sample
zk from
p(zk|zk+1,
yk,
), where, if
zk+1 = l, for i
,
|
(18)
|
This strategy updates (
, z) by following the
decomposition given by Eq. 15. A second option follows by considering
the full conditional densities
|
(19)
|
which follow from the Markov property on z. For
k = 2, ... , N
1 and for any i,
j, r
, these densities are
and, for k = N,
|
(20)
|
This sampling scheme leads to the conditional decomposition,
|
(21)
|
where i denotes the index over the components of
(Eq. 3). Note that the first option generates a realization of
z directly from the conditional p(z|
, y) by
using a single Gibbs component. The second uses N different
components and presents a more complicated correlation structure, since
each zk depends on both
zk
1 and zk+1, which
leads to a slower mixing chain (see Liu et al., 1995
, for the effects of correlations among sampled components). However, by
adopting the latter option, it is possible to avoid any part of the
forward-backward filtering procedures used in the other two cases.
This fact is particularly useful in the treatment of dependent
observations (R. Rosales, J. A. Stark, W. J. Fitzgerald, and
S. B. Hladky, manuscript in preparation).
Priors
In this paper, we consider proper conjugate priors that attempt to
be weakly informative on
. A prior p(
) that belongs to a parametric family
is conjugate to a given likelihood if the resulting posterior is also from this family. In this case, the required conditionals for
also belong to
(see, for example, Bernardo and Smith, 1994
). This prior has been chosen
for tractability. It should be noted that hidden Markov models do not
allow the use of improper (noninformative) priors for
or for each
row of A (Diebolt and Robert, 1994
).
Let ai denote the ith row of A. Then,
by assuming independence between priors over states and conditional
independence between priors over parameters,
|
(22)
|
In the case of a univariate normal likelihood for
k
and a multinomial for z, the conjugate families for
are
given by the following densities:
~ D(b1,
... , bn), ai ~ D(ei1, ... , ein), qi ~ N(mi, s
) and

~ IG(ui, wi).
IG is used for an inverted gamma density with shape and
scale ui, wi and hence
with mean wi/(ui
1) whenever ui > 1 and variance
w
/[(ui
1)2(ui
2)] if
ui > 2. The symbol D is used
for a Dirichlet density with parameter vector x with either
x = b or x = ei. In this
case, xj > 0, and the jth mean
and variance for the jth component are xj/x0 and
xj(x0
xj)/[x
(x0 + 1)] where x0 =
j xj.
The parameters mi,
s
, ui and
wi are regarded as constants that may be
calculated according to the observed data range. Let the range extend
from 0 to R, with R measured in physical units of
current. Then, setting
|
(23)
|
for all i
, tends to produce a prior for
qi that is relatively flat over the interval
specified by R. The constants ui and
wi may be specified by setting the mean of the
inverse gamma density equal to the observed variance

(if available), and also the variance equal to
some reasonable multiple of the data range
(i.e.,
= 4R). Solving the resulting system for
ui and wi gives
|
(24)
|
Noninformative priors for the initial density and for the
transition probabilities are obtained at the limit when
bi
0 and eij
0. However, care must be taken because bi = eij = 0 for any i, j
results in an improper prior. Sensible choices representing
weak prior information are eij
[0.1, 1] and bi = 10
6. In
terms of the single channel record, the value for
eij constitutes a statement about the frequency
of the transition i
j, whereas the choice for
bi represents the prior belief that the channel will start at the ith level. Due to the structure induced by
the Dirichlet density for each row, informative priors for A
obeying reversibility constraints imposed by Kolmogorov's criteria
could also be used as those suggested for Q by Ball
et al. (1999)
.
The full conditionals necessary to implement the Gibbs sampler follow
immediately once the priors have been specified. Samples for all these
densities where generated by implementing Eqs. 14, 18, and 20, and
following the methods described in Fishman (1996)
and
Gelman et al. (1995)
. The explicit form for
p(
i|
i, y, z) is
derived in the Appendix.
Convergence
Successive iterations from the sampling scheme at Eq. 15 (or Eq. 21) produce the sequence (
(1), z(1),
... , (
(M), z(M)),
which is a Markov chain on
× Z. Let
(
|y) be the marginal of
(
, z|y),
that is,
|
(25)
|
and
m(
|y) the density generated at
the mth Gibbs step. Let also f(z|y) and
fm(z|y), be the corresponding
densities for z. Then,
|
(26)
|
represents a relation between the marginals at each step that
enables transfer of the convergence properties of
{z(m)} to {
(m)}. This
fact, first elicited in Diebolt and Robert (1994)
and Robert (1995)
, is known under the term "duality
principle." Because {z(m)} is a regular
discrete state space Markov chain, then it is stationary and its
equilibrium distribution f(z|y) is unique, that is,
{z(m)} is ergodic. Although
{
(m)} is not a Markov chain, it is straightforward
to show that its stationary distribution is the marginal
(
|y). In this case, due to Eq. 26, the following
results hold:
| a. |
For any starting value (0), the process { (m)} converges uniformly to ( |y) at a geometric rate.
|
b. For any real function g, provided

[|g(
)|] <
, and for any
stating point
(0),

m[g(
)] converges uniformly at a
geometric rate to 
[g(
)].
c. Because {z(m)} is a stationary
Markov chain, it is also
-mixing, and hence, so is
{
(m)}. In this case a Central limit Theorem for any
real function g(
) holds, provided

[|g(
)|2] <
.
Proofs for a and b can be found in Theorem 1, (i)-(ii) in
Robert et al. (1993)
. A proof for c is provided by
Theorem 1 (iii) in Robert et al. (1993)
and Theorem
20.1 in Billingsley (1968)
. An expression for the rate
of convergence in a is outlined in the Appendix.
 |
RESULTS |
This section presents results obtained with the Gibbs sampler on
synthetic signals of known characteristics. The following example was
designed to present a challenge by providing a combination of brief
events together with closely located subconductance levels. A
four-state Markov chain with reversible cyclic
mechanism
was considered. The symbols ki,
i = 1, ... , 8 are used to denote the transition
rate constants among the states. By setting k1 = 6.183, k2 = 0.454, k3 = 2.697, k4 = 1.665, k5 = 0.446, k6 = 13.163, k7 = 0.182, and
k8 = 11.812 and the sampling period to
= 0.005, the transition probability matrix, A = exp(Q
), becomes
and the expected life times at each state are spread over two
orders of magnitude, i.e.,
c1 = 0.0551,
c2 = 1.351,
o1 = 0.0629, and
o2 = 0.5519. This mechanism
generates long sojourns in an "open" state or in a "closed"
state, which are then interrupted by brief transitions to a state of
the opposite type, that is, C2
O1
C2 or O2
C1
O2. The conductances were set to
qc1 = 0.07, qc2 = 0.0, qo1 = 0.14, and
qo2 = 0.21, and white noise of mean
0.0 and standard deviation 0.1 was added to a million-point realization
of z. The allowed set of transitions, together with the
labels assumed for the conductances, can be seen in Fig.
1 C. The corresponding noisy
segment is presented in Fig. 1 A.

View larger version (37K):
[in this window]
[in a new window]
|
FIGURE 1
(A) A segment of the synthetic data
generated by the mechanism in the Results section. (B) An
ensemble of few sampled realizations, z(m), for
m: 22, 24, 25, 47, 49 (×102), for the range
marked in (A). (C) The ideal (noiseless, unfiltered) trace
underlying the data segment shown in (A) indicating the
labeling of the states, C1, C2,
O1, O2. (D) The most frequently visited
state during the last 1000 iterations of the Gibbs sampler.
|
|
These data were analyzed with the Gibbs sampler specified by Eq. 15,
more precisely by using stochastic forward-backward updates (Eq. 18)
with four conductance levels and 2000 iterations. The priors where
specified as: mi = 0.36, s
= 0.25, ui = 2, wi = 1, eij = 0.5, and
bi = 1 × 10
6, for all
i, j
. The sampler was started at arbitrary
initial values for
and z, namely:
q
= 0.36, 
= 0.5, a
= 0.99, and
a
= 0.003, and

= 0.25. The initial realization,
z(0), was obtained by sampling from the values
of A(0) and
(0).
General output for z and
The traces at Fig. 1 B show detail of a few of the
sampled realizations for z. Long-lived sojourns with high
signal-to-noise ratios are usually well located, whereas brief
instances with poor resolution present some degree of uncertainty. An
example of the latter is given by the short sojourns at
C1, arising from transitions from
O2 such as the one near the sample 447 in Fig. 1 B. Rather than showing all the samples for z,
it is desirable to present a summary of their statistical properties.
Figure 1 D presents the most frequently visited state at
each time point. The actual location of the levels is assigned by
taking the ergodic average for the sampled values of q,
i.e., by setting g(
i) = qi in Eq. 13. Different summaries for
z(m), such as the Monte Carlo mean ± standard deviation or their Rao-Blackwellized versions (Robert
and Casella, 1999
) are also possible. The proportion of
misclassified points computed from the most frequently visited state is
0.0315, with a large majority of these accounted for by fitted
transitions that slightly precede or follow the corresponding transition in the ideal record.
The plots in Fig. 2 show samples of few
components of
against the number of iterations. It appears that the
sampler reaches a stationary state within approximately 100 iterations.
Further iterations (see Fig. 2 D, and further
>106, not shown) do not seem to diverge from these values.
The ergodic averages for some components of
obtained from the last
1000 iterations are (standard deviation in parentheses):
qc1 = 0.067 (.001),
qc2 =
7.44e
4 (1.9e
4), qo1 = 0.138 (8.68e
4), and qo2 = 0.209 (1.43e
4), the transition probability
matrix,
and the variances of the noise 
= 0.01 (2.64e
5),

= 0.01 (1.45e
4), and 
= 0.01 (9.64e
5), 
= 0.01 (1.69e
5). In general, the sampled realizations that correspond to states or transitions that are less frequent have higher
variances. In this case, the uncertainties associated with the
short-lived states C1 and
O1 are higher.

View larger version (46K):
[in this window]
[in a new window]
|
FIGURE 2
Samples for some components of against iteration
number: (A) samples for the level positions q;
(B) samples for the variance  ;
(C) samples for
aC2C1 (center),
aO1C1 (top),
aC1C2 (bottom);
(D) samples for
aO2C2 (top),
aO1O2 (center),
aC1O2 (bottom).
The total number of iterations in (A), (B), and
(C) were 2000, whereas (D) presents a longer run
(104) that suggests stationarity for
aC1O2.
|
|
For comparison, Baum's likelihood maximization was applied using the
same
(0) values as those for the sampler. In this case,
the estimates for the
components reported above were:
qc1 = 0.067, qc2 = 0.003, qo1 = 0.164, qo2 = 0.207, 
= 0.01, and 
= 0.01, and, for the diagonal of
A: ac1c1 = 0.964, ac2c2 = 0.996, ao1o1 = 0.968, and
ao2o2 = 0.998.
Kernel estimates
Some properties of the samples 
,
... , 
can be summarized through a kernel
density estimate. Denote by E = M
b the
effective number of iterations after burn in. This estimator is given
by
|
(27)
|
where
, known as a kernel function, is itself a density usually
chosen to be unimodal and symmetric about zero. Here we consider
as
a normal density with mean 0.0 and standard deviation (bandwidth)
> 0. Following Eq. 27, this estimate is constructed by
centering a scaled normal density at each sample of

; the actual value at any point x is
the average of the M
(b + 1) ordinates at that
value. For the level positions, q, this estimate might be
used as an alternative to the all-points conductance histogram commonly
used in single channel analysis. The estimates for some components of
including the ones for qc1 and
qc2 are shown in Fig.
3. Just as the bin width affects the
appearance of a histogram, the value of
affects the shape of the
density estimate, with larger values producing smoother estimates. For all the components of
, this parameter is obtained as
= 
[4/(3E)]1/5, where

denotes the sample standard deviation (see
Bowman and Azzalini, 1997
, p. 31). Both figures also
reflect the fact of higher uncertainty for the parameters associated to
short lived sates. In any case, the posterior mode follows closely the
true value.

View larger version (18K):
[in this window]
[in a new window]
|
FIGURE 3
Kernel density estimates for selected posterior
marginals. (A) Estimates for qc2
(continuous, bottom axis) and
qc1 (dashes, top
axis), both calculated from Eq. 27 with = 0.002. (B) Estimates for
ac2c2
(continuous, bottom axis) and
aC2O2 (dashes,
top axis); both obtained with = 0.0005. The kernels
were constructed from the last 1000 realizations.
|
|
Dwell time estimation
Single-state dwell times
The kernel density estimates for the dwell times
t1, t2, ... at any single state
are presented in Fig. 4. These are
calculated from m sampled realizations of z by
following a form similar to the one in Eq. 27. Denote
ti,m the ith sojourn of the
mth realization, then the kernel is computed as
|
(28)
|
where Hm is the number of dwell times for a
particular state in z(m), and
t*i,m = ln(ti,m), is the sequence of
log-transformed sojourns. It is useful to use a scale-dependent
bandwidth,
|
(29)
|
where g is the bandwidth or standard deviation of the
normal density used for long dwell times, and ln[1 +
/ti,m] ensures that the bandwidth for
short dwell times corresponds to a time uncertainty of at least one
sample interval. The choices of bandwidth and the spacing of
x values for evaluation of
are related. So that each
dwell time will add equal weight to the density estimate, g
must exceed the spacing. The densities in Fig. 4 have been scaled by
the number of detected events or counts to allow comparison between the
kernel estimates calculated from output of the Gibbs sampler and from
the results of filtering and thresholding. For the Gibbs sampler, the
density is multiplied by H = 
[Hm]. The plot in Fig.
4 A corresponds to the estimates for the time spent at each
state, calculated from the noiseless sequence. The plot in Fig.
4 B shows the estimate obtained from the last 20 realizations obtained by the Gibbs sampler. These are in good agreement
with the ideal results. The plot in Fig. 4 C shows the estimate obtained by low-pass filtering the data to 5 kHz using a
digital Gaussian filter and noting the crossings of thresholds at
0.035, 0.105, and 0.175. Many of the extra counts in the large peaks
for states C1 and O1 in
Fig. 4 C are artifacts that represent the finite time taken
for the filtered conductance to pass through the band of conductances
corresponding to these states in the transitions between
C2 and O1 and between
O2 and C1, respectively. These artifacts render the estimates for C1 and
O1 virtually useless. If the cut-off frequency
for the low-pass filter could be increased, the artefactual counts
would occur for shorter dwell times and thus might be separable from
the genuine events. However, increasing the cut-off frequency increases
the amplitude of the noise, which increases the frequency of another
form of artefactual crossing of the thresholds. Indeed the peaks for
the long-lived states C2 and
O2 in Fig. 4 C contain more counts
than for the ideal trace and are shifted to shorter dwell times
because, even after filtering to 5 kHz, the noise produces a large
number of additional crossings of the thresholds at 0.035 and 0.175.

View larger version (18K):
[in this window]
[in a new window]
|
FIGURE 4
Kernels for the dwell times for each state obtained:
(A) from the ideal trace (see Fig. 1 C); (B)
from the posterior estimate produced using the Gibbs sampler; and
(C) by low-pass filtering the trace at 5 kHz and
thresholding (C). The curves from (A) are echoed
in (B) for comparison. Note the different vertical scale in
(C). Each trace has been scaled so that it integrates to the
number of sojourns in the corresponding state. For the theoretical
trace in (A) the counts and time constants were
C1: 1622, 0.055; C2:
2858, 1.35; O1: 3675, 0.063; and
O2: 2527, 0.552. The bandwidth was calculated
from Eq. 29 with g = 0.39. For comparison with the
ideal values of given earlier, the mean dwell times in each of the
states in (C) are: c1 = 0.050, c2 = 0.363, o1 = 0.052, and o2 = 0.438.
|
|
Class dwell times
In practice, filter and threshold analysis of the data summarized
in Fig. 4 would be attempted using a single threshold leading to two
classes of states, O = {O1, O2} above the threshold and C = {C1, C2} below. To
obtain the predictions of the model for these classes, the Q
matrix is partitioned into four submatrices Qoo,
Qco, Qoc, and
Qcc; each with entries equal to the
corresponding transition rates (Colquhoun and Hawkes,
1982
). Figure
5, A and B, displays the theoretical densities obtained by using the
spectral representation of the Q matrix (Colquhoun
and Hawkes, 1982
, 1995
), and least square fits of two log-transformed
exponentials,
i (Wi/
i)exp[x
exp(x)/
i], i = 1, 2;
with x = to or x = tc to the kernel estimates obtained from the Gibbs
sampler for the time spent in both classes. These are calculated with
Eqs. 28 and 29 after adding together the events corresponding to
successive sojourns of each class. The relevant dwell times were taken
from the last 20 sampled realizations. In Fig. 5 C the
dwell times have been obtained by low-pass filtering the noisy trace
with a 5-kHz digital Gaussian filter and noting a change of state
whenever the signal crosses a threshold at 0.105, half way between the
two middle levels. With this synthetic signal, a large majority of
genuine transitions do cross the midline, which makes this an
appropriate position for a single threshold. The filter frequency of 5 kHz was chosen to reduce the "false alarms" resulting from noise
fluctuations to less than 5% of the total. It is possible to use
relatively light filtering and tolerate a relatively high rate of noise
transitions when the signal is in a state close to a threshold because
these states are occupied for less than 10% of the time and genuine transitions from them are frequent. However, even at 5 kHz, more than a
third of the genuine events have been missed. Further low-pass filtering leads t