help button home button Biophys. J.
HOME HELP FEEDBACK SUBSCRIPTIONS ARCHIVE SEARCH TABLE OF CONTENTS

Originally published as Biophys J. BioFAST on January 28, 2005.
doi:10.1529/biophysj.104.053256
This Article
Right arrow Abstract Freely available
Right arrow Full Text (PDF)
Right arrow All Versions of this Article:
biophysj.104.053256v1
88/4/2494    most recent
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Similar articles in this journal
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Download to citation manager
Right arrow reprints & permissions
Citing Articles
Right arrow Citing Articles via HighWire
Right arrow Citing Articles via Google Scholar
Google Scholar
Right arrow Articles by Milescu, L. S.
Right arrow Articles by Sachs, F.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Milescu, L. S.
Right arrow Articles by Sachs, F.
Biophysical Journal 88:2494-2515 (2005)
© 2005 The Biophysical Society

Maximum Likelihood Estimation of Ion Channel Kinetics from Macroscopic Currents

Lorin S. Milescu *, Gustav Akk {dagger} and Frederick Sachs *

* Department of Physiology and Biophysics, State University of New York, Buffalo, New York; and {dagger} Department of Anesthesiology, Washington University in St. Louis, St. Louis, Missouri

Correspondence: Address reprint requests to Frederick Sachs, E-mail: sachs{at}buffalo.edu.


    ABSTRACT
 TOP
 ABSTRACT
 INTRODUCTION
 THEORY AND ALGORITHM
 MATERIALS AND METHODS
 RESULTS
 DISCUSSION
 ACKNOWLEDGEMENTS
 REFERENCES
 
We describe a maximum likelihood method for direct estimation of rate constants from macroscopic ion channel data for kinetic models of arbitrary size and topology. The number of channels in the preparation, and the mean and standard deviation of the unitary current can be estimated, and a priori constraints can be imposed on rate constants. The method allows for arbitrary stimulation protocols, including stimuli with finite rise time, trains of ligand or voltage steps, and global fitting across different experimental conditions. The initial state occupancies can be optimized from the fit kinetics. Utilizing arbitrary stimulation protocols and using the mean and the variance of the current reduce or eliminate problems of model identifiability (Kienker, 1989Go). The algorithm is faster than a recent method that uses the full autocovariance matrix (Celentano and Hawkes, 2004Go), in part due to the analytical calculation of the likelihood gradients. We tested the method with simulated data and with real macroscopic currents from acetylcholine receptors, elicited in response to brief pulses of carbachol. Given appropriate stimulation protocols, our method chose a reasonable model size and topology.


    INTRODUCTION
 TOP
 ABSTRACT
 INTRODUCTION
 THEORY AND ALGORITHM
 MATERIALS AND METHODS
 RESULTS
 DISCUSSION
 ACKNOWLEDGEMENTS
 REFERENCES
 
Ion channels control the flow of ions through cell membranes. These molecular gates exist in a continuum of conformations, but in practice these are reducible to a small number of kinetic states that live long enough to be resolved experimentally. Some of the states do not conduct ions (closed), whereas others are partially or fully conducting (open). "Instantaneous" transitions occur between these conformations, with the probability of transition described by a rate constant. The rate constant contains two intrinsic terms that describe the coupling of the rates to driving forces, such as ligand concentration, transmembrane voltage, tension, etc. (Hille, 2001Go).

The objective of kinetic analysis is to create a model—a hypothesis—with the minimum number of kinetic states and connections that can describe the data. Successful models are wonderful summaries of the data, and in that simplification help us to understand the structure and function of individual ion channels, and their mean behavior in cells. Hodgkin and Huxley provided the earliest prototype of this kind of analysis (Hodgkin and Huxley, 1952Go). The rate constants quantify the free energy difference between conformational states, providing insight into molecular structure and gating mechanisms. At the cellular level, the differential expression of ion channels, each with specific kinetic and conductance properties and modulation inputs, determine the static and dynamic membrane properties, and ultimately determine the network behavior of excitable cells.

Most properties of ion channels are best studied with single channel data (Neher and Sakmann, 1992Go), but technical difficulties often restrict recordings to samples containing multiple channels that cannot be resolved into unitary events. The statistical description of single channel data is well developed thanks to the work of many clever scientists (e.g., Ball and Rice, 1992Go; Ball and Sansom, 1989Go; Colquhoun and Hawkes, 1982Go, 1995aGo; Colquhoun and Sigworth, 1995Go; Horn and Lange, 1983Go; Kienker, 1989Go), and sophisticated analysis techniques based on Markov models have been published (e.g., Colquhoun et al., 1996Go; Qin et al., 1996Go, 2000Go). The statistical properties of macroscopic currents generated by ensembles of ion channels are well known, but the detailed kinetics are blurred by the overlap of many independent channels (Colquhoun and Hawkes, 1977Go). The time dependence (relaxation) of the current It of a model with NS states, elicited by a step change in experimental conditions, is a sum of NS – 1 exponentials (unless there are uncommon degenerate solutions), regardless of how states are connected (i.e., regardless of model topology; Colquhoun and Hawkes, 1995bGo):

(1)

The equilibrium current Ieq, the amplitudes Ai, and the time constants {tau}i are not independent quantities because they are all functions of the same rate constants. A traditional method of macroscopic data analysis is to fit exponentials to nonstationary currents, with the main objective of estimating the time constants {tau}i. However, the time constants thus obtained are estimated as independent parameters, without imposing the constraints determined by the particular topology of the model, or other constraints, such as loop balance. Hence, the topology of the model remains ambiguous. Rate constants cannot be easily calculated from time constants, and, without molecular rate constants, clues to molecular structure are obscure. Despite these shortcomings, exponential fitting remains the most widely used analytical method; partly for its intuitive feel, partly because of the illusion of model independence, and partly because few procedures for direct estimation of rate constants are available (e.g., d'Alcantara et al., 2002Go; Celentano and Hawkes, 2004Go).

A fundamental problem of kinetic modeling is the identifiability of the model, i.e., are there multiple topologies that are equally likely? In a now-classic example (Kienker, 1989Go), the model C-C-O (Closed-Closed-Open) cannot be distinguished from the model C-O-C, under equilibrium conditions. However, if one or more of the rates is stimulus dependent, they can be distinguished with nonstationary stimuli. Variable stimuli provide higher degrees of model discrimination.

We present here a maximum likelihood method for direct estimation of molecular rate constants from macroscopic ion channel data, using hidden Markov models. The algorithm can estimate the number of channels in the preparation, and the mean and standard deviation of a binary unitary current. The principle is to formulate the nonstationary macroscopic current It as a random variable with a Gaussian probability distribution, and to maximize the likelihood of the data with respect to the estimated parameters. For a given model, the mean µt and variance Vt of the macroscopic current It are functions of the rate constants, channel count, unitary conductance, and experimental protocol. These functions can be complicated, and involve calculation of the exponential of the rate matrix Q, as shown in Theory. The method is "direct" because the a priori assumptions about the topological and other constraints are explicitly specified in the model. We make two key assumptions: i), the ion channels are identical and independent; and ii), the experimental variables (such as ligand concentration, voltage, etc.) are not functions of channel activity. However, environmental variables need not be constant. We approximated continuously changing stimuli with a series of step functions. Although voltage clamp experiments generally satisfy these requirements, concentration and pressure jumps with significant rise time can be well approximated.

The algorithm is quite general, and has a number of features to assist users in model building: i), It works with models of arbitrary size and topology because the structural equations are internally formulated in terms of matrices and vectors. Models can be quantitatively compared, because distinct topologies with different constraints typically have different likelihoods. The modulation of individual state transitions by driving forces can be naturally studied. ii), Constraints can be imposed on rate constants. These constraints may be a priori knowledge of rate values, imposition of identical ligand-binding sites, microscopic reversibility, etc. The constraints reduce the number of free parameters, permitting exploration of larger allosteric models. iii), The current is modeled as a random variable, i.e., both the mean and the variance are used, which tends to reduce the problem of identifiability, and permits a simultaneous estimate of the mean single channel conductance. iv), The algorithm can estimate the number of active channels in the patch, if the single channel amplitude is known or coestimated. v), It works with nonstationary currents elicited in response to arbitrary experimental protocols, allowing for stimuli with finite rise times. A single model with driven rates can be globally fit across multiple data sets, with arbitrary stimulation protocols, and data from the same or from different patches can be combined in a single analysis. vi), The algorithm can analyze a sequence of stimulus steps (e.g., voltage) without assuming that channels reach equilibrium between steps. This reduces the amount of data required to solve the kinetics, and improves identifiability. vii), The starting probabilities (the state occupancies during a conditioning stimulus) are calculated from the fit kinetics and the experimental conditions. Hence, channels need not be assumed to occupy an arbitrary starting state. viii), Finally, the method efficiently calculates the likelihood function. The algorithm is faster than curve fitting using traditional differential equation solvers (d'Alcantara et al., 2002Go). Because the program utilizes only the diagonal elements of the autocovariance matrix, it is much faster then a recent method that uses the full autocovariance matrix (Celentano and Hawkes, 2004Go), and hence can handle larger data sets (although with slightly compromised precision). The gradients of the likelihood function are calculated analytically, enabling use of a fast variable metric optimizer (Fletcher and Powell, 1963Go; Fletcher, 1981Go; Press et al., 1992Go), with quadratic convergence.

This article focuses on the algorithm, its basic properties, and the results of extensive testing on simulated data. We concentrated on testing two ligand-dependent models (C-C-O and C-O-C), as the minimum nontrivial kinetic schemes that capture the behavior of more realistic models. As pointed out above, these simple models have identifiability issues: they are not distinguishable with equilibrium data (Kienker, 1989Go). However, they can be resolved by applying a suitable stimulation protocol.

We evaluated the two most important properties of the algorithm: the accuracy and the precision, and studied the effect of experimental factors, such as the number of channels in the patch, the amount of data, the stimulation protocol, etc. Because the kinetics of ion channels are rarely as simple as our three-state models, and the noise models of experimental data may depart significantly from the Gaussian assumption, we tested the algorithm with actual macroscopic currents from cells expressing acetylcholine receptors (AChRs). The AChR has a relatively complex kinetics, but the equilibrium rates are known in detail from single channel analysis (e.g., Akk and Auerbach, 1999Go). We showed that our method can choose the "correct" model size and topology based on differences in the goodness of fit. Because some of the time constants of AChR kinetics are quite short relative to the rise time of concentration jumps practically obtained, we explored the effect of finite rise time. In future work, we will apply the method to voltage-gated ion channels. The algorithm was implemented in software and integrated in the QuB electrophysiology suite, available for free at www.qub.buffalo.edu.


    THEORY AND ALGORITHM
 TOP
 ABSTRACT
 INTRODUCTION
 THEORY AND ALGORITHM
 MATERIALS AND METHODS
 RESULTS
 DISCUSSION
 ACKNOWLEDGEMENTS
 REFERENCES
 
Kinetic model
We review here some basic results of ion channel kinetics, for an arbitrary Markov model with NS conformational states. Rate constants are conventionally expressed as an NS x NS rate matrix Q (also termed the "generator" matrix; Colquhoun and Hawkes, 1995bGo). The Q matrix has the properties that each off-diagonal element qij is equal to the rate constant of the transition between states i -> j, and that each diagonal element qii is equal to the negative sum of the off-diagonal elements of row i, so that the sum of each row is zero. When there is no direct transition between states i -> j, qij = 0. In Eyring-type barrier models, the rate constant qij includes a preexponential factor multiplied by the concentration Cij of some ligand, and an exponential factor multiplied by the magnitude of the driving force Vij (such as voltage, pressure, etc.; Hille, 2001Go):

(2)

The expression in Eq. 2 is equivalent to the common formulation where

At any given time, a state probability vector P, of dimension NS, represents the occupancy of the NS conformational states. The Kolmogorov differential equation describes the dynamics of a Markov process (Elliott et al., 1995Go):

(3)

The superscript T denotes vector transposition. If P0 is the state probability vector at time t0 < t, then Pt is given by the Chapman-Kolmogorov solution:

(4)
where is an NS x NS matrix, known as the transition probability matrix. Each element aij is equal to the conditional probability that the channel is in state j at time t, given that it was in state i at time 0. If the eigenvalues of the Q matrix are all distinct (and most ion channel models have distinct eigenvalues, with one being zero and representing the equilibrium solution, and the others negative), the matrix exponential At can be efficiently obtained from Q by spectral decomposition (Moler and Van Loan, 1978Go) as The Ai values are the spectral matrices obtained from the eigenvectors of Q, and the {lambda}i values are the eigenvalues of Q.

Conductance model
To describe the conductance properties of an ensemble of NC channels, we make the following simplifying assumptions: i), the baseline current (i.e., the current when all channels are closed) is a {delta}-correlated Gaussian random variable, with mean µ0 and variance V0; ii), the current passing through a single channel in state i is a {delta}-correlated random Gaussian variable, with mean µ0 + µi and variance V0 + Vi (Sigworth, 1985Go). By definition, µi = 0 and Vi = 0 for all the closed states. For two channels, one in state i and one in state j, the current is a Gaussian with mean µ0 + µi + µj and variance V0 + Vi + Vj. In general, the current given by an ensemble of NC channels is the sum of Gaussian random variables, which is another Gaussian random variable. Hence, the conditional probability distribution of It, for a given probability vector Pt, has the following expression:

(5)

(6,7)

is the number of channels occupying state i at time t, and can be formulated as a function of state probabilities: where is the occupancy probability of state i at time t. Let µ and V be vectors of dimension NS, with elements µi and Vi as defined. Equations 6 and 7 can now be written in a more compact, vectorial form, as follows:

(8,9)

Equations 8 and 9 give, respectively, the conditional mean and variance of the current It recorded from an ensemble of NC channels, given a state probability vector Pt. To obtain the unconditional distribution of It, we must remember that Pt, as calculated with Eq. 4, is the mean of a probability distribution, and that state occupancies will have fluctuations around this mean. Hence, the recorded current has two sources of variance: i), baseline and intrinsic state noise, and ii), state fluctuations. The state fluctuations theoretically have a multinomial distribution (the extension of the binomial to NS states), but we will approximate it with a multivariate Gaussian to avoid a complicated convolution of probability distributions. This approximation will introduce a bias for currents arising from small numbers of channels. The covariance matrix of state fluctuations for the ensemble of NC channels, at time t, is defined as:

(10)
where is the NS x NS state covariance matrix, with elements and We obtain the unconditional distribution of It as follows:

(11)

(12,13)

The vector product represents the linear transformation of the state covariance matrix from the NS x NS state space into the unidimensional measurement space. To speed the computation, traces recorded in response to the same stimulation protocol can be averaged (assuming stationary kinetics), but to properly exploit the information contained in the sweep-to-sweep fluctuations, the expected variance Vt needs to be divided by the number of averaged traces. The above equations are derived for an arbitrary number of conductance classes. Hence, at least in principle, the method is suitable for models with substates (partially conducting states).

Starting state probabilities
The initial probability vector P0 can be specified in two ways. One is to let the user assign values from a priori assumptions. This is commonly done with ligand-gated channels, when zero ligand concentration drives channels into an unliganded state, which is assigned a unity starting probability. With voltage gated channels, a conditioning potential is assumed to populate a specific state. However, this approach is not suitable for models with aggregated initial states (e.g., multiple unliganded states). The solution is to calculate the initial state probabilities as the equilibrium probabilities determined by the initial (conditioning) experimental conditions, which are assumed to last long enough for channels to reach equilibrium. The equilibrium assumption requires that P0 satisfies the condition where is the rate matrix for which the equilibrium conditions hold. One way to calculate P0 is the following (Colquhoun and Hawkes, 1995bGo):

(14)
where R is a NS x (NS + 1) matrix, formed by adding a column of ones to the right-hand side of and u is a column vector of dimension NS, with each element ui = 1.

Likelihood function
We describe here the likelihood function that is to be maximized by the algorithm. The likelihood L is defined as the conditional probability of the macroscopic current trace I, recorded for time t = [0...T], given the model parameters {theta}:

(15)

We assume that the state probability vector Pt is a function of the initial state probabilities P0, but does not depend on the state probabilities at any other time. Hence, the distribution of the current at any time t is a function of the initial state probabilities only, which is equivalent to saying that the data have no autocorrelation beyond that contained in the A matrix. Thus, the conditional probability of the entire current trace is equal to the product of the individual conditional probabilities:

(16)
where with the mean µt and the variance Vt as defined in Eqs. 12 and 13. The objective is to find the parameter set {theta}ML that maximizes the likelihood L:

(17)

The likelihood may be a very small, or a very large number, because it involves Gaussians, so to reduce numerical errors we actually maximize the logarithm of the likelihood function:

(18)

The parameters to be estimated are and (Eq. 2), and the channel count NC (in the following, for the sake of clarity, we omit the derivations for conductance parameters). There is no mathematical constraint on the exponential factor but the preexponential factor and the channel count NC must be positive numbers, leading to the following useful reparameterization:

(19,20)

The exponential formulation of the rates provides an implicit constraint that the rate constants are all positive, and hence physically realizable.

Parameter constraints
In addition to positivity of the preexponential factors, models may require other constraints. For example, if the two binding sites of the AChR are assumed to be identical and noninteracting, then the rates of the two consecutive binding steps should be constrained in a 2:1 ratio. Without such a constraint, the rates would be different; on the other hand, it is possible to test for independent sites using the likelihood. A priori constraints simplify otherwise complicated models. Each constraint, in addition to incorporating previously known results, reduces the number of free parameters by one. Typical constraints are: i), fix the pre- or exponential factor; ii), scale one pre- or exponential factor proportional to another; iii), enforce microscopic reversibility of cycles (the product of the rate constants going in one direction of a cyclic reaction is equal to the product of the rate constants going the other way). These constraints can be formulated as follows, where ct denotes the constrained value:

(21)

(22)

(23)

(24)

(25)

The constraint expressions above can be written in matrix form as a set of linear equations (Qin et al., 1996Go, 2000Go):

(26)
where r is the vector of dependent (constrained) variables, with entries for and u. Mc is the coefficient matrix and Vc is the constant vector. The dependent variables r can in turn be expressed as a function of the independent variables x (Qin et al., 2000Go):

(27)
where x is the vector of free parameters. The matrix Ac and the vector Bc can be determined from Mc and Vc using, e.g., the singular value decomposition (Qin et al., 2000Go). Other, nonlinear constraints could be included as well, by adding penalty terms to the likelihood function.

Computation of the likelihood function and its derivatives
We describe next the details of the computation of the likelihood function and its derivatives. The log likelihood LL of a macroscopic current, recorded for time t = [0...T] under constant experimental conditions, is calculated as follows:

(28)

(29)

The log-likelihood function in Eq. 29 reduces to the equally weighted (i.e., nonweighted) sum of squares if the constant terms and the variance-containing terms are eliminated.

The quantities of interest are the pre- and exponential factors and and the channel count NC, but the optimizer maximizes LL with respect to the free parameters x. and NC are subsequently obtained from x using Eqs. 19, 20, and 27. The derivative of LL (or SS) with respect to a free parameter xk can be calculated with the chain rule as follows:

(30)
with and where i and j are indices over the off-diagonal entries in the Q matrix corresponding to nonzero rates in the kinetic model. The partial derivatives of and u with respect to xk are obtained with Eq. 27. Let z denote either qij or NC. Then the partial derivative of LL with respect to z is calculated as follows:

(31)

(32)

Similarly, The derivatives of µt and Vt with respect to z are:

(33)

(34)

The following results can be immediately derived: The derivative of the matrix with respect to qij (and similarly for ) is another matrix with elements:

(35)

The derivatives of the state probability vector Pt with respect to qij and are:

(36)

The following results can be immediately derived: The derivative of the initial state probability vector P0 can be obtained using the following equality:

(37)

We skip the derivation details and give the final expression:

(38)

The derivative of the At matrix (Ball and Sansom, 1989Go; Najfeld and Havel, 1995Go) is:

(39)

(40)

The derivative of Q (and R) with respect to qij (and ) is another matrix Qij', with the properties

Extension to arbitrary stimulation protocols
All the above calculations assume that the experimental conditions (i.e., the stimuli) do not change across the data set. We show next how to extend the method to arbitrary stimulation protocols. We assume that the applied stimulus is approximated by a sequence of SC steps of constant value and arbitrary duration. Each step s starts and ends at times and respectively. We rewrite the expressions of the log likelihood and its derivative, to show the dependence of µt and Vt on the step s, as follows:

(41)

(42)

In this case, the log likelihood is a function of multiple Q matrices (in addition to Qeq), one for each set of experimental conditions that change the ion channel response. Although for each value of the stimulus there is a different Q matrix, the set of free parameters remains the same, as it does not depend on experimental conditions. Accordingly, the derivative of the log-likelihood function LL with respect to a free parameter xk has the following form:

(43)
where c is an index over all different experimental conditions, including the initial equilibrium. Let us consider the simpler case when the stimulus changes the kinetics, but not the unitary channel current, e.g., a change of ligand concentration. The expressions of µt,s and Vt,s become:

(44,45)

For an efficient computation of the state probability vector Pt,s and its derivatives, we derive recursive relations. The state vector at time t, contained in the step s, is calculated as follows:

(46)
where By definition, the vector at the start of step s is equal to the vector at the end of step s – 1:

(47)
where Equation 47 can be written recursively and initiated as follows:

(48-50)

The derivative of the state vector can also be calculated recursively and initiated as follows:

(51)

(52,53)

The derivative of At,s with respect to is equal to zero when the experimental conditions for step s have an index different than c, thus simplifying the calculations. A similar treatment can be derived for voltage stimuli that change both the kinetics and the unitary current.

Variance of the estimates
Our choice of optimization routine is the Davidon-Fletcher-Powell algorithm (Fletcher and Powell, 1963Go; Fletcher, 1981Go), which is a quasi-Newton method with approximate line search, with efficient quadratic convergence. We used the "dfpmin" implementation in Press et al. (1992)Go, with some modifications (Qin et al., 2000Go). The optimizer provides variance estimates for the optimized, free parameters x. The variance of the pre- and exponential factors and and of the channel count NC can be derived as follows (Qin et al., 2000Go):

(54)

(55)

(56)


    MATERIALS AND METHODS
 TOP
 ABSTRACT
 INTRODUCTION
 THEORY AND ALGORITHM
 MATERIALS AND METHODS
 RESULTS
 DISCUSSION
 ACKNOWLEDGEMENTS
 REFERENCES
 
Electrophysiology
The adult mouse muscle-type nicotinic AChRs ({alpha}-, ß-, {delta}-, and {varepsilon}-subunits) were transiently expressed in tsA 201 cells using a calcium phosphate-based transfection technique, as described (Akk, 2002Go). The cells were plated on coverslips in 35-mm dishes. At the time of experiments, one coverslip at a time was transferred into the recording chamber. The electrophysiological recordings were carried out using a piezo-driven liquid filament system to measure the macroscopic responses to 1 mM carbachol on outside-out patches (Heckmann and Pawlu, 2002Go). The system had a solution exchange time of <50 µs when measured with changes in the liquid junction potential with an open patch electrode, or <100 µs when measured with an electrode containing a patch. The bath solution contained (in mM): 140 NaCl, 5 KCl, 2 CaCl2, 1 MgCl2, 10 glucose, and 10 Hepes (pH 7.3). The pipette solution contained (in mM): 140 KCl, 5 MgCl2, 5 EGTA, 10 glucose, and 10 Hepes (pH 7.3). The agonist was dissolved in the bath saline solution and the membrane potential was held at –40 mV. The experiments were carried out at room temperature. Currents were amplified with an Axopatch 200B amplifier (Axon Instruments, Union City, CA), low-pass filtered at 10 kHz, and digitized at 50 kHz. Traces of 30-ms duration were recorded at intervals of 1 s using the ISO2 software (MFK, Taunusstein, Germany). Averaged currents from 10 to 25 traces were used for kinetic analysis.

Simulation and data analysis
Stochastic and deterministic simulations were generated with the QuB software (www.qub.buffalo.edu), using the ligand-gated kinetic models from Fig. 1, referred to as C-O-C and C-C-O, respectively, and the three stimulation protocols from Fig. 2, referred to as P1, P2, and P3, respectively. We call the parameter values used for simulation true parameters, as opposed to estimated parameters. For each simulation, 10,000 traces were generated. Simulations with other models and protocols are described in the Results. A sampling interval of 10 µs was used for simulations, except in the analysis of experimental data, for which we used 20 µs. The QuB program can generate a stochastic macroscopic simulation as the summation of NC single-channel stochastic simulations, in conformity with Eqs. 6 and 7. Alternatively, it can make a deterministic macroscopic simulation by calculating the current according to Eq. 8. The starting state probabilities are calculated as the equilibrium probabilities under the conditioning stimuli. For stochastic data, the starting state for each of the NC channels is chosen randomly, such that the probability of starting in state i is proportional to The estimation of rate constants and channel count, and the nonweighted exponential fitting were done with QuB.



View larger version (21K):
[in this window]
[in a new window]
 
FIGURE 1  The models used for simulation. (A) The transitions C1->O2 (in model COC) and C1->C2 (in model CCO) are ligand-binding steps. (B) An example of single-channel data simulated with the COC model. The single-channel current in either the Closed or the Open state is a random Gaussian variable.

 


View larger version (19K):
[in this window]
[in a new window]
 
FIGURE 2  The macroscopic response to different stimulation protocols. Macroscopic traces were simulated deterministically (smooth red lines) and stochastically (noisy traces) with the CCO and COC models, with NC = 1000. The starting equilibrium probabilities were (1, 0, 0), for protocol P1, and (0.571, 0.143, 0.286), for protocols P2 and P3. The units of the ligand concentration are arbitrary.

 

    RESULTS
 TOP
 ABSTRACT
 INTRODUCTION
 THEORY AND ALGORITHM
 MATERIALS AND METHODS
 RESULTS
 DISCUSSION
 ACKNOWLEDGEMENTS
 REFERENCES
 
Model identifiability
What is a "correct" kinetic model? One first has to answer two identifiability questions: i), for a given stimulation protocol, can two models of the same size but different topology be distinguished? And ii), for a model of given topology and size, and a given stimulation protocol, does the mathematical model have a unique solution with respect to the rate constants and the number of channels, i.e., are there multiple optima?

With respect to the first question, Kienker (1989)Go showed that two models of different topology cannot be distinguished under equilibrium conditions if their Q matrices are related by a similarity transform. For example, the C-C-O and C-O-C models from Fig. 1 can describe the same equilibrium data with identical likelihood. If some three-state models cannot be distinguished, then we should be wary of more complicated models. However, the use of driving stimuli greatly enhances the ability to differentiate among models. For instance, when started in the leftmost state, the C-C-O model has a second-order latency, whereas the C-O-C has a first-order latency.

Can these two models be distinguished from macroscopic nonstationary data? The answer is yes. We fit the deterministic simulation of the C-O-C model in response to the P1 stimulation protocol (Fig. 2 A), with the C-C-O model, and vice versa. The results are not shown, but we could not find any set of parameters that could generate the same current as the other model. The two models are therefore mathematically distinguishable with the simplest possible protocol, but how well can they be differentiated from noisy (possibly nonideal) data is a different issue.

With respect to the second question, a model can be classified (Cobelli and DiStefano, 1980Go) as a priori: i), "uniquely identifiable" if all its parameters have a unique solution; ii), "nonuniquely identifiable" if at least one of its parameters has a multiple but finite number of solutions, and all other parameters have unique solutions; iii) "nonidentifiable" if at least one of its parameters is undetermined, i.e., it has an infinite number of solutions.

We examined the identifiability of the C-C-O and C-O-C models. Macroscopic currents were deterministically simulated in response to the three driving stimuli from Fig. 2, with the rate constants shown in Fig. 1, and the number of channels NC = 1000. We searched for other values of k12, k21, k23, k32, and NC that gave the same average current as the true parameters (i.e., other solutions) by fixing one or more parameters and having the algorithm find the others, restarting several times with different initial values. Because the simulated data were noiseless, we minimized the sum of squares. Only the solutions with zero SS (within numerical precision) were accepted.

The results obtained with model C-C-O and protocol P1 are shown in Fig. 3 A.1. The figure shows that, for a given NC, for each value of k21 there were two solutions for the other three rates. Hence, with protocol P1, only two parameters can be uniquely estimated from model C-C-O. When repeated with different values of NC, the experiment yielded similar results. Interestingly, the admissible domain of k21 had an upper bound (~341 s–1). With protocol P2, NC was varied, and the four rate constants were again estimated (Fig. 3 A.2). For each value of NC, there were again two solutions for the four rates. Hence, protocol P2 is more informative, yielding three free parameters. Not surprisingly, the admissible domain of NC had a lower bound (~740), determined by the peak current amplitude and the unitary current.



View larger version (17K):
[in this window]
[in a new window]
 
FIGURE 3  The model identifiability depends on model topology and stimulation protocol. Macroscopic currents of identical mean (but not variance) can be obtained with different sets of the five parameters (four rate constants and NC). Macroscopic traces were simulated deterministically, as in Fig. 2. The solid and the dashed lines in panels A.1, A.2, and B.1 represent two possible solutions of the estimated parameters, for each value of the varied parameter (k21 and NC). In panel A.1, NC was fixed at its true value of 1000 (varying NC adds an extra dimension to the solution space). In panel B.2, solutions of the four rate constants (solid circles) exist only for two values of NC (1000 and 3333.333). With protocol P1, a maximum of two and three parameters could be uniquely estimated for the CCO (A.1) and the COC (B.1) models, respectively. Protocol P2 increased by one the maximum number of uniquely identifiable parameters (A.2 and B.2), whereas with protocol P3 all five parameters were uniquely identifiable (results not shown).

 
The results obtained with model C-O-C and protocol P1 are shown in Fig. 3 B.1. The rate constant k21 was varied, and NC and the other three rates were estimated. For each value of k21, there were two solutions for the other four parameters. Therefore, with protocol P1 three free parameters can be uniquely estimated from model C-O-C. As with the C-C-O model, the admissible domain of k21 had an upper bound. With protocol P2, NC was varied, and the four rates were again estimated (Fig. 3 B.2). Only two values of NC generated a zero SS: 1000, which is also the true solution, and 3333.333. Therefore, with protocol P2 four free parameters can be uniquely estimated from model C-O-C. It can be argued that the second solution can be dismissed on the basis that NC should be an integer number. However, this is just a particular case. For example, if the true value of NC were 3000, then the second solution would have NC = 10,000.

To summarize, with protocols P1 and P2, two and three free parameters can be uniquely estimated for model C-C-O, respectively, and three and four for model C-O-C. With protocol P3, both models become fully identifiable, with unique solutions for all five parameters (results not shown). In conclusion, model C-C-O is nonidentifiable with protocols P1 and P2, and uniquely identifiable with P3. Model C-O-C is nonidentifiable with P1, nonuniquely identifiable with P2, and uniquely identifiable with P3. Although our search approach was not designed to find all admissible solutions rigorously, it made it clear that even these simple models create identifiability issues, even with nonstationary stimuli. The identifiability can be improved by more complex stimulation protocols. C-C-O is more ambiguous than C-O-C for the same stimulation.

Next, we tested whether including the variance-dependent terms in the likelihood function improves identifiability. Macroscopic currents were stochastically simulated with the C-O-C model, in response to stimulation protocols P1 and P2, respectively. The number of channels NC was fixed at different values and the four rate constants estimated using the nonweighted sum of squares or the log likelihood. The results shown in Fig. 4, A.1 and B.1, essentially restate the conclusion drawn from Fig. 3, i.e., that the parameters of the C-O-C model are not uniquely identifiable with either the P1 or P2 protocols. With P1, the SS corresponding to the best fit is constant for all values of NC (Fig. 4 A.1), or has two identical minima with P2 (Fig. 4 B.1). In contrast, the LL function has a global maximum, centered at the true parameter values, with either protocol P1 (Fig. 4 A.2) or P2 (Fig. 4 B.2). In conclusion, analyzing the response to more complex stimulation protocols and adding the variance-containing terms to the mathematical model are different ways to increase the number of equations, and hence to improve identifiability.



View larger version (18K):
[in this window]
[in a new window]
 
FIGURE 4  The model identifiability improves if both the mean and the variance of the macroscopic current are utilized for parameter estimation. Macroscopic currents were stochastically simulated with the COC model, with NC = 1000, in response to protocols P1 and P2 (as in Fig. 2). NC was varied and the four rate constants were estimated from 100 simulated traces, using either the sum of squares or the log-likelihood method. The SS method uses only the mean, whereas the LL method uses both the mean and the variance. For easier comparison between LL and SS, the graphics show the negative SS.

 
Including the variance-containing terms also allowed us to calculate the mean and the standard deviation of the unitary current (Fig. 5). Macroscopic currents were stochastically simulated with model C-O-C in response to protocol P1. The amplitude (Fig. 5 A), or the standard deviation (Fig. 5 B) of the open-state single-channel current was varied, and NC and the four rate constants were estimated, using the LL method. The figure shows that LL has a global maximum centered at the true parameter values.



View larger version (20K):
[in this window]
[in a new window]
 
FIGURE 5  Conductance parameters can be estimated from macroscopic currents with the likelihood function. Macroscopic currents were stochastically simulated with the COC model, with NC = 1000, in response to protocol P1 (as in Fig. 2). The mean µO and the standard deviation {sigma}O of the single-channel Open current were varied and the four rate constants and NC were estimated with the LL method.

 
Statistics of the estimates
In the previous section, we checked the a priori identifiability properties of two simple models, in combination with three stimulation protocols. We found that the two models can be distinguished even with the simplest stimulus (P1 in Fig. 2), and that for each model, the parameters can be uniquely estimated with the SS method and a suitable protocol (e.g., P3), or with the LL method. But can the parameters be estimated with sufficient accuracy and precision from noisy data? We explored extensively the statistical properties of our algorithm using stochastic simulations of the same three-state models, but only present results for the C-O-C model, because C-C-O gave similar results.

We started with a qualitative look at the effects of biological variability. The signal/noise ratio (SNR) of a macroscopic current is proportional to and to as follows from Eqs. 12 and 13:

(57)
where NT is the number of averaged traces. The deviation between the fit and the current calculated using the true parameters is inversely proportional to the SNR (Fig. 6). Probabilistically, at low SNR the fit may differ substantially from the deterministically simulated current, but it matches well both the deterministic and the stochastic simulations at high SNR.



View larger version (31K):
[in this window]
[in a new window]
 
FIGURE 6  The fit of the stochastic simulation may differ substantially from the deterministic simulation, at low signal/noise ratio. Macroscopic currents were simulated stochastically (noisy traces) and deterministically (red lines) with the COC model, with NC = 10, 100, or 1000, in response to protocol P1. The stochastic simulations were fit (green lines) as individual traces or as averages of 10 or 100 traces. The fits are virtually the same for the LL (shown), SS, or exponential fitting.

 
The accuracy and precision of the estimates are functions of SNR (Fig. 7). Parameter estimates were accurate at high SNR, but the accuracy was poor if <10–20 traces are averaged. The SE of the estimates (defined as the ratio between the standard deviation and the average) shows the expected inverse relationship with the SNR. Our maximum likelihood method and the exponential fitting had the same characteristics. However, fitting individual traces separately yielded results that were biased relative to the true parameter values, and were not symmetrically distributed about the mean (Fig. 8), with either the LL method (Fig. 8 A), or with the exponential fitting (Fig. 8 C). Interestingly, the rate constants estimated from equally short stretches of single-channel data had similar non-Gaussian distributions (Fig. 8 B), probably due to the exponential distribution of dwell times.



View larger version (22K):
[in this window]
[in a new window]
 
FIGURE 7  The accuracy and precision of the estimates are functions of the signal/noise ratio. Macroscopic traces were simulated stochastically with the COC model, with NC = 10, 100, or 1000, in response to protocol P1. Rate constants (A), or rate constants and channel count (B) were obtained with the LL method, and time constants were obtained with exponential fitting (C). The average and the SE of the estimated parameters were calculated from 1000 fits of single traces or averages of multiple traces. The horizontal dotted lines in the average graphs mark the true parameter values. For easier comparison, the 10% SE is marked with a line in the standard error graphs.

 


View larger version (28K):
[in this window]
[in a new window]
 
FIGURE 8  The probability distribution of parameter estimates is not Gaussian. Macroscopic traces with NC = 10, and single-channel traces were simulated stochastically with the COC model, in response to protocol P1. Macroscopic traces were fit individually or as averages of 10, and single-channel traces were analyzed individually or in groups of 10. (A) The distribution of rate constants and NC estimated from macroscopic traces with the LL method. Similar results (not shown) were obtained with NC fixed. (B) The distributions of rate constants estimated from single-channel traces, using the QuB software (www.qub.buffalo.edu). (C) The distribution of time constants estimated with exponential fitting. In all graphs, the dotted vertical lines mark the true parameter values.

 
Why is there a bias? One possible explanation is the simplified description of the macroscopic current, as we approximated the multinomial probability distribution of state occupancy with a multivariate Gaussian. This approximation has limits at small numbers of active channels. Fig. 9 A shows the probability distribution of the current, for pOpen = 0.1 and 0.5, respectively. At low pOpen and small NC, the current distribution shows clearly the discrete peaks of the multinomial component (a high pOpen would give identical results). At pOpen close to 0.5, or for larger NC, the distribution was well fit by a Gaussian, as indicated by the overlapped red line in the figure. These results are in agreement with the empirical rule, which states that the binomial is well approximated with a Gaussian, if n*p > 30, where n is the number of draws and p is the probability. Fig. 9 B shows that the standard deviation calculated by the algorithm (which includes one copy of the instrumentation noise and NS copies of the excess state noise, and the gating fluctuations of NS channels), provides a good representation of the data, as indicated by the red line in the figure.



View larger version (22K):
[in this window]
[in a new window]
 
FIGURE 9  The probability distribution of macroscopic data can be well approximated with a Gaussian, except for very low (or very high) pOpen and small NC. Ten-thousand macroscopic traces were simulated stochastically with the COC model, with NC = 10, 100, or 1000, in response to protocol P1. (A) Probability distribution of the macroscopic current, calculated from the 10,000 traces, for pOpen = 0.1 and 0.5. Overlapped is the fit with a Gaussian (red lines). (B) Standard deviation of the macroscopic current, measured from the 10,000 traces. Overlapped is the calculated standard deviation (red lines).

 
Because this approximation does not appear to be the main source of bias, we hypothesized it might be a result of the algorithm ignoring the local time correlation of the data. To test this, we generated data without autocorrelation (Fig. 10). Ten-thousand traces were simulated as in Fig. 7, and the data points at each time were resampled between traces. The resulting traces had the same mean and standard deviation at each time, but lacked the correlation of the original data. The difference between the correlated and the noncorrelated data, and between their respective fit lines is remarkable (Fig. 10 A). The parameter estimates and the mean currents obtained from the noncorrelated traces showed no bias, and higher precision (Fig. 10 B; compare with Fig. 7). The distributions of the estimates were Gaussian, and were statistically centered on the true values (Fig. 10 C; compare with Fig. 8).



View larger version (25K):
[in this window]
[in a new window]
 
FIGURE 10  The bias of the estimates is caused by autocorrelation. Ten-thousand macroscopic traces were simulated stochastically with the COC model, with NC = 10, in response to protocol P1. The traces were further processed, by randomly reshuffling the data across traces, for each data point. The processed traces had the same current probability distribution, but lacked the autocorrelation of the raw data. The rate constants and the channel count were estimated with the LL method, from individual traces or from averages of multiple traces. (A) Examples of a simulated raw trace (top figure) and of a processed trace (bottom figure). Overlapped is the fit (green line), and the deterministic simulation (red line). (B) The average and the SE of the estimates obtained from the processed traces. (C) The distribution of the estimates. The dotted vertical lines mark the true parameter values.

 
Fig. 11 provides an additional view on bias. Macroscopic traces were simulated stochastically and were fit with three different methods: LL, SS, and exponential fitting. The residuals between the data and the fit line, and between the data and the deterministic simulation were calculated. In all three cases, the residuals obtained from fitting were smaller than the expected values, as if the data were fit "too well" (Fig. 11 A). For each data set, the residuals obtained with the best fit parameters were smaller than the residuals obtained with the true parameters (Fig. 11 B). For all three estimation methods, the fit line follows too closely the local time correlations present in the data, and this consistent "overfitting" leads to biased estimates. We speculate that only a Bayesian filtering algorithm (e.g., Roweis and Ghahramani, 2001Go) could obtain unbiased estimates from macroscopic data, but this would entail a high computational cost. The magnitude of the bias is hard to generalize for arbitrary models, but in the models tested, the parameters deviated by <10% (provided at least 10 traces were fit together, and NC ≥ 100), probably much less than the variability introduced by biological heterogeneity and nonideal noise.



View larger version (27K):
[in this window]
[in a new window]
 
FIGURE 11  Three different estimation methods indicate substantial data overfitting. Macroscopic traces were simulated stochastically with the COC model, with NC = 10, in response to protocol P1, and were fit individually or as averages of 10. Rate constants were estimated with the LL and the SS methods, and time constants were estimated with exponential fitting (ExpFit). The sums of squares between the data and the fit line (termed SSLL, SSSS, and SSExpFit, respectively), and between the data and the deterministic simulation (termed SSModel) were calculated. (A.1–A.3) The distribution of the SS for the three fitting methods. Overlapped is the distribution of SSModel (red lines). (B.1–B.3) The correlation between the SS of the three fitting methods and the SSModel. The graphics indicate that, for each fit, the residuals obtained with the best fit parameters are considerably smaller than the residuals obtained with the true parameters. The estimated parameters are therefore biased.

 
We examined the cross-correlation between different pairs of parameter estimates, under different conditions of SNR or stimulation protocol (Fig. 12). The estimates of, e.g., k21 and k32 showed bimodal distributions, with protocol P1 and at low SNR (row A). Correlations were especially strong between k21 and k12, and between k32 and k23. At higher SNR (rows B and C), the width of the distributions was reduced, to the point that with NC = 1000 they became unimodal (row C). The P3 protocol likewise narrowed the distribution (row D). As expected, neither a higher SNR (compare rows B and C with row A), nor a more complex protocol (compare row D with row C) could eliminate the correlations between parameters. The implication for fitting is that if one parameter is estimated with bias or with high variance, then its strongly correlated partners will also be biased and imprecise.



View larger version (31K):
[in this window]
[in a new window]
 
FIGURE 12  The parameter estimates are correlated. Macroscopic traces were simulated stochastically with the COC model, with NC = 10 or 1000, in response to protocols P1 or P3, and were fit individually or as averages of 10. Rate constants were estimated with the LL method. The red solid circles mark the true parameter values. Each graph is a two-dimensional histogram of the estimates for a given pair of two parameters, under a given set of conditions (NC, protocol, and number of averaged traces). There was strong correlation between, e.g., the k32 and k23 rate constants (row C or D), but little, if any, between, e.g., k23 and k12 (row C). Similar correlations were found between rates and NC (not shown). The correlations were not eliminated by a higher signal/noise ratio (compare rows B and C with row A), nor by a more complex stimulation protocol (compare row D with row C). Notice the change in scale between the first two and the last two rows of figures.

 
Model identifiability affects the accuracy and precision of the estimates (Fig. 13). Macroscopic traces were stochastically simulated in response to protocols P1, P2, and P3, and rate constants and channel count, or rate constants, channel count, and unitary current µO were obtained with the LL method. The quality of the estimates was improved by a more elaborate stimulation protocol (compare between A, B, and C panels), and it was negatively affected by raising the number of free parameters (compare the left with the right figures in each panel).



View larger version (18K):
[in this window]
[in a new window]
 
FIGURE 13  The accuracy and precision of the estimates are functions of model identifiability. Macroscopic traces were simulated stochastically with the COC model, with NC = 1000, in response to protocols P1, P2, and P3 (A, B, and C, respectively). Rate constants and NC (left-hand side figures of each panel), or rate constants, NC and the mean of the unitary current µO were estimated with the LL method. The average and the SE of the estimated parameters were calculated as in Fig. 7.

 
Analysis of experimental data
After exploring the statistical properties of our algorithm with simulated data, we determined its performance with experimental data from two patches: the response of AChRs to brief pulses of carbachol. The results obtained with the first patch are presented in Fig. 14. Ten current traces were averaged for analysis (Fig. 14 A). The concentration versus time profile of the experimental carbachol pulse was idealized assuming an instantaneous rise time (Fig. 14 B). The location of the concentration jumps, tstart and tend, were not precisely known, but were estimated as follows: we tried different combinations of the tstart/tend pair, searching around the visually selected values, and chose the points that gave the best fit. We used a simplified version of the standard kinetic model of the acetylcholine receptor (Magleby and Stevens, 1972Go; Colquhoun and Sakmann, 1985Go), with and without a desensitized state: C-C-C-O-D and C-C-C-O (Fig. 14 C). The ligand binding sites were assumed identical and independent, thus reducing the number of free parameters, as indicated in the figure. Due to contamination with 50 Hz periodic noise and artifacts from the piezoelectric solution exchanger, we analyzed the data with the SS method, because the LL method might have overestimated the variance, and thus NC. We assumed a unitary current of –3.4 pA. Fits to C-C-C-O and C-C-C-O-D models are shown as the green and the red lines in Fig. 14 A, for which the rate constants and the number of channels were estimated simultaneously. The blue line is the fit with the C-C-C-O-D model, in which all rate constants were fixed to values previously reported in the single channel literature (termed the "reference" rates; e.g., Akk and Auerbach, 1999Go), and only the channel count was estimated. The rate constants obtained by fitting the C-C-C-O-D model to the data are presented in Fig. 14 D, for comparison with the reference rates. The fits with both the C-C-C-O and the C-C-C-O-D models well matched the data, although the inclusion of a desensitized state provided a better fit in the carbachol wash-in phase. In contrast, the fit using the reference rates clearly deviated from data, during the rise and fall phases.



View larger version (22K):
[in this window]
[in a new window]
 
FIGURE 14  The kinetics of the acetylcholine receptor can be determined from the macroscopic response to brief pulses of 1 mM carbachol. The experimental procedure is described in Materials and Methods. (A) The average of 10 macroscopic traces. The green and the red lines are the fits with the CCCO and CCCOD models, respectively (C). The rate constants and NC were estimated from the average trace, using the SS method. For the single-channel amplitude we used the value found in single-channel experiments. The blue line is the fit with the CCCOD model, in which all rate