Originally published as Biophys J. BioFAST on January 28, 2005.
doi:10.1529/biophysj.104.053256
Biophysical Journal 88:2494-2515 (2005)
© 2005 The Biophysical Society
Maximum Likelihood Estimation of Ion Channel Kinetics from Macroscopic Currents
Lorin S. Milescu *,
Gustav Akk
and
Frederick Sachs *
* Department of Physiology and Biophysics, State University of New York, Buffalo, New York; and
Department of Anesthesiology, Washington University in St. Louis, St. Louis, Missouri
Correspondence: Address reprint requests to Frederick Sachs, E-mail: sachs{at}buffalo.edu.
 |
ABSTRACT
|
|---|
We describe a maximum likelihood method for direct estimation of rate constants from macroscopic ion channel data for kinetic models of arbitrary size and topology. The number of channels in the preparation, and the mean and standard deviation of the unitary current can be estimated, and a priori constraints can be imposed on rate constants. The method allows for arbitrary stimulation protocols, including stimuli with finite rise time, trains of ligand or voltage steps, and global fitting across different experimental conditions. The initial state occupancies can be optimized from the fit kinetics. Utilizing arbitrary stimulation protocols and using the mean and the variance of the current reduce or eliminate problems of model identifiability (Kienker, 1989
). The algorithm is faster than a recent method that uses the full autocovariance matrix (Celentano and Hawkes, 2004
), in part due to the analytical calculation of the likelihood gradients. We tested the method with simulated data and with real macroscopic currents from acetylcholine receptors, elicited in response to brief pulses of carbachol. Given appropriate stimulation protocols, our method chose a reasonable model size and topology.
 |
INTRODUCTION
|
|---|
Ion channels control the flow of ions through cell membranes. These molecular gates exist in a continuum of conformations, but in practice these are reducible to a small number of kinetic states that live long enough to be resolved experimentally. Some of the states do not conduct ions (closed), whereas others are partially or fully conducting (open). "Instantaneous" transitions occur between these conformations, with the probability of transition described by a rate constant. The rate constant contains two intrinsic terms that describe the coupling of the rates to driving forces, such as ligand concentration, transmembrane voltage, tension, etc. (Hille, 2001
).
The objective of kinetic analysis is to create a modela hypothesiswith the minimum number of kinetic states and connections that can describe the data. Successful models are wonderful summaries of the data, and in that simplification help us to understand the structure and function of individual ion channels, and their mean behavior in cells. Hodgkin and Huxley provided the earliest prototype of this kind of analysis (Hodgkin and Huxley, 1952
). The rate constants quantify the free energy difference between conformational states, providing insight into molecular structure and gating mechanisms. At the cellular level, the differential expression of ion channels, each with specific kinetic and conductance properties and modulation inputs, determine the static and dynamic membrane properties, and ultimately determine the network behavior of excitable cells.
Most properties of ion channels are best studied with single channel data (Neher and Sakmann, 1992
), but technical difficulties often restrict recordings to samples containing multiple channels that cannot be resolved into unitary events. The statistical description of single channel data is well developed thanks to the work of many clever scientists (e.g., Ball and Rice, 1992
; Ball and Sansom, 1989
; Colquhoun and Hawkes, 1982
, 1995a
; Colquhoun and Sigworth, 1995
; Horn and Lange, 1983
; Kienker, 1989
), and sophisticated analysis techniques based on Markov models have been published (e.g., Colquhoun et al., 1996
; Qin et al., 1996
, 2000
). The statistical properties of macroscopic currents generated by ensembles of ion channels are well known, but the detailed kinetics are blurred by the overlap of many independent channels (Colquhoun and Hawkes, 1977
). The time dependence (relaxation) of the current It of a model with NS states, elicited by a step change in experimental conditions, is a sum of NS 1 exponentials (unless there are uncommon degenerate solutions), regardless of how states are connected (i.e., regardless of model topology; Colquhoun and Hawkes, 1995b
):
 | (1) |
The equilibrium current Ieq, the amplitudes Ai, and the time constants
i are not independent quantities because they are all functions of the same rate constants. A traditional method of macroscopic data analysis is to fit exponentials to nonstationary currents, with the main objective of estimating the time constants
i. However, the time constants thus obtained are estimated as independent parameters, without imposing the constraints determined by the particular topology of the model, or other constraints, such as loop balance. Hence, the topology of the model remains ambiguous. Rate constants cannot be easily calculated from time constants, and, without molecular rate constants, clues to molecular structure are obscure. Despite these shortcomings, exponential fitting remains the most widely used analytical method; partly for its intuitive feel, partly because of the illusion of model independence, and partly because few procedures for direct estimation of rate constants are available (e.g., d'Alcantara et al., 2002
; Celentano and Hawkes, 2004
).
A fundamental problem of kinetic modeling is the identifiability of the model, i.e., are there multiple topologies that are equally likely? In a now-classic example (Kienker, 1989
), the model C-C-O (Closed-Closed-Open) cannot be distinguished from the model C-O-C, under equilibrium conditions. However, if one or more of the rates is stimulus dependent, they can be distinguished with nonstationary stimuli. Variable stimuli provide higher degrees of model discrimination.
We present here a maximum likelihood method for direct estimation of molecular rate constants from macroscopic ion channel data, using hidden Markov models. The algorithm can estimate the number of channels in the preparation, and the mean and standard deviation of a binary unitary current. The principle is to formulate the nonstationary macroscopic current It as a random variable with a Gaussian probability distribution, and to maximize the likelihood of the data with respect to the estimated parameters. For a given model, the mean µt and variance Vt of the macroscopic current It are functions of the rate constants, channel count, unitary conductance, and experimental protocol. These functions can be complicated, and involve calculation of the exponential of the rate matrix Q, as shown in Theory. The method is "direct" because the a priori assumptions about the topological and other constraints are explicitly specified in the model. We make two key assumptions: i), the ion channels are identical and independent; and ii), the experimental variables (such as ligand concentration, voltage, etc.) are not functions of channel activity. However, environmental variables need not be constant. We approximated continuously changing stimuli with a series of step functions. Although voltage clamp experiments generally satisfy these requirements, concentration and pressure jumps with significant rise time can be well approximated.
The algorithm is quite general, and has a number of features to assist users in model building: i), It works with models of arbitrary size and topology because the structural equations are internally formulated in terms of matrices and vectors. Models can be quantitatively compared, because distinct topologies with different constraints typically have different likelihoods. The modulation of individual state transitions by driving forces can be naturally studied. ii), Constraints can be imposed on rate constants. These constraints may be a priori knowledge of rate values, imposition of identical ligand-binding sites, microscopic reversibility, etc. The constraints reduce the number of free parameters, permitting exploration of larger allosteric models. iii), The current is modeled as a random variable, i.e., both the mean and the variance are used, which tends to reduce the problem of identifiability, and permits a simultaneous estimate of the mean single channel conductance. iv), The algorithm can estimate the number of active channels in the patch, if the single channel amplitude is known or coestimated. v), It works with nonstationary currents elicited in response to arbitrary experimental protocols, allowing for stimuli with finite rise times. A single model with driven rates can be globally fit across multiple data sets, with arbitrary stimulation protocols, and data from the same or from different patches can be combined in a single analysis. vi), The algorithm can analyze a sequence of stimulus steps (e.g., voltage) without assuming that channels reach equilibrium between steps. This reduces the amount of data required to solve the kinetics, and improves identifiability. vii), The starting probabilities (the state occupancies during a conditioning stimulus) are calculated from the fit kinetics and the experimental conditions. Hence, channels need not be assumed to occupy an arbitrary starting state. viii), Finally, the method efficiently calculates the likelihood function. The algorithm is faster than curve fitting using traditional differential equation solvers (d'Alcantara et al., 2002
). Because the program utilizes only the diagonal elements of the autocovariance matrix, it is much faster then a recent method that uses the full autocovariance matrix (Celentano and Hawkes, 2004
), and hence can handle larger data sets (although with slightly compromised precision). The gradients of the likelihood function are calculated analytically, enabling use of a fast variable metric optimizer (Fletcher and Powell, 1963
; Fletcher, 1981
; Press et al., 1992
), with quadratic convergence.
This article focuses on the algorithm, its basic properties, and the results of extensive testing on simulated data. We concentrated on testing two ligand-dependent models (C-C-O and C-O-C), as the minimum nontrivial kinetic schemes that capture the behavior of more realistic models. As pointed out above, these simple models have identifiability issues: they are not distinguishable with equilibrium data (Kienker, 1989
). However, they can be resolved by applying a suitable stimulation protocol.
We evaluated the two most important properties of the algorithm: the accuracy and the precision, and studied the effect of experimental factors, such as the number of channels in the patch, the amount of data, the stimulation protocol, etc. Because the kinetics of ion channels are rarely as simple as our three-state models, and the noise models of experimental data may depart significantly from the Gaussian assumption, we tested the algorithm with actual macroscopic currents from cells expressing acetylcholine receptors (AChRs). The AChR has a relatively complex kinetics, but the equilibrium rates are known in detail from single channel analysis (e.g., Akk and Auerbach, 1999
). We showed that our method can choose the "correct" model size and topology based on differences in the goodness of fit. Because some of the time constants of AChR kinetics are quite short relative to the rise time of concentration jumps practically obtained, we explored the effect of finite rise time. In future work, we will apply the method to voltage-gated ion channels. The algorithm was implemented in software and integrated in the QuB electrophysiology suite, available for free at www.qub.buffalo.edu.
 |
THEORY AND ALGORITHM
|
|---|
Kinetic model
We review here some basic results of ion channel kinetics, for an arbitrary Markov model with NS conformational states. Rate constants are conventionally expressed as an NS x NS rate matrix Q (also termed the "generator" matrix; Colquhoun and Hawkes, 1995b
). The Q matrix has the properties that each off-diagonal element qij is equal to the rate constant of the transition between states i
j, and that each diagonal element qii is equal to the negative sum of the off-diagonal elements of row i, so that the sum of each row is zero. When there is no direct transition between states i
j, qij = 0. In Eyring-type barrier models, the rate constant qij includes a preexponential factor
multiplied by the concentration Cij of some ligand, and an exponential factor
multiplied by the magnitude of the driving force Vij (such as voltage, pressure, etc.; Hille, 2001
):
 | (2) |
The expression in Eq. 2 is equivalent to the common formulation
where
At any given time, a state probability vector P, of dimension NS, represents the occupancy of the NS conformational states. The Kolmogorov differential equation describes the dynamics of a Markov process (Elliott et al., 1995
):
 | (3) |
The superscript T denotes vector transposition. If P0 is the state probability vector at time t0 < t, then Pt is given by the Chapman-Kolmogorov solution:
 | (4) |
where
is an NS x NS matrix, known as the transition probability matrix. Each element aij is equal to the conditional probability that the channel is in state j at time t, given that it was in state i at time 0. If the eigenvalues of the Q matrix are all distinct (and most ion channel models have distinct eigenvalues, with one being zero and representing the equilibrium solution, and the others negative), the matrix exponential At can be efficiently obtained from Q by spectral decomposition (Moler and Van Loan, 1978
) as
The Ai values are the spectral matrices obtained from the eigenvectors of Q, and the
i values are the eigenvalues of Q.
Conductance model
To describe the conductance properties of an ensemble of NC channels, we make the following simplifying assumptions: i), the baseline current (i.e., the current when all channels are closed) is a
-correlated Gaussian random variable, with mean µ0 and variance V0; ii), the current passing through a single channel in state i is a
-correlated random Gaussian variable, with mean µ0 + µi and variance V0 + Vi (Sigworth, 1985
). By definition, µi = 0 and Vi = 0 for all the closed states. For two channels, one in state i and one in state j, the current is a Gaussian with mean µ0 + µi + µj and variance V0 + Vi + Vj. In general, the current given by an ensemble of NC channels is the sum of Gaussian random variables, which is another Gaussian random variable. Hence, the conditional probability distribution of It, for a given probability vector Pt, has the following expression:
 | (5) |
 | (6,7) |
is the number of channels occupying state i at time t, and can be formulated as a function of state probabilities:
where
is the occupancy probability of state i at time t. Let µ and V be vectors of dimension NS, with elements µi and Vi as defined. Equations 6 and 7 can now be written in a more compact, vectorial form, as follows:
 | (8,9) |
Equations 8 and 9 give, respectively, the conditional mean and variance of the current It recorded from an ensemble of NC channels, given a state probability vector Pt. To obtain the unconditional distribution of It, we must remember that Pt, as calculated with Eq. 4, is the mean of a probability distribution, and that state occupancies will have fluctuations around this mean. Hence, the recorded current has two sources of variance: i), baseline and intrinsic state noise, and ii), state fluctuations. The state fluctuations theoretically have a multinomial distribution (the extension of the binomial to NS states), but we will approximate it with a multivariate Gaussian to avoid a complicated convolution of probability distributions. This approximation will introduce a bias for currents arising from small numbers of channels. The covariance matrix
of state fluctuations for the ensemble of NC channels, at time t, is defined as:
 | (10) |
where
is the NS x NS state covariance matrix, with elements
and
We obtain the unconditional distribution of It as follows:
 | (11) |
 | (12,13) |
The vector product
represents the linear transformation of the
state covariance matrix from the NS x NS state space into the unidimensional measurement space. To speed the computation, traces recorded in response to the same stimulation protocol can be averaged (assuming stationary kinetics), but to properly exploit the information contained in the sweep-to-sweep fluctuations, the expected variance Vt needs to be divided by the number of averaged traces. The above equations are derived for an arbitrary number of conductance classes. Hence, at least in principle, the method is suitable for models with substates (partially conducting states).
Starting state probabilities
The initial probability vector P0 can be specified in two ways. One is to let the user assign values from a priori assumptions. This is commonly done with ligand-gated channels, when zero ligand concentration drives channels into an unliganded state, which is assigned a unity starting probability. With voltage gated channels, a conditioning potential is assumed to populate a specific state. However, this approach is not suitable for models with aggregated initial states (e.g., multiple unliganded states). The solution is to calculate the initial state probabilities as the equilibrium probabilities determined by the initial (conditioning) experimental conditions, which are assumed to last long enough for channels to reach equilibrium. The equilibrium assumption requires that P0 satisfies the condition
where
is the rate matrix for which the equilibrium conditions hold. One way to calculate P0 is the following (Colquhoun and Hawkes, 1995b
):
 | (14) |
where R is a NS x (NS + 1) matrix, formed by adding a column of ones to the right-hand side of
and u is a column vector of dimension NS, with each element ui = 1.
Likelihood function
We describe here the likelihood function that is to be maximized by the algorithm. The likelihood L is defined as the conditional probability of the macroscopic current trace I, recorded for time t = [0...T], given the model parameters
:
 | (15) |
We assume that the state probability vector Pt is a function of the initial state probabilities P0, but does not depend on the state probabilities at any other time. Hence, the distribution of the current at any time t is a function of the initial state probabilities only, which is equivalent to saying that the data have no autocorrelation beyond that contained in the A matrix. Thus, the conditional probability of the entire current trace is equal to the product of the individual conditional probabilities:
 | (16) |
where
with the mean µt and the variance Vt as defined in Eqs. 12 and 13. The objective is to find the parameter set
ML that maximizes the likelihood L:
 | (17) |
The likelihood may be a very small, or a very large number, because it involves Gaussians, so to reduce numerical errors we actually maximize the logarithm of the likelihood function:
 | (18) |
The parameters to be estimated are
and
(Eq. 2), and the channel count NC (in the following, for the sake of clarity, we omit the derivations for conductance parameters). There is no mathematical constraint on the exponential factor
but the preexponential factor
and the channel count NC must be positive numbers, leading to the following useful reparameterization:
 | (19,20) |
The exponential formulation of the rates provides an implicit constraint that the rate constants are all positive, and hence physically realizable.
Parameter constraints
In addition to positivity of the preexponential factors, models may require other constraints. For example, if the two binding sites of the AChR are assumed to be identical and noninteracting, then the rates of the two consecutive binding steps should be constrained in a 2:1 ratio. Without such a constraint, the rates would be different; on the other hand, it is possible to test for independent sites using the likelihood. A priori constraints simplify otherwise complicated models. Each constraint, in addition to incorporating previously known results, reduces the number of free parameters by one. Typical constraints are: i), fix the pre- or exponential factor; ii), scale one pre- or exponential factor proportional to another; iii), enforce microscopic reversibility of cycles (the product of the rate constants going in one direction of a cyclic reaction is equal to the product of the rate constants going the other way). These constraints can be formulated as follows, where ct denotes the constrained value:
 | (21) |
 | (22) |
 | (23) |
 | (24) |
 | (25) |
The constraint expressions above can be written in matrix form as a set of linear equations (Qin et al., 1996
, 2000
):
 | (26) |
where r is the vector of dependent (constrained) variables, with entries for
and u. Mc is the coefficient matrix and Vc is the constant vector. The dependent variables r can in turn be expressed as a function of the independent variables x (Qin et al., 2000
):
 | (27) |
where x is the vector of free parameters. The matrix Ac and the vector Bc can be determined from Mc and Vc using, e.g., the singular value decomposition (Qin et al., 2000
). Other, nonlinear constraints could be included as well, by adding penalty terms to the likelihood function.
Computation of the likelihood function and its derivatives
We describe next the details of the computation of the likelihood function and its derivatives. The log likelihood LL of a macroscopic current, recorded for time t = [0...T] under constant experimental conditions, is calculated as follows:
 | (28) |
 | (29) |
The log-likelihood function in Eq. 29 reduces to the equally weighted (i.e., nonweighted) sum of squares
if the constant terms and the variance-containing terms are eliminated.
The quantities of interest are the pre- and exponential factors
and
and the channel count NC, but the optimizer maximizes LL with respect to the free parameters x.
and NC are subsequently obtained from x using Eqs. 19, 20, and 27. The derivative of LL (or SS) with respect to a free parameter xk can be calculated with the chain rule as follows:
 | (30) |
with
and
where i and j are indices over the off-diagonal entries in the Q matrix corresponding to nonzero rates in the kinetic model. The partial derivatives of
and u with respect to xk are obtained with Eq. 27. Let z denote either qij or NC. Then the partial derivative of LL with respect to z is calculated as follows:
 | (31) |
 | (32) |
Similarly,
The derivatives of µt and Vt with respect to z are:
 | (33) |
 | (34) |
The following results can be immediately derived:
The derivative of the
matrix with respect to qij (and similarly for
) is another matrix with elements:
 | (35) |
The derivatives of the state probability vector Pt with respect to qij and
are:
 | (36) |
The following results can be immediately derived:
The derivative of the initial state probability vector P0 can be obtained using the following equality:
 | (37) |
We skip the derivation details and give the final expression:
 | (38) |
The derivative of the At matrix (Ball and Sansom, 1989
; Najfeld and Havel, 1995
) is:
 | (39) |
 | (40) |
The derivative of Q (and R) with respect to qij (and
) is another matrix Qij', with the properties
Extension to arbitrary stimulation protocols
All the above calculations assume that the experimental conditions (i.e., the stimuli) do not change across the data set. We show next how to extend the method to arbitrary stimulation protocols. We assume that the applied stimulus is approximated by a sequence of SC steps of constant value and arbitrary duration. Each step s starts and ends at times
and
respectively. We rewrite the expressions of the log likelihood and its derivative, to show the dependence of µt and Vt on the step s, as follows:
 | (41) |
 | (42) |
In this case, the log likelihood is a function of multiple Q matrices (in addition to Qeq), one for each set of experimental conditions that change the ion channel response. Although for each value of the stimulus there is a different Q matrix, the set of free parameters remains the same, as it does not depend on experimental conditions. Accordingly, the derivative of the log-likelihood function LL with respect to a free parameter xk has the following form:
 | (43) |
where c is an index over all different experimental conditions, including the initial equilibrium. Let us consider the simpler case when the stimulus changes the kinetics, but not the unitary channel current, e.g., a change of ligand concentration. The expressions of µt,s and Vt,s become:
 | (44,45) |
For an efficient computation of the state probability vector Pt,s and its derivatives, we derive recursive relations. The state vector at time t, contained in the step s, is calculated as follows:
 | (46) |
where
By definition, the vector at the start of step s is equal to the vector at the end of step s 1:
 | (47) |
where
Equation 47 can be written recursively and initiated as follows:
 | (48-50) |
The derivative of the state vector can also be calculated recursively and initiated as follows:
 | (51) |
 | (52,53) |
The derivative of At,s with respect to
is equal to zero when the experimental conditions for step s have an index different than c, thus simplifying the calculations. A similar treatment can be derived for voltage stimuli that change both the kinetics and the unitary current.
Variance of the estimates
Our choice of optimization routine is the Davidon-Fletcher-Powell algorithm (Fletcher and Powell, 1963
; Fletcher, 1981
), which is a quasi-Newton method with approximate line search, with efficient quadratic convergence. We used the "dfpmin" implementation in Press et al. (1992)
, with some modifications (Qin et al., 2000
). The optimizer provides variance estimates for the optimized, free parameters x. The variance of the pre- and exponential factors
and
and of the channel count NC can be derived as follows (Qin et al., 2000
):
 | (54) |
 | (55) |
 | (56) |
 |
MATERIALS AND METHODS
|
|---|
Electrophysiology
The adult mouse muscle-type nicotinic AChRs (
-, ß-,
-, and
-subunits) were transiently expressed in tsA 201 cells using a calcium phosphate-based transfection technique, as described (Akk, 2002
). The cells were plated on coverslips in 35-mm dishes. At the time of experiments, one coverslip at a time was transferred into the recording chamber. The electrophysiological recordings were carried out using a piezo-driven liquid filament system to measure the macroscopic responses to 1 mM carbachol on outside-out patches (Heckmann and Pawlu, 2002
). The system had a solution exchange time of <50 µs when measured with changes in the liquid junction potential with an open patch electrode, or <100 µs when measured with an electrode containing a patch. The bath solution contained (in mM): 140 NaCl, 5 KCl, 2 CaCl2, 1 MgCl2, 10 glucose, and 10 Hepes (pH 7.3). The pipette solution contained (in mM): 140 KCl, 5 MgCl2, 5 EGTA, 10 glucose, and 10 Hepes (pH 7.3). The agonist was dissolved in the bath saline solution and the membrane potential was held at 40 mV. The experiments were carried out at room temperature. Currents were amplified with an Axopatch 200B amplifier (Axon Instruments, Union City, CA), low-pass filtered at 10 kHz, and digitized at 50 kHz. Traces of 30-ms duration were recorded at intervals of 1 s using the ISO2 software (MFK, Taunusstein, Germany). Averaged currents from 10 to 25 traces were used for kinetic analysis.
Simulation and data analysis
Stochastic and deterministic simulations were generated with the QuB software (www.qub.buffalo.edu), using the ligand-gated kinetic models from Fig. 1, referred to as C-O-C and C-C-O, respectively, and the three stimulation protocols from Fig. 2, referred to as P1, P2, and P3, respectively. We call the parameter values used for simulation true parameters, as opposed to estimated parameters. For each simulation, 10,000 traces were generated. Simulations with other models and protocols are described in the Results. A sampling interval of 10 µs was used for simulations, except in the analysis of experimental data, for which we used 20 µs. The QuB program can generate a stochastic macroscopic simulation as the summation of NC single-channel stochastic simulations, in conformity with Eqs. 6 and 7. Alternatively, it can make a deterministic macroscopic simulation by calculating the current according to Eq. 8. The starting state probabilities are calculated as the equilibrium probabilities under the conditioning stimuli. For stochastic data, the starting state for each of the NC channels is chosen randomly, such that the probability of starting in state i is proportional to
The estimation of rate constants and channel count, and the nonweighted exponential fitting were done with QuB.