Departments of Bioengineering and Chemical and Biomolecular Engineering, University of Pennsylvania, Philadelphia, Pennsylvania
Correspondence: Address reprint requests to David J. Graves, 311A Towne Bldg., 220 South 33 St., Philadelphia, PA 19104. Tel.: 215-898-7951; E-mail: graves{at}seas.upenn.edu.
ABSTRACT
INTRODUCTION
At the same time, there has been an increasing interest in the thermodynamics and kinetics of microarray hybridization. The former has been invoked in an attempt to understand why some sequences work well, i.e., hybridize strongly, and others function poorly, and the latter to understand whether different hybridization times, concentrations, etc. should provide better experimental data (and to explain anomalies such as those cited above). This article does not consider the molecular interaction factors affecting kinetic constants, but instead examines how different constants and species concentrations affect the observed results, particularly the time-dependent and competitive effects as one or two different solution-phase species bind complementary oligonucleotide or cDNA strands in two different microarray spots. Heterogeneous (solid/soluble as opposed to two soluble species) hybridization kinetics have been examined by a number of people. However, the competitive binding situation has not been examined as carefully, and certain limitations in published approaches have led to results that are not always accurate.
Theory for a simple system
Here, we examine several general cases involving one or two soluble species competing for one or two types of binding sites. We use S to represent a soluble species, I to represent immobilized surface-bound species, and SI to represent the hybridized pair in a microarray spot. Although the terms probe and target are often applied to these species, two opposite conventions defining which is which are currently in use, so we prefer this less ambiguous nomenclature. Subscripts A,B,C, etc. will be used for the different sequences of nucleic acid. As usual, the kinetics are represented by the familiar relationship
$S + I \underset{k_r}{\overset{k_f}{\rightleftharpoons}} SI$ | (1) |
Unlike the case when both species are in solution, the situation is more complex when one of the two is immobilized on a surface. The relative amounts of the surface and solution phases must be taken into account. In addition, concentration units on the surface are generally measured in molecules/area (e.g., molecules/cm²), whereas those in solution are in moles/liter. This can be taken into account by dividing the surface phase concentration by v, the volume in liters of solution per square centimeter of surface area, and by Avogadro's number, NAv, to convert from molecules to moles. Still more complications can arise if the layer of immobilized molecules does not cover the entire surface, or if the solution is unstirred, so that the effective volume that can equilibrate with the DNA-covered surface differs from the ratio of total volume to total surface area. The first of these complications can be handled by ensuring that only the nucleotide-covered portion of the surface (the spot area) is used rather than the total surface. For purposes of this analysis, the last complication will be assumed not to exist, although in practice nonhomogeneous solution concentrations can be a significant problem. With these considerations, the differential equation describing adsorptive and desorptive events resulting in hybridization can be written as
![]() | (2) |
![]() | (3) |
1, remains the same as K'd (see below), and kf and kr have their usual units of liters × mol⁻¹ × s⁻¹ and s⁻¹, respectively. These substitutions simplify the resulting equations. However, one must remember them when interpreting the values of the constants. The recast Eq. 3 then becomes
![]() | (4) |
![]() | (5) |
![]() | (6) |
is defined below as
![]() | (7) |
![]() | (8) |
10¹² molecules × cm⁻² (although it is often considerably less; Graves, (5))
10⁻⁶ mole × liter⁻¹ or less, not an unreasonable figure in comparison with what is likely to be present in the liquid phase.
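The conversion from a surface density to an equivalent solution concentration can be checked with a few lines of arithmetic. The sketch below divides a surface density by v and by Avogadro's number, as described in the text. The function name `surface_to_molar` and the film thickness of 20 micrometers (which gives v = 2 × 10⁻⁶ liter per cm²) are illustrative assumptions, not values taken from the article.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def surface_to_molar(molecules_per_cm2, liters_per_cm2):
    """Convert a surface density (molecules/cm^2) to an equivalent mol/liter.

    liters_per_cm2 plays the role of v in the text: liters of solution
    per square centimeter of spot area.
    """
    return molecules_per_cm2 / (liters_per_cm2 * AVOGADRO)


# Assumed geometry (hypothetical): a 20-micrometer liquid film over the spot,
# i.e. 2e-3 cm^3 of liquid per cm^2 of surface = 2e-6 liter per cm^2.
v = 2.0e-6
equivalent_molar = surface_to_molar(1.0e12, v)
print(f"{equivalent_molar:.2e} mol/liter")
```

With this assumed film thickness, a density of 10¹² molecules per cm² corresponds to slightly less than 10⁻⁶ mol per liter, the same order of magnitude as the figure quoted above.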
Expected values for the forward, reverse, and dissociation constants can be estimated from literature values (6–9). These are typically between 10⁴ and 5 × 10⁶ liter × mole⁻¹ s⁻¹ for kf, 0.1 to 10⁻⁵ s⁻¹ for kr, and 10⁻⁷ to 10⁻¹¹ mole × liter⁻¹ for Kd. If for convenience we convert these to a micromolar basis, they become 0.01 to 5 liter × µmole⁻¹ s⁻¹ for kf and 10⁻¹ to 10⁻⁵ µmole × liter⁻¹ for Kd (kr unchanged). It has been estimated that when the two strands are mismatched, the relative affinities decrease ~10- to 100-fold (i.e., kr and Kd both increase by these amounts) (7). Liquid phase concentrations also can be given in µmol × liter⁻¹ and immobilized phase concentrations converted to the same units using the 1/(v NAv) factor. These definitions and substitutions for constants and concentrations can be used in the following sections of this article to give values that generally range from 10⁻³ to 10³, but we make no attempt to cover the entire range of reasonable values. Furthermore, one should recall that the effects of diffusion have been totally ignored in the simulations that follow, so the timescale in the simulated results, which should be in seconds to be consistent with these other units, shows results that change much too quickly in comparison to real-world data. Livshits and Mirzabekov show that diffusion can dramatically slow the attainment of hybridization equilibrium (10). It is safest to view all the results that follow as qualitative indications about how simultaneous hybridizations will behave and interfere with one another. For this reason, units have been omitted from the results. Furthermore, many of the important results are related to the state that exists at equilibrium, where absolute rates are not particularly relevant.
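The change from a molar to a micromolar basis is a pair of scale factors, shown here as a small sketch. The helper names `kf_to_micromolar` and `kd_to_micromolar` are hypothetical; the input ranges are the literature ranges quoted above.

```python
MICRO = 1.0e-6  # one micromole per liter, expressed in mol/liter


def kf_to_micromolar(kf_per_molar_s):
    """liter mol^-1 s^-1 -> liter umol^-1 s^-1 (a second-order constant shrinks)."""
    return kf_per_molar_s * MICRO


def kd_to_micromolar(kd_molar):
    """mol liter^-1 -> umol liter^-1 (a concentration-like constant grows)."""
    return kd_molar / MICRO


# kr has units of s^-1 and is therefore unchanged by the conversion.
kf_range = [kf_to_micromolar(k) for k in (1.0e4, 5.0e6)]
kd_range = [kd_to_micromolar(k) for k in (1.0e-7, 1.0e-11)]
print(kf_range, kd_range)
```

On this basis the constants and concentrations fall in a range that is convenient for numerical work, which is the motivation given in the text.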
Extension of theory for more than one equilibrium state
The exponential approach to an equilibrium value predicted by Eqs. 5 and 6 is not surprising, and a curve representing the time course of this behavior will not be presented here. However, more interesting and surprising results are seen when a soluble species is distributed between two immobilized sequences. For these and more complex situations, simultaneous differential equations must be solved, and the complexity of this situation dictates that a more efficient computerized solution be used. This method is less prone to human error, even if analytical solutions could be found in some cases. We have used Mathematica (Wolfram Research, Champaign, IL) to aid in this process, and now can remove the previous limitation of an excess of soluble over insoluble material. For soluble species Sc interacting with two different immobilized species IA and IB, respectively to create the immobilized complexes ICA and ICB, the pertinent differential equations are
![]() | (9) |
![]() | (10) |
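A minimal numerical sketch of this two-spot system follows. It is written in Python rather than Mathematica and is not the authors' program. It assumes simple mass-action kinetics for each complex, with the free soluble and immobilized amounts obtained from mass balances and all concentrations expressed in the same solution-based units. The unit initial amounts and the dissociation constants of 0.01 and 0.1 are chosen because they appear consistent with the numbers quoted in the Results; the forward rate constant of 1 for both spots is an assumption (the text states only that the forward constants were equal). The names `rates`, `simulate`, and `params` are illustrative.

```python
def rates(ca, cb, p):
    """Assumed mass-action rates of formation of the complexes CA and CB."""
    s = p["sc0"] - ca - cb   # free soluble C remaining (mass balance)
    ia = p["ia0"] - ca       # unoccupied A sites
    ib = p["ib0"] - cb       # unoccupied B sites
    return (p["kf_a"] * s * ia - p["kr_a"] * ca,
            p["kf_b"] * s * ib - p["kr_b"] * cb)


def simulate(p, t_end, dt=0.01):
    """Classical fourth-order Runge-Kutta; returns rows of (t, CA, CB)."""
    ca, cb = 0.0, 0.0
    history = [(0.0, ca, cb)]
    for step in range(1, round(t_end / dt) + 1):
        k1 = rates(ca, cb, p)
        k2 = rates(ca + 0.5 * dt * k1[0], cb + 0.5 * dt * k1[1], p)
        k3 = rates(ca + 0.5 * dt * k2[0], cb + 0.5 * dt * k2[1], p)
        k4 = rates(ca + dt * k3[0], cb + dt * k3[1], p)
        ca += dt / 6.0 * (k1[0] + 2.0 * k2[0] + 2.0 * k3[0] + k4[0])
        cb += dt / 6.0 * (k1[1] + 2.0 * k2[1] + 2.0 * k3[1] + k4[1])
        history.append((step * dt, ca, cb))
    return history


# Assumed parameters: equal forward constants, Kd = kr/kf of 0.01 (A) and 0.1 (B).
params = {"kf_a": 1.0, "kr_a": 0.01, "kf_b": 1.0, "kr_b": 0.1,
          "ia0": 1.0, "ib0": 1.0, "sc0": 1.0}
history = simulate(params, t_end=500.0)
t_final, ca_final, cb_final = history[-1]
cb_peak = max(row[2] for row in history)
print(f"CA = {ca_final:.7f}, CB = {cb_final:.7f}, peak CB = {cb_peak:.3f}")
```

Because the soluble species is depleted as it binds, the weaker complex CB first rises with the stronger one and then gives material back, so its peak value lies above its final value.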
RESULTS
0.74 and ICB is 0.23. Mathematica can carry out the calculation to an arbitrary number of decimal places, and typically we used ~20. After 10,000 time units, ICA is 0.7448477... and ICB is 0.22595997..., giving a ratio CA/CB closer to 3 than to 10 and showing that the apparent equilibrium at a time as short as 50 is indeed close to the correct value. Furthermore, ICB initially overshoots its final equilibrium value before dropping back to it. This behavior is due primarily to two factors: the equal forward rate constants and the depletion of material from the pool of soluble C species. That these results are correct is confirmed by the corresponding result that SC = 0.0291923.... Using the definition Kd1 = SC · IA/ICA, or the corresponding ratio for Kd2, it can be shown that these three concentration values give very precisely the required values for Kd1 and Kd2,
![]() | (11) |
![]() | (12) |
3:2.
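The equilibrium state can also be obtained without integrating in time. The sketch below is a plain root-finding approach, not the optimization procedure the authors describe. For each spot the definition Kd = S · I/C with I = I0 − C gives C = I0 · S/(Kd + S), and the free concentration S is then fixed by the mass balance on the soluble species, solved here by bisection. The function name `equilibrium` and the unit initial amounts are assumptions chosen to be consistent with the values quoted above.

```python
def equilibrium(s_total, sites, iterations=200):
    """Equilibrium for one soluble species shared among several spots.

    sites is a list of (total immobilized amount, Kd) pairs.
    Returns (free soluble concentration, list of complex concentrations).
    """
    def excess(s_free):
        # Mass balance: free + bound - total, which increases with s_free.
        bound = sum(i0 * s_free / (kd + s_free) for i0, kd in sites)
        return s_free + bound - s_total

    lo, hi = 0.0, s_total
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    s_free = 0.5 * (lo + hi)
    return s_free, [i0 * s_free / (kd + s_free) for i0, kd in sites]


# Assumed case: unit amounts of A, B, and C with Kd of 0.01 (A) and 0.1 (B).
s_free, (ca, cb) = equilibrium(1.0, [(1.0, 0.01), (1.0, 0.1)])
kd1_check = s_free * (1.0 - ca) / ca
kd2_check = s_free * (1.0 - cb) / cb
print(f"S = {s_free:.7f}, CA = {ca:.7f}, CB = {cb:.7f}, CA/CB = {ca / cb:.3f}")
```

Substituting the resulting concentrations back into the two definitions recovers the dissociation constants supplied as input, which is the same kind of consistency check described in the text.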
This exercise was repeated with other values of the reverse rate constants and relative quantities of material A, B, and C, and the results are shown in Table 1. Note that this analysis considers only the initial hybridization step and not the washing step, which will be covered later. These calculations were carried out with dissociation constants of 0.01 and 0.001 for species A (the preferred immobilized material), while the constant for species B (the incorrect hybrid) remained fixed at 0.1. Some of the conclusions that can be drawn looking at the results in this table are as follows:
10.
Washing of hybridized spots
These results have not yet considered the effect of a washing step after hybridization. This was done by using Mathematica to solve Eqs. 9 and 10 again following transformation in the manner shown in Eq. 4 to consider the initial amounts of immobilized complex. The initial concentration values for complexes CA and CB were taken from the results in Table 1, and the initial amount of C in the washing solution was set at zero. One additional complication for this analysis is that the wash solution volume is generally much larger than that used during hybridization. A dilution term symbolized as "dil" was used in the calculations to dilute the effective concentration of C as it returns to the solution phase. For illustration purposes, this factor was set arbitrarily at 100 to generate the results shown in Table 2. Now, in several cases where the concentration of the incorrect hybrid was comparable to or exceeded the properly matching one (rows 2, 3, 5, and 8–10), the situation has been at least partially corrected by washing. Only in rows 5 and 10 is the situation still rather poor. In those cases where the relative ratio during hybridization was favorable (rows 1, 6, and 7) the situation is approximately the same or slightly improved relative to the prewashing results.
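One way the washing calculation might be set up is sketched below in Python. It is an interpretation, not the authors' Mathematica code. It assumes the same mass-action rate form as for hybridization, starts from the equilibrium complex concentrations quoted earlier (0.7448477 and 0.22595997), starts with no C in the wash solution, and divides the amount of C released from the spots by `dil` to obtain its effective concentration in the larger wash volume. The unit site totals, the forward constant of 1, and the names `wash_rates` and `wash` are assumptions.

```python
def wash_rates(ca, cb, p):
    """Assumed rates during washing; released C is diluted by the factor dil."""
    released = p["ca0"] + p["cb0"] - ca - cb  # C that has left the spots
    s = released / p["dil"]                   # effective concentration in the wash
    ia = p["ia0"] - ca                        # unoccupied A sites
    ib = p["ib0"] - cb                        # unoccupied B sites
    return (p["kf_a"] * s * ia - p["kr_a"] * ca,
            p["kf_b"] * s * ib - p["kr_b"] * cb)


def wash(p, t_end, dt=0.05):
    """Fourth-order Runge-Kutta integration; returns rows of (t, CA, CB)."""
    ca, cb = p["ca0"], p["cb0"]
    rows = [(0.0, ca, cb)]
    for step in range(1, round(t_end / dt) + 1):
        k1 = wash_rates(ca, cb, p)
        k2 = wash_rates(ca + 0.5 * dt * k1[0], cb + 0.5 * dt * k1[1], p)
        k3 = wash_rates(ca + 0.5 * dt * k2[0], cb + 0.5 * dt * k2[1], p)
        k4 = wash_rates(ca + dt * k3[0], cb + dt * k3[1], p)
        ca += dt / 6.0 * (k1[0] + 2.0 * k2[0] + 2.0 * k3[0] + k4[0])
        cb += dt / 6.0 * (k1[1] + 2.0 * k2[1] + 2.0 * k3[1] + k4[1])
        rows.append((step * dt, ca, cb))
    return rows


# Assumed parameters; only dil = 100 and the starting complexes come from the text.
wash_params = {"kf_a": 1.0, "kr_a": 0.01, "kf_b": 1.0, "kr_b": 0.1,
               "ia0": 1.0, "ib0": 1.0, "dil": 100.0,
               "ca0": 0.7448477, "cb0": 0.22595997}
rows = wash(wash_params, t_end=1000.0)
ratio_start = rows[0][1] / rows[0][2]
ratio_end = rows[-1][1] / rows[-1][2]
print(f"CA/CB before wash = {ratio_start:.2f}, after wash = {ratio_end:.2f}")
print(f"CA before wash = {rows[0][1]:.3f}, after wash = {rows[-1][1]:.3f}")
```

Under these assumptions the ratio of correct to incorrect hybrid improves during the wash, but the correct hybrid is also partly lost, which illustrates the trade-off between selectivity and signal strength.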
Competitive binding between two immobilized molecules and two soluble molecules
Although the case of two immobilized and one soluble species is quite revealing, a simulation with two of each species is closer to the actual set of complex competitive processes taking place within real microarray systems. In this situation the relevant set of equations is as follows:
![]() | (13) |
![]() | (14) |
![]() | (15) |
![]() | (16) |
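A sketch of how such a four-complex system could be integrated is given below. As before, it assumes mass-action kinetics with mass balances on both soluble species and both spots, and it is not the authors' code. Each of the four hybrids (C1A, C1B, C2A, C2B) has its own forward and reverse constant, giving eight constants in all. The particular values (forward constants of 1, reverse constants of 0.01 for correct pairs and 0.1 for incorrect pairs, unit site totals) and the names `rates` and `integrate` are illustrative assumptions.

```python
def rates(state, p):
    """Assumed mass-action rates for the hybrids C1A, C1B, C2A, C2B."""
    c1a, c1b, c2a, c2b = state
    s1 = p["s1_0"] - c1a - c1b   # free soluble species 1
    s2 = p["s2_0"] - c2a - c2b   # free soluble species 2
    ia = p["ia0"] - c1a - c2a    # unoccupied sites on spot A
    ib = p["ib0"] - c1b - c2b    # unoccupied sites on spot B
    kf, kr = p["kf"], p["kr"]
    return [kf["1A"] * s1 * ia - kr["1A"] * c1a,
            kf["1B"] * s1 * ib - kr["1B"] * c1b,
            kf["2A"] * s2 * ia - kr["2A"] * c2a,
            kf["2B"] * s2 * ib - kr["2B"] * c2b]


def integrate(p, t_end, dt=0.02):
    """Fourth-order Runge-Kutta from zero hybrid; returns the final state."""
    state = [0.0, 0.0, 0.0, 0.0]
    for _ in range(round(t_end / dt)):
        k1 = rates(state, p)
        k2 = rates([x + 0.5 * dt * k for x, k in zip(state, k1)], p)
        k3 = rates([x + 0.5 * dt * k for x, k in zip(state, k2)], p)
        k4 = rates([x + dt * k for x, k in zip(state, k3)], p)
        state = [x + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
                 for x, a, b, c, d in zip(state, k1, k2, k3, k4)]
    return state


# Assumed symmetric constants: correct pairs (1A, 2B) bind 10 times more strongly.
p_equal = {"kf": {"1A": 1.0, "1B": 1.0, "2A": 1.0, "2B": 1.0},
           "kr": {"1A": 0.01, "1B": 0.1, "2A": 0.1, "2B": 0.01},
           "ia0": 1.0, "ib0": 1.0, "s1_0": 1.0, "s2_0": 1.0}
p_unequal = dict(p_equal, s2_0=0.1)  # species 2 is 10 times rarer than species 1

equal = integrate(p_equal, t_end=500.0)   # long time, near equilibrium
early = integrate(p_unequal, t_end=1.0)   # short time, far from equilibrium
print("equal amounts, C1A/C1B at long time:", round(equal[0] / equal[1], 3))
print("unequal amounts, C1B vs C2B at t = 1:", early[1], early[3])
```

With equal amounts and symmetric constants the correct hybrid exceeds the incorrect one by the ratio of the binding constants, while in the unequal case the abundant species initially forms the incorrect hybrid on spot B faster than the rare species forms the correct one.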
A more interesting case is one in which all eight binding constants are allowed to assume independent values. Of course with eight constants and four initial species concentrations it is much more difficult to study a reasonable subset of possible conditions. One interesting case that was studied assumed that C1 (the first soluble species) was supposed to bind to A and C2 (the second) to B, each pair with equal strength. However, each could also bind the incorrect immobilized partner (C1 to B and C2 to A) more weakly but again with equal strength. This should be a fair representation of real experiments, since immobilized species are generally designed to have approximately equal binding affinities for their complements. Note that the soluble species nomenclature has been modified from C3 and C5 to C1 and C2 to avoid confusion. These species no longer represent two different dyes but simply two different gene products. We have already shown that the dyes are expected to sort according to values of the kinetic constants.
Just as was the case for the earlier simulation, the second type of solution method (concentration optimization to minimize forward and reverse rates) was carried out to verify the apparent equilibrium values obtained at long times. This optimization was very demanding on the algorithm with so many variables, and three of the cases (given by lines 5, 7, and 9 in Table 3) did not converge. Two others started to converge (lines 2 and 11 in Table 3) but did not completely regenerate all the correct equilibrium constants. All other cases converged and gave results indistinguishable from the differential equation solution method. For those that did not converge properly, a third method was used. Equilibrium concentrations predicted by the differential equation solver (our first method) were substituted into all four equilibrium relations and the resulting constants were compared with the values originally supplied as data. Again, in all cases results were indistinguishable from the originals. Since all results have been verified by at least one independent method (several were tested by both methods two and three), we are quite confident of their correctness and accuracy.
The situation improves in rows 6–10, where the binding constant ratio becomes 100 rather than 10. However, the situation is still not good in row 7, where both soluble species are relatively abundant in comparison to the immobilized spot concentration. Row 11 was added to show what would happen in the case of a very rare nucleotide in the presence of a large amount of a common one. Now, even though the binding constant ratio of correct to incorrect is very favorable (100), the large concentration difference completely overcomes this advantage. The C2B/C1B ratio indicates that the proper amounts of C2 and C1 in solution would not be registered by hybridization to their respective immobilized complements.
The last three columns in the table represent the actual concentrations of the correct hybrids C1A and C2B and the error in perceived value due to the additional binding of the incorrect species. In other words, both C1A and C2A would fluoresce, so the perceived signal on spot A would be incorrectly high (likewise on spot B). Note that when the ratios in columns 6 and 7 are well above unity, correct results are seen. However, when they become small, errors can be large. This is especially apparent in the row-11 data, where the C2 product is present at only 0.001 of the amount of C1. Here, the incorrect hybrid completely swamps the signal, giving a value >95 times that of the correct one even though it binds 100 times less strongly than the correct one. This result suggests that the measured concentrations of rare gene products may be much higher than their true values in typical microarray experiments. It would seem that even two-dye labeling would not help resolve the issue, since all mRNAs in a given sample (test or standard) would have the same label. One would expect that the relative abundances of such rare products (test versus standard) would follow the ratio of an incorrect but plentiful product that also hybridizes slightly to the complementary immobilized spot rather than assuming their true values. However, it must be remembered that we have not yet considered the washing step.
Fig. 4 shows two representative sets of results for some of these simulations. Fig. 4 a represents results for the data in row 1 and Fig. 4 b for that in row 10. Although the curves are labeled C1B and C2B, they really represent all four species. Since the kinetic constants chosen were symmetrical, curve C1B also represents C2A (both being the incorrect hybrids) and C2B represents C1A (the correct hybrids). Note that in the first panel, where the soluble species C1 and C2 are initially present in equal amounts, the correct hybrid always exceeds the incorrect one in concentration. However, in the second panel, one can see that because C1 is present in solution at 10 times the concentration of C2, initially the incorrect hybrid C1B forms faster than C2B. Any attempt to measure their relative amounts before ~40 time units would give completely erroneous information. As stated above, this has implications for rare transcripts relative to the abundant ones in a mixture.
DISCUSSION OF RESULTS AND CONCLUSIONS
First, the relative abundances of two hybrid pairs that form simultaneously can change dramatically with time, and an incorrect hybrid can even be present temporarily at a higher level than the correct one. Therefore, microarray data taken too early in the equilibration process are likely to be in error. Even when equilibrium has been reached, the relative abundances of hybrid complexes are not in the ratios one might expect from the relative equilibrium dissociation constants. If both equilibria are favored, the hybrids tend to be more similar in concentration than one might expect. Therefore, the probability that cross-hybridization is significant is also higher than one might expect.
Second, the results obtained from a microarray experiment will depend strongly on the conditions used during the washing cycle: how many times the cycle is repeated with fresh solution, how effective mixing is during the washing process, and what volume of wash solution is employed. We have shown that an optimum washing time exists, which, to our knowledge, has not been demonstrated theoretically before. Good experimentalists undoubtedly have an intuitive feeling that too little washing will fail to remove the incorrect hybrids and that too long a wash cycle will remove both incorrect and correct hybrids, leading to weak signals. Our technique goes further by providing quantitative values for optimum wash times given approximate values for the binding constants and the volume of wash solution. Although considerable effort on equilibration and discussion of the effects of different equilibration times is seen in the literature, washing has not received the attention it deserves, nor has its importance been generally recognized.
Third, we show that two-dye experiments are more likely to provide correct answers in microarray experiments than single-dye experiments, particularly where the solution and microspot phases have not come to equilibrium. Here the experimentalists' intuitive feeling about how to improve a microarray analysis has been accurate.
Finally, and perhaps most importantly, cross-hybridization can be an especially significant problem when the incorrect soluble species is much higher in concentration than the correct soluble species. This result, in combination with the earlier stated tendency for hybrids to be similar in concentration even when their equilibrium constants differ significantly, suggests that high expression level mRNAs (or their cDNA representatives) can overpower the low expression level molecules by what amounts to a law-of-mass-action effect. The practical significance of this result is that the way most microarray experiments are currently being carried out may lead to completely erroneous results for some of the rare transcripts. Two articles are cited in which this effect may already have been observed. In these works, the authors fractionated mRNA populations to eliminate some of the high expression level molecules, and they saw more differentially expressed genes in the microarray analysis. This result should be of concern to all those who use microarrays to understand cellular function. Unfortunately, since a given sample will have all mRNAs labeled with a particular dye, two-dye experiments are just as likely to suffer from this problem as the simpler single-dye experiment.
The Mathematica programs used to obtain these results are available to permit others to study situations not addressed by the cases we have presented here. They are straightforward to use even by those with little familiarity with the language. Other questions and situations in which this type of analysis is useful undoubtedly will arise, and the simulation method and code provided here should prove useful in addressing them. The equilibrium results we present have been verified by use of at least two, and in some cases three, entirely different solution methods. Thus, we have considerable confidence in their accuracy.
SUPPLEMENTAL MATERIAL
ACKNOWLEDGEMENTS
Submitted on January 5, 2005; accepted for publication August 3, 2005.
REFERENCES
2. Yue, H., P. S. Eastman, B. B. Wang, J. Minor, M. H. Doctolero, R. L. Nuttall, R. Stack, J. W. Becker, J. R. Montgomery, M. Vainer, and R. Johnston. 2001. An evaluation of the performance of cDNA microarrays for detecting changes in global mRNA expression. Nucleic Acids Res. 29:e41.
3. Dorris, D. R., A. Nguyen, L. Gieser, R. Lockner, A. Lublinsky, M. Patterson, E. Touma, T. J. Sendera, R. Elghanian, and A. Mazumder. 2003. Oligodeoxyribonucleotide probe accessibility on a three-dimensional DNA microarray surface and the effect of hybridization time on the accuracy of expression ratios. BMC Biotechnol. 3:6.
4. Lauffenburger, D. A., and J. J. Linderman. 1993. Receptors. Oxford University Press, Oxford, UK.
5. Graves, D. J. 1999. Microarrays: powerful tools for genetic analysis come of age. Trends Biotechnol. 17:127–134.
6. Persson, B., K. Stenhag, P. Nilsson, A. Larsson, M. Uhlen, and P.-A. Nygren. 1997. Analysis of oligonucleotide probe affinities using surface plasmon resonance: a means for mutational scanning. Anal. Biochem. 246:34–44.
7. Wang, S. S., A. E. Friedman, and E. T. Kool. 1995. Origins of high sequence selectivity: a stopped-flow kinetics study of DNA/RNA hybridization by duplex- and triplex-forming oligonucleotides. Biochemistry. 34:9774–9784.
8. Stillman, B. A., and J. L. Tonkinson. 2001. Expression microarray hybridization kinetics depend on length of the immobilized DNA but are independent of immobilization substrate. Anal. Biochem. 295:149–157.
9. Tawa, K., and W. Knoll. 2004. Mismatching base-pair dependence of the kinetics of DNA-DNA hybridization studied by surface plasmon fluorescence spectroscopy. Nucleic Acids Res. 32:2372–2377.
10. Livshits, M. A., and A. D. Mirzabekov. 1996. Theoretical analysis of the kinetics of DNA hybridization with gel immobilized oligonucleotides. Biophys. J. 71:2795–2801.
11. Dai, H., M. Meyer, S. Stepaniants, M. Ziman, and R. Stoughton. 2002. Use of hybridization kinetics for differentiating specific from non-specific binding to oligonucleotide microarrays. Nucleic Acids Res. 30:e86.
12. Sakai, K., H. Higuchi, K. Matsubara, and K. Kato. 2000. Microarray hybridization with fractionated cDNA: enhanced identification of differentially expressed genes. Anal. Biochem. 287:32–37.
13. Rondeau, G., M. McClelland, T. Nguyen, R. Risques, Y. Wang, M. Judex, A. H. Cho, and J. Welsh. 2005. Enhanced microarray performance using low complexity representations of the transcriptome. Nucleic Acids Res. 33:e100–e106.
14. Miklos, G. L., and R. Maleszka. 2004. Microarray reality checks in the context of a complex disease. Nat. Biotechnol. 22:615–621.