
Originally published as Biophys J. BioFAST on June 8, 2007.
doi:10.1529/biophysj.106.098236
Biophysical Journal 93:2562-2566 (2007)
© 2007 The Biophysical Society

Modeling the Two-Hybrid Detector: Experimental Bias on Protein Interaction Networks

Karin B. Stibius*† and Kim Sneppen*

* Niels Bohr Institute, Copenhagen, Denmark; and † Risø National Laboratory, Roskilde, Denmark

Correspondence: Address reprint requests to Karin B. Stibius, E-mail: karin_stibius@hotmail.com.


    ABSTRACT
 
This work investigates the two-hybrid experiment for finding protein-protein interactions, with the aim of explaining the asymmetry found in the experimental data and of helping to screen the data for high-confidence interactions. Because of the bait-prey experimental setup, the resulting protein interaction network can be examined as a directed network (bait -> prey). We have investigated two possible scenarios for the asymmetry in the directed network by developing a biochemical model for the protein-DNA and protein-protein bindings inside the living yeast. One scenario assumes a background activity of bait proteins acting even without the prey; the other explores the asymmetry in the chemistry associated with the bait being automatically located in the right position on the DNA. We conclude that the latter model gives the better description of the observed asymmetry.


    INTRODUCTION
 
Protein-protein interactions are central for both signaling and structures inside living cells. These interactions can be studied in different ways. One example is to examine complexes formed around a tagged protein with mass spectrometry (1–3); another is to use the two-hybrid method (4–10). In this in vivo method each potential protein interaction is tested by linking one partner to the DNA binding subunit of a transcription factor (bait), and linking the other protein to the subunit that recruits/activates the RNA polymerase (prey). The activity of the transcribed gene thereby provides information about the strength of the bait-prey interaction. However, as pointed out in the literature (11,12), there are systematic biases in the bait-prey setup. In this article we discuss some of these biases and present a framework for evaluating their respective impact on the obtained protein networks.

Constructing a model for the two-hybrid detector could be of great importance if it is to be used to examine protein-protein interactions on a large scale. A model of the system can help to improve the experimental setup and help to screen the data for the most reliable interactions. The work presented here is meant to give an idea of some features that are of importance for the two-hybrid experiment.


    RESULTS AND DISCUSSION
 
Experimental data from large-scale two-hybrid experiments (6,8) were examined, and as seen in Fig. 1 there is a systematic difference in connectivity between bait and prey. This difference comes from the bait-prey experimental setup, and the resulting network should thus be examined as a directed network (bait -> prey). Further, the asymmetry of the data can be quantified in terms of a systematic tendency for proteins acting as bait to have larger connectivity than the same proteins acting as prey.



 
FIGURE 1  (A and B) Shown is the total connectivity distribution in Ito's (6) two-hybrid data, and the same decomposed into bait and prey parts. Panel A is the full data set and panel B is the high-confidence core data.

 
This asymmetry was investigated by two models; see Fig. 2 and the Method section. The left panel, model I, in Fig. 2 assumes a background activity of bait proteins acting even without the prey (random firing), and the right panel, model II, explores the asymmetry in the chemistry associated with bait being automatically located in the right position on the DNA (sequestering).



 
FIGURE 2  The two alternative models for detecting activity of a given bait-prey set. The left panel, model I, shows a schematic representation of the random firing model, where bait alone on the DNA operator site gives rise to some transcription (top). The bait-prey complex gives full transcription (middle), and bait bound to other proteins blocks any transcription (bottom). The right panel, model II, shows the sequestering model. A bait alone gives no transcription (top). A small free bait concentration does not affect production, because the free bait available will be more localized around the DNA since the binding to the DNA operator site is very strong (middle). However, a small free prey concentration will affect production because only a small amount of prey is localized at the right place, since the binding to bait is not strong enough to affect the concentration of prey around the DNA (bottom).

 
Our model I corresponds to the standard explanation for bait-prey bias, namely that some baits act as autoactivators. This explanation is right at least to the extent that a number of proteins are known to activate the reporter gene independently of which prey they are tested with. In large-scale screens these proteins are removed from the data. But other proteins that by themselves weakly activate the reporter might be detected as interaction partners of proteins with which they interact only weakly.

In our investigation we deal with a total of five networks: (a) the real network of protein interactions in the cell, henceforth called the "real network"; (b) the "observed network" given by the two-hybrid experiment; (c) a "simulated network" that we create; and, obtained by performing the two simulated two-hybrid experiments just mentioned on this simulated network, (d) the model I (random firing) network and (e) the model II (sequestering) network.

The situation is like reconstructing collision events in nuclear or high-energy physics on the basis of the incomplete data obtained from the detectors. Thus we want to analyze a situation where

Formula 1(1)
for both the real/experimental system, and our simulated/model image.

The results from the two models (see Method section and the Supplementary Material) with different values of the parameters used can be seen in Figs. 3 and 4. In Fig. 3 we have investigated the models where, for simplicity, we have assumed all proteins in the cell to have the same total concentration. For model I (A) we do not see a clear difference between bait and prey, whereas for model II (B) we see clearly different trends for the two. To see whether this effect would still be present if the protein concentrations of bait and prey were systematically higher than that of other individual proteins, Fig. 4 shows the effects of assuming the total concentrations of the bait and prey to be 10 times that of other proteins. This is relevant because bait and prey are typically expressed on multicopy plasmids. Fig. 4 demonstrates that the systematic differences between bait and prey persist in this more realistic setup, although the effects are less pronounced than for the parameters used in Fig. 3.



 
FIGURE 3  (A) Shown is the result of the random firing model, with threshold 0.1, assuming an underlying $1/k^{2.5}$ connectivity distribution. (B) Shown is the sequestering model with the same parameters. In both cases we investigate the case where bait and prey are at the same concentrations as other proteins in the cell.

 


 
FIGURE 4  The left panel (A and C) shows the result of the random firing model, assuming an underlying $1/k^{2.5}$ distribution. The right panel (B and D) is sequestering with the same parameters. In all figures we investigate the case where bait and prey are at 10 times the concentrations of the other proteins in the cell. The upper panel (A and B) is for threshold 0.1, and the lower panel (C and D) is for threshold 0.2.

 
To test the robustness of our results against the simplifying assumption of equal concentrations of all proteins in the cell, we also performed simulations where protein concentrations were drawn from a more realistic distribution. We have chosen a distribution similar to the one found experimentally by Ghaemmaghami et al. (13), where the protein concentrations are log-normally distributed. To fit our model, we drew the protein concentrations from a log-normal distribution restricted to the range 0.01 to 5, with a mean value of ~1. Overall, such variations tend to decrease the difference between bait and prey connectivity. However, for larger values of the detection threshold, T, the findings from above are nicely reproduced. With varying protein concentrations, the random firing model I always fails to produce a noticeable difference between bait and prey connectivity. In contrast, the sequestering model II predicts substantial bait-prey asymmetry for any distribution of protein concentrations, provided a sufficiently large threshold is used. In the Supplementary Material we show figures that substantiate the robustness of our conclusions with respect to initial protein concentrations and choice of threshold values.
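A minimal sketch of how such concentrations could be drawn is given here. The text only fixes the range (0.01 to 5) and the mean (~1); the log-normal parameters `mu` and `sigma`, the use of rejection sampling, and the function name `draw_concentrations` are assumptions made for illustration, not the procedure used in the article.

```python
import random

def draw_concentrations(n, rng, mu=-0.3, sigma=1.0, low=0.01, high=5.0):
    """Draw n log-normal concentrations, rejecting values outside [low, high].

    mu and sigma are assumed values chosen so that the truncated mean is close to 1.
    """
    values = []
    while len(values) < n:
        c = rng.lognormvariate(mu, sigma)
        if low <= c <= high:
            values.append(c)
    return values

rng = random.Random(42)
concentrations = draw_concentrations(20000, rng)
mean_concentration = sum(concentrations) / len(concentrations)
```

With these assumed parameters the sample mean comes out close to 1; other combinations of `mu` and `sigma` would satisfy the stated constraints equally well.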

Of the two hypotheses for the asymmetry, we conclude that model II provides the better explanation for the observed features. For example, model II is completely consistent with the fact that more proteins act as prey than as bait. We also find that the high connectivities are mostly seen for proteins functioning as bait, an effect not nearly as pronounced in model I. The fact that proteins with prey connectivity $k_{\mathrm{prey}} = 0$ have surprisingly high values of bait connectivity $k_{\mathrm{bait}}$ is also better explained by the sequestering model II. One explanation of this effect could be that if the connectivity of a protein is very large, the free concentration of the protein will be small; see Fig. 2. When a bait protein has a small concentration, we can imagine that because of the very strong binding to the DNA operator site it will be located close to the DNA at all times. For the prey protein, however, this effect does not exist. Here the binding to DNA is only a result of the binding to the bait protein, and this is expected to be a weaker interaction. Proteins with a small free concentration may therefore be seen in the experiment when acting as bait, whereas it will be very difficult for them to be seen binding as prey.

Our approach also opens the way to analyzing the extent to which various network motifs (14) may survive given the bait-prey asymmetry. In particular, we find that triangles of three interacting proteins rarely survive this asymmetry, and thus a triangle in itself should be taken as an indication of a more reliable/stronger interaction.

To the extent that model II describes the data, our analysis suggests that one should believe prey data for prey proteins with low connectivity, and bait data for proteins acting as bait with high connectivity. This conclusion would be softened if model I also contributes, that is, if baits acting as autoactivators contribute some additional interactions and thereby make some weak interactions detectable for bait proteins whose autoactivation is in itself below the detection threshold. In any case, prey data miss many links associated with high-connectivity proteins.


    CONCLUSION
 
We have suggested a possible mechanism for the previously reported (11,15) difference between behavior of proteins when functioning as bait and when functioning as prey. This asymmetry indicates some basic difference in the biochemistry of the "bait position" on the DNA and the "prey position" in the cell nucleus or cytoplasm. In these terms we indeed found that the sequestering model II explains more of the asymmetry features seen in the two-hybrid data than the random firing model I. Thus the sequestering model explains the nearly exponential distribution of the prey connectivities, that more proteins act as prey than as bait, as well as the effect that some proteins with no bindings as prey have a large number of connections as bait.


    METHOD
 
For further details see Stibius (16).

To examine the models we create a "simulated network" (17) consisting of N nodes by assuming that their connectivity, k, is power-law distributed. That is, we assign each node a connectivity drawn from a probability distribution

$$P(k) \propto k^{-\gamma} \qquad (2)$$

with exponent $\gamma$ ($\gamma = 2.5$ in Figs. 3 and 4). In practice this is done through the tabulated cumulative distribution function, $F(k) = \sum_{k'=1}^{k} P(k')$. For each node, i, one chooses a random number $\eta \in [0, 1]$, and selects its connectivity $k_i = k$ such that $F(k - 1) \le \eta \le F(k)$.
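The sampling step can be sketched as follows, assuming a power-law form P(k) proportional to k^(-2.5) as stated in the captions of Figs. 3 and 4. The upper cutoff `k_max` and the function names are assumptions introduced for the sketch; the article does not state a cutoff.

```python
import bisect
import random

def tabulate_cdf(gamma, k_max):
    """Tabulate F(1), ..., F(k_max) for P(k) proportional to k**(-gamma)."""
    weights = [k ** (-gamma) for k in range(1, k_max + 1)]
    total = sum(weights)
    cdf, running = [], 0.0
    for w in weights:
        running += w
        cdf.append(running / total)
    cdf[-1] = 1.0  # guard against rounding so every eta maps to some k
    return cdf

def sample_connectivity(cdf, rng):
    """Draw eta uniformly and return the smallest k with eta <= F(k)."""
    eta = rng.random()
    return bisect.bisect_left(cdf, eta) + 1  # cdf[0] holds F(1)

rng = random.Random(0)
cdf = tabulate_cdf(gamma=2.5, k_max=100)  # k_max is an assumed cutoff
degrees = [sample_connectivity(cdf, rng) for _ in range(1000)]
```

Because the table is sorted, a binary search finds the interval F(k - 1) < eta <= F(k) directly, which is the selection rule described in the text.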

After each node, i, is given a certain connectivity, $k_i$, the nodes are sorted by descending connectivities and subsequently connected into a network as described by Trusina et al. (17). Finally the network is randomized by link swapping as described by Maslov and Sneppen (15) to generate a truly random connected network with connectivity distribution represented by Eq. 2.
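The link-swapping step can be illustrated with a generic degree-preserving rewiring routine. This is a sketch of the general idea (pick two links, exchange their end points, reject moves that would create self-links or double links), not a reproduction of the cited implementation; the function name, the attempt limit, and the ring test network are assumptions, and the sketch does not check that the network stays connected.

```python
import random

def randomize_by_link_swapping(edges, n_swaps, rng):
    """Degree-preserving randomization of an undirected network without double links."""
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    done, attempts = 0, 0
    max_attempts = 100 * n_swaps + 100  # give up eventually if few swaps are possible
    while done < n_swaps and attempts < max_attempts:
        attempts += 1
        i, j = rng.sample(range(len(edges)), 2)
        a, b = edges[i]
        c, d = edges[j]
        if rng.random() < 0.5:
            c, d = d, c  # consider both orientations of the second link
        if a == d or c == b:
            continue  # the swap would create a self-link
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if new1 in present or new2 in present:
            continue  # the swap would create a double link
        present.discard(frozenset((a, b)))
        present.discard(frozenset((c, d)))
        present.update((new1, new2))
        edges[i], edges[j] = (a, d), (c, b)
        done += 1
    return edges

def degree_sequence(edges, n):
    """Count the number of links attached to each of the n nodes."""
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

rng = random.Random(1)
ring = [(i, (i + 1) % 20) for i in range(20)]
shuffled = randomize_by_link_swapping(ring, n_swaps=50, rng=rng)
```

Each accepted swap replaces links a-b and c-d by a-d and c-b, so every node keeps its connectivity while the wiring is randomized.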

In the models the interest lies in the probability of binding the bait-prey complex to the DNA operator site, because this complex alone is able to activate transcription. The probability of having an operator site, O, with any molecule, X, bound to it can be calculated by Eq. 4, where $K_D$ is the binding constant of the molecule X to the operator site O on the DNA. Thus [XO] is found by

$$[XO] = \frac{[X][O]}{K_D} \qquad (3)$$

and the probability for the operator site to be occupied by X is:

$$P_{XO} = \frac{[XO]}{[O] + [XO]} = \frac{[X]}{[X] + K_D} \qquad (4)$$
In the following we shall consider two simple approaches for determining this probability.

Model I
In this, the random firing model, we assume that the only molecules that bind are the ones with a bait protein, i.e., free bait, bait-prey complexes, and bait in complex with other molecules, Y. This means that the total concentration of X will be:

$$[X] = [\mathrm{bait}] + [\mathrm{bait\text{-}prey}] + \sum_{Y} [\mathrm{bait\text{-}}Y] \qquad (5)$$

The number of DNA operator sites, O, is considered to be much smaller than the number of molecules, X, in the cell. Therefore we consider the free X concentration to be equal to the total X concentration in the cell nucleus. The probability, $P_{\mathrm{bait\text{-}prey}}$, of seeing the bait-prey complex bound to O is then found by multiplying the fraction of the total bait that is in the bait-prey complex by the probability given by Eq. 4 that a bait molecule is bound to the operator site:

$$P_{\mathrm{bait\text{-}prey}} = \frac{[\mathrm{bait\text{-}prey}]}{[X]} \cdot \frac{[X]}{[X] + K_D} = \frac{[\mathrm{bait\text{-}prey}]}{[X] + K_D} \qquad (6)$$

see also Shea and Ackers (18). Equation 6 does not give a possibility for any asymmetry between bait and prey. However, an asymmetry may arise if a bait bound to O could activate the transcription with a probability below the threshold value.

In the two-hybrid experiment some baits always activate transcription, and these proteins were not used in the final experiment. Thus in model I we make the hypothesis that the bait proteins have some binding to the RNA polymerase and thereby activate transcription, but only at a level insufficient for the survival of the cell. This implies that the threshold for seeing a particular bait depends on the level of activation of that bait protein. In terms of our model this means that the promoter activity associated with the bait-prey complex, $P_{\mathrm{bait\text{-}prey}}$, is supplemented with an additional activity associated with the bait itself.

We have simulated the extra bait firing by giving each bait a random bait-firing value, $r_b \in [0, 0.1]$, selected from a flat distribution. In the figures in the main text we typically chose the threshold $T = 0.1$, just above the maximal random firing, whereas in the supplement we investigate larger values of $T$. The total activity associated with a given bait-prey complex is then calculated from:

Formula 7(7)
where we again stress that $r_b$ is a bait property, and thus independent of the particular prey. Thus baits that are assigned a larger value of $r_b$ are relatively easy to detect in complexes.

Model II
In this, the sequestering model, we assume that baits bound to molecules other than prey are unable to bind to the DNA operator site. It could be, for example, that the molecule Y bound to the bait is a membrane protein, so that the complex is located at the membrane. Thus the molecules that are able to bind to the DNA operator site are:

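$$[X] = [\mathrm{bait}] + [\mathrm{bait\text{-}prey}] \qquad (8)$$

From Eqs. 4 and 8 we find the probability of seeing the bait-prey complex:

$$P_{\mathrm{bait\text{-}prey}} = \frac{[\mathrm{bait\text{-}prey}]}{[\mathrm{bait}] + [\mathrm{bait\text{-}prey}] + K_D} \qquad (9)$$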
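To make the difference between the two models concrete, the sketch below evaluates both detection probabilities for one bait. It assumes that Eq. 4 has the standard single-site occupancy form [X] / ([X] + K_D), with K_D the operator binding constant, and it uses made-up concentrations in the dimensionless units of the article; the function names are hypothetical.

```python
K_D = 1e-5  # operator binding constant quoted in the Calculations subsection

def occupancy(x_free, k_d=K_D):
    """Single-site occupancy [X] / ([X] + K_D); assumed form of Eq. 4."""
    return x_free / (x_free + k_d)

def p_model_1(bait, bait_prey, bait_other, k_d=K_D):
    """Random firing: free bait, bait-prey and bait-Y complexes all compete for O."""
    x = bait + bait_prey + bait_other
    return (bait_prey / x) * occupancy(x, k_d)

def p_model_2(bait, bait_prey, k_d=K_D):
    """Sequestering: bait-Y complexes cannot reach O, so they drop out of X."""
    x = bait + bait_prey
    return (bait_prey / x) * occupancy(x, k_d)

# Illustrative (made-up) concentrations for a bait with many other partners Y.
p1 = p_model_1(bait=0.05, bait_prey=0.10, bait_other=0.85)
p2 = p_model_2(bait=0.05, bait_prey=0.10)
```

For a bait that is mostly tied up with other partners, the bait-prey complex occupies the operator much more often in the sequestering picture, because the competing bait-Y complexes are kept away from the DNA.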

Calculations
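To calculate these probabilities for the two models we need an estimate of the free protein concentrations. These can be found by Eq. 11

$$[p_i]_{\mathrm{total}} = [p_i]_{\mathrm{free}} + \sum_{j} [p_i p_j] = [p_i]_{\mathrm{free}} + \sum_{j} \frac{[p_i]_{\mathrm{free}} [p_j]_{\mathrm{free}}}{K_{ij}} \qquad (10)$$

giving

$$[p_i]_{\mathrm{free}} = \frac{[p_i]_{\mathrm{total}}}{1 + \sum_{j} [p_j]_{\mathrm{free}} / K_{ij}} \qquad (11)$$

where $K_{ij}$ is the binding constant between the proteins $p_i$ and $p_j$, and $[p_i p_j]$ is the concentration of the complex formed by binding $p_i$ and $p_j$.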

In the models we assign binding constants, $K_{ij}$, between all proteins in the network. Two proteins, i and j, that have a connection in the network are given the same binding constant, $K_{ij} = K_{\mathrm{binding}}$. In the model this is given the value $K_{\mathrm{binding}} = e^{-5} \approx 10^{-2}$ (the smaller the value the stronger the binding). This binding strength should be seen in the perspective that typical concentrations of individual proteins are of the order of one, thus representing fairly strong interactions.

Proteins, $p_i$ and $p_k$, with no connection in the network are given a binding constant, $K_{ik} = K_{\mathrm{no\ binding}} = 10^{8}$, which is so large that we effectively disregard all bindings not present in the assumed network (no false positives are possible).

Finally we select the binding constant $K_D = 10^{-5}$, reflecting a very strong binding of the GAL4 binding region to the operator site. Other values of the binding constants have been investigated (see Supplementary Material) without altering the conclusions given in the main article.

In the first iteration we have used a value of $[p_i]_{\mathrm{free}} = 0.1 \times [p_i]_{\mathrm{total}}$ for all free protein concentrations, but the final free protein concentration is independent of which value is used to begin the numerical simulation. The iteration was continued until the value of the free concentration did not vary by more than $10^{-10}$.
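The iteration can be sketched as a fixed-point scheme. The update rule used here, free_i = total_i / (1 + sum_j free_j / K_ij), is an assumed rearrangement of the mass balance between free and complexed protein; the exclusion of self-binding, the simultaneous update of all proteins, the function name, and the three-protein chain are likewise assumptions for illustration. The binding constants, the starting value, and the tolerance are the ones quoted in the text.

```python
import math

K_BINDING = math.exp(-5)  # linked pairs
K_NO_BINDING = 1e8        # unlinked pairs

def free_concentrations(total, links, start_fraction=0.1, tol=1e-10, max_iter=100_000):
    """Iterate free_i = total_i / (1 + sum_j free_j / K_ij) until changes fall below tol."""
    n = len(total)
    linked = {frozenset(link) for link in links}

    def k(i, j):
        return K_BINDING if frozenset((i, j)) in linked else K_NO_BINDING

    free = [start_fraction * t for t in total]
    for _ in range(max_iter):
        new = [
            total[i] / (1.0 + sum(free[j] / k(i, j) for j in range(n) if j != i))
            for i in range(n)
        ]
        change = max(abs(a - b) for a, b in zip(new, free))
        free = new
        if change < tol:
            return free
    raise RuntimeError("free concentrations did not converge")

# Chain 0 - 1 - 2 with equal total concentrations (illustrative network).
total = [1.0, 1.0, 1.0]
links = [(0, 1), (1, 2)]
free = free_concentrations(total, links)
free_other_start = free_concentrations(total, links, start_fraction=0.5)
```

In this small example the middle protein, which has two partners, ends up with a much lower free concentration than the end proteins, and the result does not depend on the starting fraction.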

Now the model networks can be formed by calculating the probability of two proteins binding for each combination of proteins, thereby creating an $N \times N$ matrix of probabilities. This matrix can be converted to a network by applying a threshold, T, where node i as bait is connected to node j as prey if $P_{ij} \ge T$. We have investigated values of T from 0.05 to 0.9; see also Supplementary Material.
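The thresholding step and the resulting bait and prey connectivities can be sketched as follows. The probability matrix is made up for illustration, the function names are hypothetical, and skipping the diagonal (a protein tested against itself) is an assumption.

```python
def threshold_network(p, t):
    """Return directed links (bait i, prey j) for every pair with p[i][j] >= t."""
    n = len(p)
    return [(i, j) for i in range(n) for j in range(n) if i != j and p[i][j] >= t]

def bait_prey_connectivity(links, n):
    """Count, for each protein, its partners when acting as bait and as prey."""
    k_bait = [0] * n
    k_prey = [0] * n
    for bait, prey in links:
        k_bait[bait] += 1
        k_prey[prey] += 1
    return k_bait, k_prey

# Made-up 3 x 3 matrix; row index = bait, column index = prey.
p = [
    [0.00, 0.50, 0.08],
    [0.20, 0.00, 0.30],
    [0.02, 0.05, 0.00],
]
links = threshold_network(p, t=0.1)
k_bait, k_prey = bait_prey_connectivity(links, n=3)
```

Because the matrix need not be symmetric, a protein can have different connectivities as bait and as prey, which is the asymmetry examined in the main text.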


    SUPPLEMENTARY MATERIAL
 
To view all of the supplemental files associated with this article, visit www.biophysj.org.


    ACKNOWLEDGEMENTS
 
This work was supported by a scholarship from Novo-Nordisk and by the Danish National Research Foundation through the Center for Models of Life.


    FOOTNOTES
 
Editor: Ruth Nussinov.

Submitted on September 27, 2006; accepted for publication May 30, 2007.


    REFERENCES
 
1. Gavin, A. C., M. Bosche, R. Krause, P. Grandi, M. Marzioch, A. Bauer, J. Schultz, J. M. Rick, A. M. Michon, C. M. Cruciat, M. Remor, C. Hofert, et al. 2002. Functional organization of the yeast proteome by systematic analysis of protein complexes. Nature. 415:141–147.

2. Kumar, A., and M. Snyder. 2002. Protein complexes take the bait. Nature. 415:123–124.

3. Ho, Y., A. Gruhler, A. Heilbut, G. D. Bader, L. Moore, S. L. Adams, A. Millar, P. Taylor, K. Bennett, K. Boutilier, L. Y. Yang, C. Wolting, et al. 2002. Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry. Nature. 415:180–183.

4. Fields, S., and O. Song. 1989. A novel genetic system to detect protein-protein interactions. Nature. 340:245–246.

5. Chien, C., P. Bartel, R. Sternglanz, and S. Fields. 1991. The two-hybrid system: a method to identify and clone genes for proteins that interact with a protein of interest. Proc. Natl. Acad. Sci. USA. 88:9578–9582.

6. Ito, T., T. Chiba, R. Ozawa, M. Yoshida, M. Hattori, and Y. Sakaki. 2001. A comprehensive two-hybrid analysis to explore the yeast protein interactome. Proc. Natl. Acad. Sci. USA. 98:4569–4574.

7. Ito, T., K. Tashiro, S. Muta, R. Ozawa, T. Chiba, M. Nishizawa, K. Yamamoto, S. Kuhara, and Y. Sakaki. 2000. Toward a protein-protein interaction map of the budding yeast: a comprehensive system to examine two-hybrid interactions in all possible combinations between the yeast proteins. Proc. Natl. Acad. Sci. USA. 97:1143–1147.

8. Giot, L., J. S. Bader, C. Brouwer, A. Chaudhuri, B. Kuang, Y. Li, Y. L. Hao, C. E. Ooi, B. Godwin, E. Vitols, G. Vijayadamodar, P. Pochart, et al. 2003. A protein interaction map of Drosophila melanogaster. Science. 302:1727–1736.

9. Uetz, P., and M. Pankratz. 2004. Protein interaction maps on the fly. Nat. Biotechnol. 22:43–44.

10. Stelzl, U., U. Worm, M. Lalowski, C. Haenig, F. H. Brembeck, H. Goehler, M. Stroedicke, M. Zenkner, A. Schoenherr, S. Koeppen, J. Timm, S. Mintzlaff, et al. 2005. A human protein-protein interaction network: a resource for annotating the proteome. Cell. 122:957–968.

11. Aloy, P., and R. Russell. 2002. Potential artefacts in protein-interaction networks. FEBS Lett. 530:253–254.

12. Mrowka, R., A. Patzak, and H. Herzel. 2001. Is there a bias in proteome research? Genome Res. 11:1971–1973.

13. Ghaemmaghami, S., W. Huh, K. Bower, R. W. Howson, A. Belle, N. Dephour, E. K. O'Shea, and J. S. Weissman. 2003. Global analysis of protein expression in yeast. Nature. 425:737–741.

14. Shen-Orr, S. S., R. Milo, S. Mangan, and U. Alon. 2002. Network motifs in the transcriptional regulation of Escherichia coli. Nat. Genet. 31:64–68.

15. Maslov, S., and K. Sneppen. 2002. Specificity and stability in topology of protein networks. Science. 296:910–913.

16. Stibius, K. B. 2004. Analysis and modelling of protein interaction networks: a study of the two-hybrid experiment. Master's thesis. Niels Bohr Institute, University of Copenhagen, Copenhagen, Denmark. http://cmol.nbi.dk/thesis/Karin.pdf. [Online].

17. Trusina, A., S. Maslov, P. Minnhagen, and K. Sneppen. 2004. Hierarchy and anti-hierarchy in real and scale free networks. Phys. Rev. Lett. 92:178702.

18. Shea, M. A., and G. K. Ackers. 1985. The OR control system of bacteriophage lambda. A physical-chemical model for gene regulation. J. Mol. Biol. 181:211–230.



