Algorithm for the Identification of Protein Complexes in High-Throughput Data (original title: Algorithmus zur Identifikation von Protein-Komplexen in high-throughput Daten)


Wasinee, Rungsarityotin
Max Planck Society


Wasinee, R. (2007). Algorithmus zur Identifikation von Protein-Komplexen in high-throughput Daten. PhD Thesis, Freie Universität Berlin, Berlin.

Recent advances in proteomic technologies such as two-hybrid screening and biochemical purification allow large-scale investigations of protein interactions. The goal of this thesis is to investigate model-based approaches to predicting protein complexes from tandem affinity purification experiments. We compare a simple overlapping model to a partitioning model. In addition, we propose a visualization framework to delineate overlapping complexes from experimental data.

Previous techniques for protein interaction analysis rely on heuristic algorithms. They yield useful results but make no attempt to provide a model of protein complexes from experimental data. In addition, heuristic algorithms often have a plethora of adjustable parameters, with very little guidance on how to adjust them for a particular dataset. We believe that model-based techniques provide a more rigorous framework for protein interaction analysis. A probabilistic model explicitly and quantitatively states the assumptions about how protein interactions are exposed by the experimental technique. The actual algorithm then uses the model to compute an estimate of the clustering.

We propose two models to predict protein complexes from experimental data. Our first model is in some sense the simplest possible one. It is based on frequent itemset mining, which merely counts the incidence of certain sets of proteins within the experimental results. The affinity of two sets of proteins to form clusters is modeled as independent, regardless of any members the sets share. Our second model assumes that the formation of protein complexes can be reduced to pairwise interactions between proteins. An interaction is more likely between two proteins if they come from the same cluster. Based on this model, we use Markov Random Field theory to calculate a maximum-likelihood assignment of proteins to clusters. We compare the effectiveness of the two models by evaluating them against two benchmarks.
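
As an illustration of the counting step behind the first model, the following is a minimal sketch, not the thesis implementation. It enumerates small protein subsets by brute force rather than using a dedicated frequent itemset algorithm such as Apriori. The function name `frequent_protein_sets`, the parameters `min_support` and `max_size`, and the toy purifications are all assumptions made for this example.

```python
from collections import Counter
from itertools import combinations


def frequent_protein_sets(purifications, min_support=2, max_size=3):
    """Count how often each protein subset co-occurs across purifications.

    purifications: iterable of protein lists, one per experiment.
    min_support:   hypothetical threshold; subsets seen fewer times are dropped.
    max_size:      hypothetical cap on subset size to keep enumeration small.
    """
    counts = Counter()
    for purification in purifications:
        # Sort so that every subset has one canonical tuple representation.
        proteins = sorted(set(purification))
        for size in range(2, max_size + 1):
            for subset in combinations(proteins, size):
                counts[subset] += 1
    return {subset: n for subset, n in counts.items() if n >= min_support}


# Hypothetical toy data: each list is one purification (bait plus preys).
purifications = [
    ["A", "B", "C"],
    ["A", "B", "D"],
    ["B", "C", "E"],
    ["A", "B", "C"],
]
frequent = frequent_protein_sets(purifications, min_support=2, max_size=3)
```

In this toy input, the sets that pass the threshold are the candidate complexes. Overlap between them is allowed because each set is counted independently of the others.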
In our evaluation, the partitioning model performs much better than the overlapping model. This indicates that protein clustering in nature is likely to be a pairwise phenomenon, despite individual examples to the contrary. The performance of the second model is as good as that of previous heuristic techniques, and in contrast to them it has no adjustable parameters, making us confident that it will perform well on a wide range of datasets.

Finally, we developed a useful visualization method for data from tandem affinity purification experiments. Purification results are modeled as a directed graph. Edge weights are defined by the inclusion probability between two purifications. This measure captures the asymmetric nature of the bait-prey experiment. We demonstrate the effectiveness of the method by presenting a visualization of the most recent large-scale experiments.
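
The directed purification graph can be sketched as follows. The abstract does not give the formula for the inclusion probability, so this example assumes that the weight of the edge from purification i to purification j is the fraction of proteins in i that also appear in j. The function name `inclusion_graph` and the toy data are likewise hypothetical.

```python
def inclusion_graph(purifications):
    """Build a weighted directed graph over purifications.

    purifications: dict mapping a purification name to its protein list.
    Returns a dict {(i, j): weight} with
        weight(i, j) = |P_i & P_j| / |P_i|
    which is an ASSUMED definition of inclusion probability.
    """
    sets = {name: set(proteins) for name, proteins in purifications.items()}
    edges = {}
    for i, proteins_i in sets.items():
        for j, proteins_j in sets.items():
            if i == j or not proteins_i:
                continue
            weight = len(proteins_i & proteins_j) / len(proteins_i)
            if weight > 0:
                # Keep only edges between purifications that share proteins.
                edges[(i, j)] = weight
    return edges


# Hypothetical toy data keyed by bait name.
purifications = {
    "bait1": ["A", "B"],
    "bait2": ["A", "B", "C", "D"],
    "bait3": ["E"],
}
edges = inclusion_graph(purifications)
```

Under this assumed definition the two directions of an edge generally differ: a small purification fully contained in a larger one gets a high weight toward it and a lower weight in the reverse direction, which is one way to express the asymmetry of the bait-prey setup.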