Abstract:
Recent advances in proteomic technologies such as two-hybrid screening and biochemical purification allow
large-scale investigations of protein interactions. The goal of this thesis is to investigate model-based
approaches to predict protein complexes from tandem affinity purification experiments. We
compare a simple overlapping model to a partitioning model. In addition, we propose a visualization
framework to delineate overlapping complexes from experimental data.
Previous techniques for protein interaction analysis rely on heuristic algorithms. They yield
useful results, but make no attempt to provide a model of protein complexes from experimental
data. In addition, heuristic algorithms often have a plethora of adjustable parameters, with very
little guidance on how to adjust them for a particular dataset. We believe that model-based techniques
provide a more rigorous framework for protein interaction analysis. A probabilistic model
explicitly and quantitatively states the assumptions about how protein interactions are exposed by
the experimental technique. The actual algorithm then uses the model to compute an estimate of the
clustering.
We propose two models to predict protein complexes from experimental data. Our first model is
in some sense the simplest possible one. It is based on frequent itemset mining, which merely counts
the incidence of certain sets of proteins within the experimental results. The affinity of two sets of
proteins to form clusters is modeled as independent, regardless of any overlapping members
between these sets. Our second model assumes that formation of protein complexes can be reduced
to pairwise interactions between proteins. Interactions between proteins are more likely for pairs of
proteins if they come from the same cluster. Based on this model, we use Markov Random Field
theory to calculate a maximum-likelihood assignment of proteins to clusters.
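
To make the counting idea behind the first model concrete, the following is a minimal sketch of frequent itemset counting over purification results. It is an illustration only, not the implementation used in the thesis: the function name `frequent_itemsets`, the parameters `min_support` and `max_size`, and the toy data are all assumptions introduced here.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(purifications, min_support=2, max_size=3):
    """Count how often each protein set occurs together in a purification.

    purifications: iterable of protein-name lists, one per experiment.
    min_support and max_size are hypothetical knobs for this sketch.
    Returns a dict mapping sorted protein tuples to their counts.
    """
    counts = Counter()
    for proteins in purifications:
        unique = sorted(set(proteins))  # ignore duplicates within one purification
        for size in range(1, max_size + 1):
            for itemset in combinations(unique, size):
                counts[itemset] += 1
    # Keep only the sets observed in at least min_support purifications.
    return {s: c for s, c in counts.items() if c >= min_support}


# Hypothetical toy data: each list is the set of proteins found in one purification.
purifications = [
    ["A", "B", "C"],
    ["A", "B"],
    ["B", "C", "D"],
    ["A", "B", "C"],
]
frequent = frequent_itemsets(purifications, min_support=2)
```

Enumerating every subset up to `max_size` is only practical for small inputs; established frequent itemset algorithms such as Apriori prune the candidate sets instead of counting them exhaustively.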
We compare the effectiveness of the two models by evaluating them against two benchmarks.
In our evaluation, the partitioning model performs much better than the overlapping model. This
indicates that protein clustering in nature is likely to be a pairwise phenomenon, despite individual
examples to the contrary. The second model performs as well as previous techniques
based on heuristics and, in contrast to them, has no adjustable parameters, which makes us confident
that it will perform well on a wide range of datasets.
Finally, we develop a visualization method for data from tandem affinity purification experiments. Purification
results are modeled as a directed graph. Edge weights are defined by the inclusion probability
between two purifications. This measure captures the asymmetric nature of the bait-prey
experiment. We demonstrate the effectiveness of the method by presenting a visualization of the
most recent large-scale experiments.
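
The directed-graph construction described above can be sketched as follows. The abstract does not define the inclusion probability, so this sketch assumes one plausible reading: the weight of the edge from purification a to purification b is the fraction of proteins in a that also appear in b. The function name `inclusion_graph`, the `min_weight` threshold, and the toy data are hypothetical.

```python
def inclusion_graph(purifications, min_weight=0.0):
    """Build weighted directed edges between purifications.

    purifications: dict mapping a purification label to its set of proteins.
    Assumed weight of edge (a, b): |a & b| / |a|, which is asymmetric.
    Returns a dict mapping (a, b) label pairs to weights above min_weight.
    """
    edges = {}
    for a, set_a in purifications.items():
        for b, set_b in purifications.items():
            if a == b or not set_a:
                continue  # skip self-loops and empty purifications
            weight = len(set_a & set_b) / len(set_a)
            if weight > min_weight:
                edges[(a, b)] = weight
    return edges


# Hypothetical toy data: a small purification contained in a larger one.
purifications = {
    "bait1": {"A", "B"},
    "bait2": {"A", "B", "C", "D"},
}
edges = inclusion_graph(purifications)
```

Under this assumed definition the two directions between a pair of purifications generally carry different weights, which is how the measure reflects the asymmetry of the bait-prey experiment.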