Exploiting graph-structured data in generative probabilistic models

Dietz, Laura

Thesis

MPS-Authors

Dietz, Laura
Databases and Information Systems, MPI for Informatics, Max Planck Society;
International Max Planck Research School, MPI for Informatics, Max Planck Society;

Citation

Dietz, L. (2011). Exploiting graph-structured data in generative probabilistic models. PhD Thesis, Universität des Saarlandes, Saarbrücken.

Cite as: https://hdl.handle.net/11858/00-001M-0000-0010-11A2-3

Abstract

Unsupervised machine learning aims to make predictions when labeled data is absent and supervised machine learning therefore cannot be applied. Unsupervised algorithms build on assumptions about how data and predictions relate to each other. One technique for unsupervised problem settings is generative modeling, which specifies these assumptions as a probabilistic process that generates the data. The subject of this thesis is how to most effectively exploit input data that has an underlying graph structure in unsupervised learning, studied in three important use cases. The first use case deals with localizing defective code regions in software, given the execution graph of code lines and transitions. The second use case exploits citation networks to quantify the influence of citations on the content of the citing publication. The final use case identifies shared tastes of friends in a social network, enabling the prediction of which of a user's items a particular friend would be interested in. For each use case, prediction performance is evaluated on held-out test data, which is only scarcely available in the respective domain. This comparison quantifies under which circumstances each generative model best exploits the given graph structure.