A PAC-Bayesian Analysis of Co-clustering, Graph Clustering, and Pairwise Clustering

Seldin, Y


Released

Conference Paper

MPG Authors

Seldin, Y
Department Empirical Inference, Max Planck Institute for Biological Cybernetics, Max Planck Society;
Max Planck Institute for Biological Cybernetics, Max Planck Society;

External Resources

http://webee.technion.ac.il/people/kirilld/ICML10_SNA_Workshop/description.shtml
(Table of contents)

Full Texts (freely accessible)

Seldin_Social_Analytics_[0].pdf
(Any full text), 251KB

Supplementary Material (freely accessible)

No freely accessible supplementary material is available.

Citation

Seldin, Y. (2010). A PAC-Bayesian Analysis of Co-clustering, Graph Clustering, and Pairwise Clustering. In ICML 2010 Workshop on Social Analytics: Learning from human interactions (pp. 1-5).

Citation link: https://hdl.handle.net/11858/00-001M-0000-0013-BF92-2

Abstract

We briefly review the PAC-Bayesian analysis of co-clustering (Seldin and Tishby, 2008, 2009, 2010), which provided generalization guarantees and regularization terms absent from preceding formulations of this problem and achieved state-of-the-art prediction results on the MovieLens collaborative filtering task. Inspired by this analysis, we formulate weighted graph clustering as a prediction problem: given a subset of edge weights, we analyze the ability of graph clustering to predict the remaining edge weights. This formulation enables practical and theoretical comparison of different approaches to graph clustering, as well as comparison of graph clustering with other possible ways to model the graph. Following the lines of Seldin and Tishby (2010), we derive PAC-Bayesian generalization bounds for graph clustering. The bounds show that graph clustering should optimize a trade-off between empirical data fit and the mutual information that clusters preserve on the graph nodes. A similar trade-off derived from information-theoretic considerations was already shown to produce state-of-the-art results in practice (Slonim et al., 2005; Yom-Tov and Slonim, 2009). This paper supports the empirical evidence by providing a better theoretical foundation, suggesting formal generalization guarantees, and offering a more accurate way to deal with finite-sample issues.
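
The two quantities that the abstract says are traded off can be illustrated with a small Python sketch. It is not the paper's algorithm or bound. It only computes, for a toy graph, (a) the prediction error on held-out edge weights and (b) the mutual information I(X;C) between nodes and clusters. The function names, the squared-error loss, the cluster-pair-mean predictor, and the uniform distribution over nodes are all assumptions made for illustration.

```python
import math

def mutual_information(q):
    """I(X;C) in nats, assuming uniform p(x); q[x][c] is the assignment q(c|x)."""
    n, k = len(q), len(q[0])
    marginal = [sum(q[x][c] for x in range(n)) / n for c in range(k)]
    mi = 0.0
    for x in range(n):
        for c in range(k):
            if q[x][c] > 0:
                mi += q[x][c] * math.log(q[x][c] / marginal[c]) / n
    return mi

def heldout_error(weights, labels, train_edges, test_edges):
    """Mean squared error on held-out edges (assumed loss). Each edge weight is
    predicted by the mean observed weight of its cluster pair (assumed predictor)."""
    sums, counts = {}, {}
    for (i, j) in train_edges:
        key = tuple(sorted((labels[i], labels[j])))
        sums[key] = sums.get(key, 0.0) + weights[(i, j)]
        counts[key] = counts.get(key, 0) + 1
    # Fall back to the global training mean for cluster pairs never observed.
    global_mean = sum(weights[e] for e in train_edges) / len(train_edges)
    err = 0.0
    for (i, j) in test_edges:
        key = tuple(sorted((labels[i], labels[j])))
        pred = sums[key] / counts[key] if key in counts else global_mean
        err += (weights[(i, j)] - pred) ** 2
    return err / len(test_edges)

def one_hot(labels, k):
    """Turn hard cluster labels into a q(c|x) table."""
    return [[1.0 if c == lab else 0.0 for c in range(k)] for lab in labels]

# Toy graph: two blocks {0,1,2} and {3,4,5}; weight 1 inside a block, 0 across.
nodes = range(6)
edges = [(i, j) for i in nodes for j in nodes if i < j]
weights = {(i, j): 1.0 if (i < 3) == (j < 3) else 0.0 for (i, j) in edges}

# Deterministic split: every third edge is held out for testing.
test_edges = [e for idx, e in enumerate(edges) if idx % 3 == 0]
train_edges = [e for idx, e in enumerate(edges) if idx % 3 != 0]

block_labels = [0, 0, 0, 1, 1, 1]    # recovers the two blocks
single_labels = [0, 0, 0, 0, 0, 0]   # one cluster for all nodes

err_block = heldout_error(weights, block_labels, train_edges, test_edges)
err_single = heldout_error(weights, single_labels, train_edges, test_edges)
mi_block = mutual_information(one_hot(block_labels, 2))
mi_single = mutual_information(one_hot(single_labels, 1))
```

On this toy graph the two-block clustering predicts the held-out weights better than the single cluster but preserves more information about the nodes. That is the tension the bounds are said to regularize. How the two terms are weighted against each other is given by the paper's bound and is not reproduced here.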