A PAC-Bayesian Approach to Formulation of Clustering Objectives

Seldin, Y; Tishby, N

Released

Conference Paper

MPG Authors

Seldin, Y
Department Empirical Inference, Max Planck Institute for Biological Cybernetics, Max Planck Society;
Max Planck Institute for Biological Cybernetics, Max Planck Society;

External Resources

http://stanford.edu/~rezab/nips2009workshop/
(Abstract)

Full Texts (freely accessible)

Seldin_Tishby_Clustering_[0].pdf
(Any fulltext), 144KB

Supplementary Material (freely accessible)

No freely accessible supplementary materials are available.

Citation

Seldin, Y., & Tishby, N. (2009). A PAC-Bayesian Approach to Formulation of Clustering Objectives. In NIPS 2009 Workshop "Clustering: Science or Art? Towards Principled Approaches" (pp. 1-4).

Cite as: https://hdl.handle.net/11858/00-001M-0000-0013-C1D6-E

Abstract

Clustering is a widely used tool for exploratory data analysis. However, the theoretical understanding of clustering is very limited. We still do not have a well-founded answer to the seemingly simple question of “how many clusters are present in the data?”, and furthermore a formal comparison of clusterings based on different optimization objectives is far beyond our abilities. The lack of good theoretical support gives rise to multiple heuristics that confuse practitioners and stall development of the field. We suggest that the ill-posed nature of clustering problems is caused by the fact that clustering is often taken out of the context of its subsequent application. We argue that one does not cluster data just for the sake of clustering, but rather to facilitate the solution of some higher-level task. By evaluating a clustering’s contribution to the solution of the higher-level task, it is possible to compare different clusterings, even those obtained by different optimization objectives. In preceding work it was shown that such an approach can be applied to the evaluation and design of co-clustering solutions. Here we suggest that this approach can be extended to other settings where clustering is applied.