Nearest Neighbor Clustering: A Baseline Method for Consistent Clustering with Arbitrary Objective Functions

Bubeck, S; von Luxburg, U

Journal article

MPG authors

von Luxburg, U
Department Empirical Inference, Max Planck Institute for Biological Cybernetics, Max Planck Society;
Max Planck Institute for Biological Cybernetics, Max Planck Society;

External resources

http://www.jmlr.org/papers/volume10/bubeck09a/bubeck09a.pdf
(Publisher's version)

Citation

Bubeck, S., & von Luxburg, U. (2009). Nearest Neighbor Clustering: A Baseline Method for Consistent Clustering with Arbitrary Objective Functions. The Journal of Machine Learning Research, 10, 657-698.

Citation link: https://hdl.handle.net/11858/00-001M-0000-0013-C591-8

Abstract

Clustering is often formulated as a discrete optimization problem. The objective is to
find, among all partitions of the data set, the best one according to some quality measure.
However, in the statistical setting where we assume that the finite data set has been sampled
from some underlying space, the goal is not to find the best partition of the given
sample, but to approximate the true partition of the underlying space. We argue that the
discrete optimization approach usually does not achieve this goal, and instead can lead to
inconsistency. We construct examples which provably have this behavior. As in the case
of supervised learning, the cure is to restrict the size of the function classes under consideration.
For appropriately small function classes we can prove very general consistency
theorems for clustering optimization schemes. As one particular algorithm for clustering
with a restricted function space we introduce nearest neighbor clustering. Similar to the
k-nearest neighbor classifier in supervised learning, this algorithm can be seen as a general
baseline algorithm to minimize arbitrary clustering objective functions. We prove that it
is statistically consistent for all commonly used clustering objective functions.
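A minimal sketch of the idea, not the authors' reference implementation. It assumes that the restricted function class consists of partitions that are constant on the nearest-neighbor (Voronoi) cells of m seed points drawn uniformly at random from the sample, and that the objective is minimized over this class by exhaustive search over all k^m labelings of the seeds. This search is feasible only for tiny m. The within-cluster sum of squares (k-means) objective is used purely as an example; any function of (points, labels, k) can be passed in. All names (`nn_clustering`, `kmeans_objective`) are hypothetical.

```python
import itertools
import math
import random
from typing import Callable, List, Optional, Sequence, Tuple

Point = Tuple[float, ...]
# Assumed interface: an objective maps (points, labels, k) to a cost to minimize.
Objective = Callable[[Sequence[Point], Sequence[int], int], float]


def kmeans_objective(points: Sequence[Point], labels: Sequence[int], k: int) -> float:
    """Example objective: within-cluster sum of squared distances to cluster means."""
    total = 0.0
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        if not members:
            continue  # empty clusters contribute nothing
        dim = len(members[0])
        mean = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        total += sum(sum((p[d] - mean[d]) ** 2 for d in range(dim)) for p in members)
    return total


def nn_clustering(
    points: Sequence[Point],
    k: int,
    m: int,
    objective: Objective = kmeans_objective,
    rng: Optional[random.Random] = None,
) -> Tuple[List[int], float]:
    """Hypothetical sketch: minimize `objective` over partitions that are
    constant on the nearest-neighbor (Voronoi) cells of m random seed points."""
    rng = rng if rng is not None else random.Random(0)
    seeds = [points[i] for i in rng.sample(range(len(points)), m)]
    # cell[i] is the index of the seed nearest to points[i].
    cell = [min(range(m), key=lambda j: math.dist(p, seeds[j])) for p in points]
    best_labels: List[int] = []
    best_value = math.inf
    # Exhaustive search over all k**m labelings of the seeds (tiny m only).
    for seed_labels in itertools.product(range(k), repeat=m):
        labels = [seed_labels[c] for c in cell]  # each point inherits its seed's label
        value = objective(points, labels, k)
        if value < best_value:
            best_labels, best_value = labels, value
    return best_labels, best_value


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    labels, cost = nn_clustering(data, k=2, m=3)
    print(labels, cost)
```

Choosing m much smaller than the sample size keeps the candidate function class small, which is the restriction the abstract argues is needed for consistency. Setting m equal to the sample size makes every point its own cell, so the sketch degenerates into unrestricted exhaustive search over all partitions of the sample.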