Free keywords:
-
Abstract:
Assume we are given a sample of points from some underlying
distribution which contains several distinct clusters. Our goal is
to construct a neighborhood graph on the sample points such that
clusters are "identified": that is, the subgraph induced by points
from the same cluster is connected, while subgraphs corresponding to
different clusters are not connected to each other. We derive bounds
on the probability that cluster identification is successful, and
use them to predict "optimal" values of k for the mutual and
symmetric k-nearest-neighbor graphs. We point out different
properties of the mutual and symmetric nearest-neighbor graphs
related to the cluster identification problem.
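
The cluster identification criterion can be illustrated with a small sketch. It is not taken from the paper: it assumes the usual definitions (the mutual graph joins two points when each is among the other's k nearest neighbors, the symmetric graph when at least one is), and the helper names `knn_indices`, `knn_graphs`, `component_labels` and `clusters_identified`, as well as the toy data, are hypothetical.

```python
import numpy as np

def knn_indices(points, k):
    """Row i holds the indices of the k nearest neighbors of point i (Euclidean)."""
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    np.fill_diagonal(dists, np.inf)  # a point is never its own neighbor
    return np.argsort(dists, axis=1)[:, :k]

def knn_graphs(points, k):
    """Return boolean adjacency matrices (mutual, symmetric) of the kNN graphs."""
    n = len(points)
    directed = np.zeros((n, n), dtype=bool)
    rows = np.repeat(np.arange(n), k)
    directed[rows, knn_indices(points, k).ravel()] = True  # i -> j if j in kNN(i)
    mutual = directed & directed.T     # assumed definition: both directions present
    symmetric = directed | directed.T  # assumed definition: at least one direction
    return mutual, symmetric

def component_labels(adj):
    """Label the connected components of an undirected graph by depth-first search."""
    n = adj.shape[0]
    labels = -np.ones(n, dtype=int)
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            u = stack.pop()
            for v in np.flatnonzero(adj[u]):
                if labels[v] == -1:
                    labels[v] = current
                    stack.append(v)
        current += 1
    return labels

def clusters_identified(adj, true_labels):
    """True iff the connected components coincide exactly with the true clusters."""
    comp = component_labels(adj).tolist()
    true = np.asarray(true_labels).tolist()
    pairs = set(zip(comp, true))
    return len(pairs) == len(set(comp)) == len(set(true))

# Toy data (hypothetical): two clusters of 10 evenly spaced points on a line.
xs = np.concatenate([np.arange(10.0), 100.0 + np.arange(10.0)])
points = xs[:, None]
true_labels = np.repeat([0, 1], 10)

for k in (2, 10):
    mutual, symmetric = knn_graphs(points, k)
    print(k, clusters_identified(mutual, true_labels),
          clusters_identified(symmetric, true_labels))
```

On this toy data, k = 2 keeps each cluster connected without linking the two, whereas k = 10 exceeds the number of same-cluster neighbors available to any point and forces an edge between the clusters in both graphs. This reflects only the qualitative trade-off in choosing k, not the probabilistic bounds derived in the paper.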