Abstract:
A natural representation of data is given by the parameters that generated the data. If the parameter space is continuous, we can regard it as a manifold. In practice we usually do not know this manifold; we only have some representation of the data, often in a very high-dimensional feature space. Since the number of internal parameters does not change with the representation, the data will effectively lie on a low-dimensional submanifold in feature space. Due to measurement errors, this data is usually corrupted by noise, which, particularly in high-dimensional feature spaces, makes it almost impossible to find the manifold structure.
This paper reviews a method called Manifold Denoising, which projects the data onto the submanifold using a diffusion process on a graph generated by the data. We demonstrate that the method is capable of dealing with non-trivial high-dimensional noise. Moreover, we show that using the method as a preprocessing step can significantly improve the results of a semi-supervised learning algorithm.
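
The abstract does not spell out the graph construction or the diffusion scheme, so the following is only a minimal sketch of the general idea of denoising by diffusion on a data graph, not the paper's exact algorithm. It assumes a symmetric k-nearest-neighbour graph with Gaussian weights, a random-walk graph Laplacian, and an implicit Euler step, with the graph rebuilt after every step. The names `knn_weights` and `denoise` and the parameters `k`, `step`, and `n_iter` are hypothetical and chosen for illustration.

```python
import numpy as np


def knn_weights(X, k):
    """Symmetric k-nearest-neighbour graph with Gaussian weights (assumed construction)."""
    n = X.shape[0]
    # Pairwise squared Euclidean distances, shape (n, n).
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    # Exclude each point from its own neighbour list.
    sq_no_self = sq.copy()
    np.fill_diagonal(sq_no_self, np.inf)
    idx = np.argsort(sq_no_self, axis=1)[:, :k]
    rows = np.arange(n)[:, None]
    # Local scale: mean squared distance to the k nearest neighbours.
    scale = np.maximum(sq[rows, idx].mean(axis=1), 1e-12)
    W = np.zeros((n, n))
    W[rows, idx] = np.exp(-sq[rows, idx] / scale[:, None])
    # Symmetrise: keep an edge if either endpoint selected the other.
    return np.maximum(W, W.T)


def denoise(X, k=10, step=0.5, n_iter=5):
    """Smooth the point cloud X (n_samples, n_features) by graph diffusion."""
    X = np.asarray(X, dtype=float).copy()
    n = X.shape[0]
    for _ in range(n_iter):
        W = knn_weights(X, k)              # graph is rebuilt from the current points
        degree = W.sum(axis=1)
        L = np.eye(n) - W / degree[:, None]  # random-walk Laplacian I - D^{-1} W
        # Implicit Euler step of the diffusion: solve (I + step * L) X_new = X.
        X = np.linalg.solve(np.eye(n) + step * L, X)
    return X
```

In this sketch each step pulls every point towards a weighted average of its graph neighbours, which suppresses noise components that vary quickly across the graph while changing the slowly varying manifold coordinates much less. The dense distance matrix and linear solve are only practical for small samples; a sparse graph and sparse solver would be the natural choice for larger data sets.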