Abstract:
Principal component analysis (PCA) has been extensively applied in
data mining, pattern recognition and information retrieval for
unsupervised dimensionality reduction. When labels are
available for the data, e.g., in a classification or regression task, PCA cannot, however, make use of this information. The problem is more interesting if only part of the input data is labeled, i.e., in a
semi-supervised setting. In this paper we propose a supervised PCA
model called SPPCA and a semi-supervised PCA model called S^2PPCA, both of which extend a probabilistic PCA model. The proposed models incorporate the label information into
the projection phase, and can naturally handle multiple outputs
(i.e., in multi-task learning problems). We derive an efficient EM
learning algorithm for both models, and also provide theoretical
justifications of the models' behavior. SPPCA and S^2PPCA are
compared with other supervised projection methods on various
learning tasks, and show not only promising performance but also
good scalability.