Abstract:
We applied the PAC-Bayesian framework to derive generalization
bounds for co-clustering. The analysis yielded regularization
terms that were absent in the preceding formulations of this
task. The bounds suggested that co-clustering should optimize a
trade-off between its empirical performance and the mutual
information that the cluster variables preserve on row and
column indices. Proper regularization enabled us to achieve
state-of-the-art results in prediction of the missing ratings in
the MovieLens collaborative filtering dataset.
In addition, a PAC-Bayesian bound for discrete density
estimation was derived. We have shown that the PAC-Bayesian
bound for classification is a special case of the PAC-Bayesian
bound for discrete density estimation. We further introduced
combinatorial priors to PAC-Bayesian analysis. Combinatorial
priors are more appropriate for discrete domains, whereas
Gaussian priors are suitable for continuous domains. It was
shown that combinatorial priors lead to regularization terms in
the form of mutual information.
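
To make the regularization quantity concrete, the sketch below computes the mutual information I(X; C) between a row index X and its cluster variable C. It is an illustration only, not the bound derived in the paper: the function name `mutual_information`, the uniform distribution over row indices, and the soft-assignment matrix `q_c_given_x` are all assumptions introduced here.

```python
import numpy as np


def mutual_information(q_c_given_x):
    """Return I(X; C) in nats, assuming X is uniform over the rows.

    q_c_given_x[x, c] is a (hypothetical) assignment probability q(c | x);
    each row must sum to 1.
    """
    q = np.asarray(q_c_given_x, dtype=float)
    n_rows = q.shape[0]

    # Marginal q(c) under the assumed uniform distribution on X.
    q_c = q.mean(axis=0)

    # I(X; C) = (1/n) * sum_x sum_c q(c|x) * log(q(c|x) / q(c)),
    # using the convention 0 * log 0 = 0.
    mask = q > 0
    ratio = np.ones_like(q)
    ratio[mask] = q[mask] / np.broadcast_to(q_c, q.shape)[mask]
    return float(np.sum(q * np.log(ratio)) / n_rows)


# Four rows assigned deterministically to two equally sized clusters:
# the cluster variable keeps log(2) nats of information about the row index.
hard = np.eye(2)[[0, 0, 1, 1]]
print(mutual_information(hard))

# Fully uninformative assignment: the cluster variable keeps nothing.
soft = np.full((4, 2), 0.5)
print(mutual_information(soft))
```

Under these assumptions, an assignment that places every row in its own cluster maximizes I(X; C), while a coarser clustering lowers it. This is the quantity that the abstract describes as being traded off against empirical performance.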