
Released

Journal Article

Cross-validation Optimization for Large Scale Structured Classification Kernel Methods

MPG Authors
http://pubman.mpdl.mpg.de/cone/persons/resource/persons84205

Seeger, M.
Department Empirical Inference, Max Planck Institute for Biological Cybernetics, Max Planck Society

External Resources
No external resources are available
Full texts (freely accessible)
No freely accessible full texts are available
Supplementary Material (freely accessible)
No freely accessible supplementary materials are available
Citation

Seeger, M. (2008). Cross-validation Optimization for Large Scale Structured Classification Kernel Methods. Journal of Machine Learning Research, 9, 1147-1178. Retrieved from http://jmlr.csail.mit.edu/papers/volume9/seeger08b/seeger08b.pdf.


Citation link: http://hdl.handle.net/11858/00-001M-0000-0013-C8C5-9
Abstract
We propose a highly efficient framework for penalized likelihood kernel methods applied to multi-class models with a large, structured set of classes. In contrast to many previous approaches, which decompose the fitting problem into many smaller ones, we focus on a Newton optimization of the complete model, making use of model structure and linear conjugate gradients in order to approximate Newton search directions. Crucially, our learning method is based entirely on matrix-vector multiplication primitives with the kernel matrices and their derivatives, allowing straightforward specialization to new kernels and focusing code optimization efforts on these primitives alone. Kernel parameters are learned automatically by maximizing the cross-validation log likelihood in a gradient-based way, and predictive probabilities are estimated. We demonstrate our approach on large scale text classification tasks with hierarchical structure on thousands of classes, achieving state-of-the-art results in an order of magnitude less time than previous work.
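
To make the optimization scheme concrete, the following is a minimal sketch (Python with NumPy/SciPy, not the paper's code) of a truncated-Newton fit for a penalized-likelihood kernel model, in which Newton directions are approximated by linear conjugate gradients and the kernel matrix K is accessed only through matrix-vector multiplications. For brevity it uses a binary logistic likelihood and omits the paper's structured multi-class machinery and the gradient-based cross-validation tuning of kernel parameters; all function names and parameters here are illustrative.

import numpy as np
from scipy.special import expit
from scipy.sparse.linalg import LinearOperator, cg

def rbf_kernel(X, gamma=0.5):
    # Dense RBF kernel matrix; any kernel with a fast MVM routine would do.
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def fit_kernel_logistic(K, y, lam=1.0, newton_steps=10, cg_iters=50):
    # Minimize  sum_i log(1 + exp(-y_i (K alpha)_i)) + (lam/2) alpha' K alpha
    # by (undamped) Newton steps; a full implementation would add a line search.
    n = len(y)
    alpha = np.zeros(n)
    for _ in range(newton_steps):
        f = K @ alpha                     # kernel MVM: latent values
        g = -y * expit(-y * f)            # d loss / d f
        grad = K @ (g + lam * alpha)      # gradient in alpha
        p = expit(f)
        w = p * (1.0 - p)                 # loss curvature, diag(W)

        # Hessian MVM:  H v = K W K v + lam K v  -- only kernel MVMs needed.
        def hess_mv(v):
            Kv = K @ v
            return K @ (w * Kv) + lam * Kv

        H = LinearOperator((n, n), matvec=hess_mv)
        d, _ = cg(H, -grad, maxiter=cg_iters)  # approximate Newton direction
        alpha = alpha + d
    return alpha

# Toy usage on synthetic data:
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.sign(X[:, 0] + 0.3 * rng.normal(size=200))
K = rbf_kernel(X)
alpha = fit_kernel_logistic(K, y, lam=1.0)
print("training accuracy:", np.mean(np.sign(K @ alpha) == y))

Each Hessian-vector product inside the conjugate gradient solver costs two kernel MVMs, so specializing the method to a new kernel only requires supplying a fast MVM routine, which is the property the abstract emphasizes.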