Semi-Supervised Protein Classification using Cluster Kernels

Weston, J; Leslie, C; Zhou, D; Elisseeff, A; Noble, WS

Conference Paper

MPS-Authors

Weston, J
Department Empirical Inference, Max Planck Institute for Biological Cybernetics, Max Planck Society;
Max Planck Institute for Biological Cybernetics, Max Planck Society;

Zhou, D
Department Empirical Inference, Max Planck Institute for Biological Cybernetics, Max Planck Society;
Max Planck Institute for Biological Cybernetics, Max Planck Society;

Elisseeff, A
Department Empirical Inference, Max Planck Institute for Biological Cybernetics, Max Planck Society;
Max Planck Institute for Biological Cybernetics, Max Planck Society;

External Resource

https://papers.nips.cc/paper/2496-semi-supervised-protein-classification-using-cluster-kernels.pdf
(Publisher version)

Citation

Weston, J., Leslie, C., Zhou, D., Elisseeff, A., & Noble, W. S. (2004). Semi-Supervised Protein Classification using Cluster Kernels. Advances in Neural Information Processing Systems 16, 595-602.

Cite as: https://hdl.handle.net/11858/00-001M-0000-0013-D8FD-C

Abstract

A key issue in supervised protein classification is the representation of input sequences of amino acids. Recent work using string kernels for protein data has achieved state-of-the-art classification performance. However, such representations are based only on labeled data (examples with known 3D structures, organized into structural classes), while in practice unlabeled data is far more plentiful. In this work, we develop simple and scalable cluster kernel techniques for incorporating unlabeled data into the representation of protein sequences. We show that our methods greatly improve the classification performance of string kernels and outperform standard approaches for using unlabeled data, such as adding close homologs of the positive examples to the training data. We achieve equal or superior performance to previously presented cluster kernel methods while achieving far greater computational efficiency.
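The abstract does not specify how a cluster kernel folds unlabeled sequences into the representation. As a hedged illustration only, the sketch below assumes a neighborhood-averaging formulation: each sequence's string-kernel feature vector is replaced by the average of its own feature vector and those of similar sequences drawn from a pool that may contain unlabeled data. The base string kernel is assumed to be a k-mer spectrum kernel, and neighbors are assumed to be chosen by a cosine-similarity threshold on spectrum features rather than by any particular homology search. These choices, along with the names `spectrum_features`, `neighborhood_features`, `neighborhood_kernel`, `k` and `threshold`, are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def spectrum_features(seq, k=3):
    """Spectrum (k-mer count) feature map of a sequence, stored sparsely."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def dot(u, v):
    """Sparse inner product; iterates over the smaller of the two vectors."""
    if len(u) > len(v):
        u, v = v, u
    # Counter returns 0 for missing k-mers, so unshared k-mers contribute nothing.
    return sum(c * v[kmer] for kmer, c in u.items())


def cosine(u, v):
    """Normalized similarity used here (an assumption) to decide neighborhood membership."""
    denom = math.sqrt(dot(u, u) * dot(v, v))
    return dot(u, v) / denom if denom else 0.0


def neighborhood_features(seq, pool, k=3, threshold=0.5):
    """Average the spectrum features of seq and of its neighbors in pool.

    pool may mix labeled and unlabeled sequences; a pool entry identical to
    seq is skipped because seq itself is always included exactly once.
    """
    phi = spectrum_features(seq, k)
    members = [phi]
    for other in pool:
        if other == seq:
            continue
        psi = spectrum_features(other, k)
        if cosine(phi, psi) >= threshold:
            members.append(psi)
    avg = Counter()
    for feats in members:
        for kmer, c in feats.items():
            avg[kmer] += c / len(members)
    return avg


def neighborhood_kernel(x, y, pool, k=3, threshold=0.5):
    """Inner product of neighborhood-averaged spectrum features of x and y."""
    return dot(neighborhood_features(x, pool, k, threshold),
               neighborhood_features(y, pool, k, threshold))
```

With an empty pool the kernel reduces to the plain spectrum kernel: `neighborhood_kernel("ABCAB", "ABAB", pool=[], k=2)` returns `4.0`, because the only shared 2-mer, `AB`, occurs twice in each sequence. Averaging in feature space keeps the result a valid positive semi-definite kernel, since it remains an inner product of explicit feature vectors.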