English
 
Help Privacy Policy Disclaimer
  Advanced SearchBrowse

Item

ITEM ACTIONSEXPORT

Released

Conference Paper

Large Scale Genomic Sequence SVM Classifiers

MPS-Authors
/persons/resource/persons84153

Rätsch,  G
Friedrich Miescher Laboratory, Max Planck Society;

/persons/resource/persons84193

Schölkopf,  B
Department Empirical Inference, Max Planck Institute for Biological Cybernetics, Max Planck Society;
Max Planck Institute for Biological Cybernetics, Max Planck Society;

External Resource
Fulltext (restricted access)
There are currently no full texts shared for your IP range.
Fulltext (public)

SonRaeSch05b.pdf
(Any fulltext), 309KB

Supplementary Material (public)
There is no public supplementary material available
Citation

Sonnenburg, S., Rätsch, G., & Schölkopf, B. (2005). Large Scale Genomic Sequence SVM Classifiers. In S. Dzeroski, L. de Raedt, & S. Wrobel (Eds.), ICML '05: the 22nd international conference on Machine learning (pp. 848-855). New York, NY, USA: ACM Press.


Cite as: https://hdl.handle.net/11858/00-001M-0000-0013-D6E1-7
Abstract
In genomic sequence analysis tasks like splice site recognition or promoter identification, large amounts of training sequences are available, and indeed needed to achieve sufficiently high classification performances. In this work we study two recently proposed and successfully used kernels, namely the Spectrum kernel and the Weighted Degree kernel (WD). In particular, we suggest several extensions using Suffix Trees and modi cations of an SMO-like SVM training algorithm in order to accelerate the training of the SVMs and their evaluation on test sequences. Our simulations show that for the spectrum kernel and WD kernel, large scale SVM training can be accelerated by factors of 20 and 4 times, respectively, while using much less memory (e.g. no kernel caching). The evaluation on new sequences is often several thousand times faster using the new techniques (depending on the number of Support Vectors). Our method allows us to train on sets as large as one million sequences.