Abstract:
Despite many years of research on how to properly align sequences in
the presence of sequencing errors, alternative splicing and
micro-exons, the correct alignment of mRNA sequences to genomic DNA is
still a challenging task. We present a novel approach based on
large-margin learning that combines kernel-based splice site predictions
with common sequence alignment techniques. By solving a convex
optimization problem, our algorithm -- called PALMA -- tunes the
parameters of the model such that the true alignment scores higher
than all other alignments. In an experimental study on the alignments
of mRNAs containing artificially generated micro-exons, we show that
our algorithm drastically outperforms all other methods: it perfectly
aligns all 4358 sequences on a hold-out set, while the best other
method misaligns at least 90 of them. Moreover, our algorithm is very
robust against noise in the query sequence: when deleting, inserting,
or mutating up to 50% of the query sequence, it still aligns 95% of
all sequences correctly, while other methods achieve less than 36%
accuracy. For datasets, additional results, and a stand-alone
alignment tool, see
http://www.fml.mpg.de/raetsch/projects/palma.
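
The large-margin criterion described in the abstract (the true alignment
should score higher than every other alignment) can be illustrated with a
minimal sketch. This is not the PALMA implementation: the linear scoring
function, the feature vectors, the margin of 1.0, and all names below
(score, margin_violation, weights) are assumptions made purely for
illustration.

```python
from typing import Sequence


def score(weights: Sequence[float], features: Sequence[float]) -> float:
    """Linear alignment score: dot product of parameters and alignment features.

    Assumption: each alignment is summarised by a fixed-length feature vector.
    """
    return sum(w * f for w, f in zip(weights, features))


def margin_violation(
    weights: Sequence[float],
    true_features: Sequence[float],
    alt_features_list: Sequence[Sequence[float]],
    margin: float = 1.0,
) -> float:
    """Largest hinge-style violation over all alternative alignments.

    Returns 0.0 when the true alignment beats every alternative by at
    least `margin`; otherwise returns how far the worst case falls short.
    """
    true_score = score(weights, true_features)
    return max(
        (
            max(0.0, margin - (true_score - score(weights, alt)))
            for alt in alt_features_list
        ),
        default=0.0,
    )


# Hypothetical toy data: one true alignment and two competing alignments.
weights = [1.0, -0.5, 2.0]
true_alignment = [3.0, 1.0, 2.0]
alternatives = [[3.0, 2.0, 1.0], [2.0, 0.0, 2.0]]

violation = margin_violation(weights, true_alignment, alternatives)
print(violation)
```

A training procedure in this spirit would adjust the weights to drive such
violations toward zero over all training examples, typically with a
regularisation term, which yields a convex optimization problem when the
score is linear in the parameters.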