q-gram Based Database Searching Using a Suffix Array (QUASAR)

Burkhardt, Stefan; Crauser, Andreas; Ferragina, Paolo; Lenhof, Hans-Peter; Rivals, Eric; Vingron, Martin

Conference Paper

MPG Authors

Burkhardt, Stefan
Algorithms and Complexity, MPI for Informatics, Max Planck Society;

Crauser, Andreas
Algorithms and Complexity, MPI for Informatics, Max Planck Society;

Ferragina, Paolo
Algorithms and Complexity, MPI for Informatics, Max Planck Society;

Lenhof, Hans-Peter
Algorithms and Complexity, MPI for Informatics, Max Planck Society;

Rivals, Eric
Algorithms and Complexity, MPI for Informatics, Max Planck Society;

Citation

Burkhardt, S., Crauser, A., Ferragina, P., Lenhof, H.-P., Rivals, E., & Vingron, M. (1999). q-gram Based Database Searching Using a Suffix Array (QUASAR). In S. Istrail, P. Pevzner, & M. Waterman (Eds.), Proceedings of the 3rd Annual International Conference on Computational Molecular Biology (RECOMB-99) (pp. 77-83). New York, USA: ACM.

Cite link: https://hdl.handle.net/11858/00-001M-0000-000F-3606-7

Abstract

With the increasing amount of DNA sequence information deposited in our databases, searching for similarity to a query sequence has become a basic operation in molecular biology. But even today's fast algorithms reach their limits when applied to all-versus-all comparisons of large databases. Here we present a new database searching algorithm dubbed QUASAR (Q-gram Alignment based on Suffix ARrays), which was designed to quickly detect sequences with strong similarity to the query in a context where many searches are conducted on one database. Our algorithm applies a modification of q-tuple filtering implemented on top of a suffix array. Two versions were developed, one for a RAM-resident suffix array and one for access to the suffix array on disk. We compared our implementation with BLAST and found that our approach is an order of magnitude faster. It is, however, restricted to the search for strongly similar DNA sequences, as is typically required, e.g., in the context of clustering expressed sequence tags (ESTs).
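
The abstract gives no implementation details, so the following is only a minimal sketch of the general idea of q-gram filtering on top of a suffix array, not the authors' implementation. It assumes the classical q-gram lemma (a window of length w with at most k differences shares at least w + 1 - (k + 1)q q-grams with its match). The division of the database into fixed-size blocks, the naive suffix array construction, and all function names and parameter values are hypothetical choices made for illustration.

```python
from collections import Counter


def build_suffix_array(text):
    # Naive construction by sorting all suffixes; fine for a toy example,
    # far too slow for a real sequence database.
    return sorted(range(len(text)), key=lambda i: text[i:])


def qgram_positions(text, sa, gram):
    # Binary search for the contiguous suffix array range whose suffixes
    # start with `gram`, then return the text positions in that range.
    q = len(gram)
    lo, hi = 0, len(sa)
    while lo < hi:  # lower bound: first suffix with q-prefix >= gram
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + q] < gram:
            lo = mid + 1
        else:
            hi = mid
    start, hi = lo, len(sa)
    while lo < hi:  # upper bound: first suffix with q-prefix > gram
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + q] <= gram:
            lo = mid + 1
        else:
            hi = mid
    return sa[start:lo]


def qgram_threshold(w, q, k):
    # q-gram lemma: minimum number of shared q-grams for a window of
    # length w matched with at most k differences.
    return w + 1 - (k + 1) * q


def candidate_blocks(text, sa, query, q, block_size, threshold):
    # Count q-gram hits per database block (hypothetical block scheme)
    # and keep only blocks that reach the threshold. Hits are counted
    # with multiplicity, which can only add false positives.
    counts = Counter()
    for i in range(len(query) - q + 1):
        for pos in qgram_positions(text, sa, query[i:i + q]):
            counts[pos // block_size] += 1
    return sorted(b for b, c in counts.items() if c >= threshold)


# Toy database of three 8-character blocks; only the middle one matches.
database = "AAAAAAAA" + "ACGTACGT" + "TTTTTTTT"
sa = build_suffix_array(database)
t = qgram_threshold(w=8, q=3, k=1)
print(candidate_blocks(database, sa, "ACGTACGT", 3, 8, t))  # [1]
```

Blocks that survive the filter would then be passed to a slower, exact alignment step. Because the suffix array is built once and reused for every query, the index cost is amortized when many searches run against the same database, which is the setting the abstract describes. A practical version would also need overlapping blocks so that matches straddling a block boundary are not missed.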