English
 
Help Privacy Policy Disclaimer
  Advanced SearchBrowse

Item

ITEM ACTIONSEXPORT

Released

Conference Paper

Better External Memory Suffix Array Construction

MPS-Authors
/persons/resource/persons44293

Dementiev,  Roman
Algorithms and Complexity, MPI for Informatics, Max Planck Society;

/persons/resource/persons44717

Kärkkäinen,  Juha
Algorithms and Complexity, MPI for Informatics, Max Planck Society;

/persons/resource/persons45022

Mehnert,  Jens
Algorithms and Complexity, MPI for Informatics, Max Planck Society;

/persons/resource/persons45344

Sanders,  Peter
Algorithms and Complexity, MPI for Informatics, Max Planck Society;

External Resource
No external resources are shared
Fulltext (restricted access)
There are currently no full texts shared for your IP range.
Fulltext (public)
There are no public fulltexts stored in PuRe
Supplementary Material (public)
There is no public supplementary material available
Citation

Dementiev, R., Kärkkäinen, J., Mehnert, J., & Sanders, P. (2005). Better External Memory Suffix Array Construction. In Proceedings of the Seventh Workshop on Algorithm Engineering and Experiments and the Second Workshop on Analytic Algorithmics and Combinatorics (ALENEX/ANALCO 2005) (pp. 86-97). Philadelphia, USA: SIAM.


Cite as: https://hdl.handle.net/11858/00-001M-0000-000F-25F0-6
Abstract
Suffix arrays are a simple and powerful data structure for text processing that can be used for full text indexes, data compression, and many other applications in particular in bioinformatics. However, so far it looked prohibitive to build suffix arrays for huge inputs that do not fit into main memory. This paper presents design, analysis, implementation, and experimental evaluation of several new and improved algorithms for suffix array construction. The algorithms are asymptotically optimal in the worst case or on the average. Our implementation can construct suffix arrays for inputs of up to 4GByte in hours on a low cost machine where all previous implementations we are aware of would fail or take days. We also present a simple and efficient external algorithm for checking whether an array of indexes is a suffix array. As a tool of possible independent interest we present a systematic way to design, analyze, and implement \emph{pipelined} algorithms.