Abstract:
Suffix arrays are a simple and powerful data structure for text processing
that can be used for full-text indexes, data compression, and many
other applications, in particular in bioinformatics.
However, it has so far appeared prohibitive to build suffix arrays
for huge inputs that do not fit into main memory.
This paper presents the design, analysis, implementation, and
experimental evaluation of
several new and improved algorithms for suffix array construction.
The algorithms are asymptotically optimal in the worst case
or on average. Our implementation can construct
suffix arrays for inputs of up to 4 GByte in hours
on a low-cost machine, where
all previous implementations we are aware of would fail or take days.
We also present a simple and efficient external algorithm for checking
whether an array of indexes is a suffix array.
As a tool of possible independent interest, we present a systematic way
to design, analyze, and implement \emph{pipelined}
algorithms.
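To make the checking problem concrete, here is a small in-memory Python sketch of one standard characterization of suffix arrays. It is not the paper's external-memory checker, and we only assume that an external checker could be built on this criterion. The criterion says that an array `sa` is the suffix array of `text` exactly when (1) `sa` is a permutation of `0..n-1`, and (2) for every pair of adjacent entries `i = sa[k]`, `j = sa[k+1]`, the pair `(text[i], rank[i+1])` is strictly smaller than `(text[j], rank[j+1])`. Here `rank` is the inverse of `sa`, and the empty suffix gets rank `-1`. The function names `is_suffix_array` and `naive_suffix_array` are hypothetical illustrations.

```python
from typing import List


def naive_suffix_array(text: str) -> List[int]:
    """Reference construction by sorting all suffixes (O(n^2 log n); for testing only)."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def is_suffix_array(text: str, sa: List[int]) -> bool:
    """Hypothetical in-memory checker: is `sa` the suffix array of `text`?"""
    n = len(text)
    # Condition 1: sa must be a permutation of 0..n-1.
    if len(sa) != n or sorted(sa) != list(range(n)):
        return False
    # rank[i] = position of suffix i in sa; the empty suffix (index n) ranks lowest.
    rank = [0] * (n + 1)
    for pos, i in enumerate(sa):
        rank[i] = pos
    rank[n] = -1
    # Condition 2: adjacent suffixes must be ordered by
    # (first character, rank of the suffix starting one position later).
    for k in range(n - 1):
        i, j = sa[k], sa[k + 1]
        if (text[i], rank[i + 1]) >= (text[j], rank[j + 1]):
            return False
    return True


if __name__ == "__main__":
    sa = naive_suffix_array("banana")
    print(sa, is_suffix_array("banana", sa))
```

For "banana" this builds `[5, 3, 1, 0, 4, 2]` and accepts it. Swapping two entries, as in `[5, 3, 0, 1, 4, 2]`, is rejected. The check needs only one comparison per adjacent pair plus a rank lookup. That local, scan-and-join structure is what makes this style of criterion attractive for an external-memory setting, where the rank lookups would be replaced by sorting-based joins (again an assumption about how one might externalize it, not a description of the paper's algorithm).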