Simple Linear Work Suffix Array Construction

Kärkkäinen, Juha; Sanders, Peter (authors); Baeten, Jos C.M.; Lenstra, Jan Karel; Parrow, Joachim; Woeginger, Gerhard J. (proceedings editors)

Released

Conference Paper

MPS-Authors

Kärkkäinen, Juha
Algorithms and Complexity, MPI for Informatics, Max Planck Society;

Sanders, Peter
Algorithms and Complexity, MPI for Informatics, Max Planck Society;

Citation

Kärkkäinen, J., & Sanders, P. (2003). Simple Linear Work Suffix Array Construction. In Automata, languages and programming: 30th International Colloquium, ICALP 2003 (pp. 943-955). Berlin, Germany: Springer.

Cite as: https://hdl.handle.net/11858/00-001M-0000-000F-2E13-5

Abstract

A suffix array represents the suffixes of a string in sorted order. Being a simpler and more compact alternative to suffix trees, it is an important tool for full text indexing and other string processing tasks. We introduce the skew algorithm for suffix array construction over integer alphabets that can be implemented to run in linear time using integer sorting as its only nontrivial subroutine:

1. Recursively sort suffixes beginning at positions $i \bmod 3 \neq 0$.
2. Sort the remaining suffixes using the information obtained in step one.
3. Merge the two sorted sequences obtained in steps one and two.

The algorithm is much simpler than previous linear time algorithms that are all based on the more complicated suffix tree data structure. Since sorting is a well studied problem, we obtain optimal algorithms for several other models of computation, e.g. external memory with parallel disks, cache oblivious, and parallel. The adaptations for BSP and EREW-PRAM are asymptotically faster than the best previously known algorithms.
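The three steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `skew_suffix_array`, the zero padding, and the dummy sample position are choices made here for illustration, and Python's built-in comparison sort stands in for the integer sorting the abstract relies on, so this version does not achieve linear time. The input is assumed to be a sequence of positive integers.

```python
def skew_suffix_array(s):
    """Suffix array of a sequence of positive integers (illustrative sketch)."""
    n = len(s)
    if n < 2:
        return list(range(n))
    t = list(s) + [0, 0, 0]  # zero padding acts as an end marker

    # Step 1: sort suffixes starting at positions i with i mod 3 != 0.
    # A dummy position n is added when n mod 3 == 1 (assumption of this
    # sketch) so the last "mod 1" triple always contains a padding zero.
    limit = n + 1 if n % 3 == 1 else n
    sample = list(range(1, limit, 3)) + list(range(2, limit, 3))

    def triple(i):
        return (t[i], t[i + 1], t[i + 2])

    by_triple = sorted(sample, key=triple)  # stand-in for radix sort
    name, count, prev = {}, 0, None
    for i in by_triple:
        if triple(i) != prev:
            count += 1
            prev = triple(i)
        name[i] = count

    if count < len(sample):
        # Triples are not unique: recurse on the string of triple names.
        order = skew_suffix_array([name[i] for i in sample])
        sorted_sample = [sample[k] for k in order]
    else:
        sorted_sample = by_triple

    rank = [0] * (n + 3)  # rank 0 means "past the end of the string"
    for r, i in enumerate(sorted_sample, start=1):
        rank[i] = r

    # Step 2: sort the remaining suffixes (i mod 3 == 0) by their first
    # character and the rank of the sample suffix that follows it.
    sorted0 = sorted(range(0, n, 3), key=lambda i: (t[i], rank[i + 1]))

    # Step 3: merge the two sorted sequences.
    def sample_first(i, j):
        if i % 3 == 1:
            return (t[i], rank[i + 1]) < (t[j], rank[j + 1])
        return (t[i], t[i + 1], rank[i + 2]) < (t[j], t[j + 1], rank[j + 2])

    sorted12 = [i for i in sorted_sample if i < n]  # drop the dummy position
    result, a, b = [], 0, 0
    while a < len(sorted12) and b < len(sorted0):
        if sample_first(sorted12[a], sorted0[b]):
            result.append(sorted12[a])
            a += 1
        else:
            result.append(sorted0[b])
            b += 1
    result.extend(sorted12[a:])
    result.extend(sorted0[b:])
    return result


print(skew_suffix_array([ord(c) for c in "banana"]))  # [5, 3, 1, 0, 4, 2]
```

The recursive call in step one works on a string of roughly $2n/3$ triple names. If the calls to `sorted` were replaced by a linear-time integer sort such as radix sort, the running time would satisfy $T(n) = T(2n/3) + O(n)$, which solves to $O(n)$.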