A theoretical and experimental study on the construction of suffix arrays in 
external memory

Crauser, Andreas; Ferragina, Paolo

A theoretical and experimental study on the construction of suffix arrays in external memory

Crauser, A., & Ferragina, P. (2002). A theoretical and experimental study on the construction of suffix arrays in external memory. Algorithmica, 32, 1-35.

Item is 公開

表示: 全項目非表示: 全項目

基本情報

表示: 非表示:

アイテムのパーマリンク: https://hdl.handle.net/11858/00-001M-0000-000F-2F13-E 版のパーマリンク: https://hdl.handle.net/11858/00-001M-0000-000F-2F14-C

資料種別: 学術論文

ファイル

表示: ファイル

作成者

表示:

非表示:

作成者:
Crauser, Andreas¹, 著者
Ferragina, Paolo¹, 著者

所属:
1Algorithms and Complexity, MPI for Informatics, Max Planck Society, ou_24019

内容説明

表示:

非表示:

キーワード: -

要旨: The construction of full-text indexes on very large text collections is nowadays a hot problem. The suffix array [Manber-Myers,~1993] is one of the most attractive full-text indexing data structures due to its simplicity, space efficiency and powerful/fast search operations supported. In this paper we analyze, both theoretically and experimentally, the I/O-complexity and the working space of six algorithms for constructing large suffix arrays. Three of them are the state-of-the-art, the other three algorithms are our new proposals. We perform a set of experiments based on three different data sets (English texts, Amino-acid sequences and random texts) and give a precise hierarchy of these algorithms according to their working-space vs. construction-time tradeoff. Given the current trends in model design~\cite{Farach-et-al,Vitter} and disk technology~\cite{quantum,Ruemmler-Wilkes}, we will pose particular attention to differentiate between ``random'' and ``contiguous'' disk accesses, in order to reasonably explain some practical I/O-phenomena which are related to the experimental behavior of these algorithms and that would be otherwise meaningless in the light of other simpler external-memory models. At the best of our knowledge, this is the first study which provides a wide spectrum of possible approaches to the construction of suffix arrays in external memory, and thus it should be helpful to anyone who is interested in building full-text indexes on very large text collections. Finally, we conclude our paper by addressing two other issues. The former concerns with the problem of building word-indexes; we show that our results can be successfully applied to this case too, without any loss in efficiency and without compromising the simplicity of programming so to achieve a uniform, simple and efficient approach to both the two indexing models. The latter issue is related to the intriguing and apparently counterintuitive ``contradiction'' between the effective practical performance of the well-known BaezaYates-Gonnet-Snider's algorithm~\cite{book-info}, verified in our experiments, and its unappealing (i.e., cubic) worst-case behavior. We devise a new external-memory algorithm that follows the basic philosophy underlying that algorithm but in a significantly different manner, thus resulting in a novel approach which combines good worst-case bounds with efficient practical performance.

資料詳細

表示:

非表示:

言語: eng - English

日付: 修正: 2003-09-08出版: 2002

出版の状態: 出版

ページ: -

出版情報: -

目次: -

査読: 査読あり

識別子（DOI, ISBNなど）: eDoc: 201785
その他: Local-ID: C1256428004B93B8-1FE0191E40BE5BBBC12569FC0056CAD2-Crauser-Ferragina2001

学位: -

訴訟

表示:

Project information

表示:

出版物 1

表示:

非表示:

出版物名: Algorithmica

種別: 学術雑誌

著者・編者:

所属:

出版社, 出版地: -

ページ: - 巻号: 32 通巻号: - 開始・終了ページ: 1 - 35 識別子（ISBN, ISSN, DOIなど）: ISSN: 0178-4617

アイテム詳細

基本情報

ファイル

関連URL

作成者

内容説明

資料詳細

関連イベント

訴訟

Project information

出版物 1