Index Maintenance for Time-Travel Text Search

Anand, Avishek; Bedathur, Srikanta; Berberich, Klaus; Schenkel, Ralf

doi:10.1145/2348283.2348318

Conference Paper

MPS-Authors

Anand, Avishek
Databases and Information Systems, MPI for Informatics, Max Planck Society;
International Max Planck Research School, MPI for Informatics, Max Planck Society;

Bedathur, Srikanta
Databases and Information Systems, MPI for Informatics, Max Planck Society;

Berberich, Klaus
Databases and Information Systems, MPI for Informatics, Max Planck Society;

Schenkel, Ralf
Databases and Information Systems, MPI for Informatics, Max Planck Society;

Citation

Anand, A., Bedathur, S., Berberich, K., & Schenkel, R. (2012). Index Maintenance for Time-Travel Text Search. In J. Callan, W. Hersh, Y. Maarek, & M. Sanderson (Eds.), SIGIR'12 (pp. 235-244). New York, NY: ACM.

Cite as: https://hdl.handle.net/11858/00-001M-0000-0014-59CF-5

Abstract

Time-travel text search enriches standard text search with temporal predicates, so that users of web archives can easily retrieve document versions that are considered relevant to a given keyword query and existed during a given time interval. Different index structures have been proposed to efficiently support time-travel text search. None of them, however, can easily be updated as the Web evolves and new document versions are added to the web archive. In this work, we describe a novel index structure that efficiently supports time-travel text search and can be maintained incrementally as new document versions are added to the web archive. Our solution uses a sharded index organization, bounds the number of spuriously read index entries per shard, and can be maintained using small in-memory buffers and append-only operations. We present experiments on two large-scale real-world datasets demonstrating that maintaining our novel index structure is an order of magnitude more efficient than periodically rebuilding one of the existing index structures, while query-processing performance is not adversely affected.
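To make the query semantics described in the abstract concrete, the following is a minimal Python sketch of a generic time-travel posting list, assuming each posting carries a validity interval [begin, end) during which a document version existed. It is not the sharded index proposed in the paper: it does not shard postings or bound spurious reads. It only illustrates, under that assumption, how still-current versions can sit in a small in-memory buffer while superseded versions are appended to storage without rewriting existing entries. All names (`Posting`, `TimeTravelPostingList`, `add_version`, `query`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

INF = float("inf")  # end time of a document version that is still current


@dataclass(frozen=True)
class Posting:
    doc_id: str
    begin: float  # time this version was added (inclusive)
    end: float    # time it was superseded (exclusive); INF if still current
    score: float  # hypothetical per-version score, e.g. term frequency


class TimeTravelPostingList:
    """Hypothetical posting list for a single term (illustration only)."""

    def __init__(self) -> None:
        # Append-only storage for postings whose validity interval is closed.
        self.closed: List[Posting] = []
        # In-memory buffer of still-open postings: doc_id -> (begin, score).
        self.open: Dict[str, Tuple[float, float]] = {}

    def add_version(self, doc_id: str, time: float, score: Optional[float]) -> None:
        """Record a new version of doc_id at `time`.

        score=None means the term no longer occurs in the new version.
        """
        previous = self.open.pop(doc_id, None)
        if previous is not None:
            begin, old_score = previous
            # Close the previous version's interval and append it;
            # no already-stored posting is modified.
            self.closed.append(Posting(doc_id, begin, time, old_score))
        if score is not None:
            self.open[doc_id] = (time, score)

    def query(self, q_begin: float, q_end: float) -> List[Posting]:
        """Return postings whose validity interval overlaps [q_begin, q_end)."""
        current = [Posting(d, b, INF, s) for d, (b, s) in self.open.items()]
        return [p for p in self.closed + current
                if p.begin < q_end and p.end > q_begin]
```

For example, after `add_version("d1", 1, 2.0)`, `add_version("d2", 3, 1.0)` and `add_version("d1", 5, 4.0)`, the query `query(0, 2)` returns only the first version of `d1` with interval [1, 5), because `d2` and the newer version of `d1` did not yet exist before time 2. A linear scan like this reads every posting regardless of the query interval; avoiding such spurious reads is the problem that index organizations like the one in the abstract address.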