Efficient Temporal Keyword Queries over Versioned Text

Anand, Avishek; Bedathur, Srikanta; Berberich, Klaus; Schenkel, Ralf

doi:10.1145/1871437.1871528

Anand, A., Bedathur, S., Berberich, K., & Schenkel, R. (2010). Efficient Temporal Keyword Queries over Versioned Text. In X. J. Huang, G. Jones, N. Koudas, X. Wu, & K. Collins-Thompson (Eds.), Proceedings of the 19th ACM Conference on Information and Knowledge Management (pp. 699-708). New York, NY: ACM. doi:10.1145/1871437.1871528.

Basic

Item Permalink: https://hdl.handle.net/11858/00-001M-0000-000F-14E4-B Version Permalink: https://hdl.handle.net/11858/00-001M-0000-0024-33F7-2

Genre: Conference Paper

Creators

Creators:
Anand, Avishek¹, Author
Bedathur, Srikanta¹, Author
Berberich, Klaus¹, Author
Schenkel, Ralf¹, Author

Affiliations:
1 Databases and Information Systems, MPI for Informatics, Max Planck Society, ou_24018

Content

Free keywords: -

Abstract: Modern text analytics applications operate on large volumes of temporal text data such as Web archives, newspaper archives, blogs, wikis, and micro-blogs. In these settings, searching and mining need to use constraints on the time dimension in addition to keyword constraints. A natural approach to address such queries is using an inverted index whose entries are enriched with valid-time intervals. It has been shown that these indexes have to be partitioned along time in order to achieve efficiency. However, when the temporal predicate corresponds to a long time range, requiring the processing of multiple partitions, naive query processing incurs the high cost of reading redundant entries across partitions. We present a framework for efficient approximate processing of keyword queries over a temporally partitioned inverted index which minimizes this overhead, thus speeding up query processing. By using a small synopsis for each partition, we identify partitions that maximize the number of final non-redundant results and schedule them for processing early on. Our approach aims to balance the estimated gains in final result recall against the cost of the required index reading. We present practical algorithms for the resulting optimization problem of index partition selection. Our experiments with three diverse, large-scale text archives reveal that our proposed approach can provide close to 80% result recall even when only about half the index is allowed to be read.
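The abstract describes the setting only at a high level. The following Python sketch illustrates its two ingredients under simplifying assumptions. Postings carry valid-time intervals, and partitioning along time replicates each posting into every partition its interval overlaps, which is where the redundant entries come from. Partitions are then chosen under a read budget by a standard greedy heuristic for budgeted maximum coverage, ranking each partition by new results per posting read. This is not the authors' algorithm. The names `Posting`, `overlaps`, `partition_by_time`, and `select_partitions` are hypothetical. A posting is assumed to be a result when its interval intersects the query range. Exact per-partition result sets stand in for the small synopses the paper uses to estimate gains.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Posting:
    """Hypothetical index entry: a document version valid during [begin, end)."""
    doc_id: str
    begin: int
    end: int


def overlaps(p, lo, hi):
    """True if the posting's valid-time interval intersects [lo, hi)."""
    return p.begin < hi and p.end > lo


def partition_by_time(postings, boundaries):
    """Split a posting list along time; a posting lands in every partition it overlaps."""
    parts = [[] for _ in range(len(boundaries) - 1)]
    for p in postings:
        for i in range(len(parts)):
            if overlaps(p, boundaries[i], boundaries[i + 1]):
                parts[i].append(p)
    return parts


def select_partitions(parts, q_begin, q_end, budget):
    """Greedily pick partitions by new results per posting read, within a budget.

    Assumption: exact result sets stand in for per-partition synopses.
    """
    results = [{p for p in part if overlaps(p, q_begin, q_end)} for part in parts]
    chosen, covered, spent = [], set(), 0
    while True:
        best, best_ratio = None, 0.0
        for i, part in enumerate(parts):
            cost = len(part)  # cost = number of postings that must be read
            if i in chosen or cost == 0 or spent + cost > budget:
                continue
            ratio = len(results[i] - covered) / cost  # non-redundant gain per posting
            if ratio > best_ratio:
                best, best_ratio = i, ratio
        if best is None:  # nothing affordable adds new results
            return chosen, covered
        chosen.append(best)
        spent += len(parts[best])
        covered |= results[best]


# Usage: four postings, three time partitions, query over the whole range.
postings = [Posting("d1", 0, 10), Posting("d2", 0, 3),
            Posting("d3", 4, 6), Posting("d4", 7, 10)]
parts = partition_by_time(postings, [0, 4, 8, 10])
chosen, covered = select_partitions(parts, 0, 10, budget=5)
print(sum(len(p) for p in parts))  # 7 postings stored for 4 distinct entries
print(chosen, sorted(p.doc_id for p in covered))  # [0, 1] ['d1', 'd2', 'd3', 'd4']
```

With a budget of 5 of the 7 stored postings, the sketch reads partitions 0 and 1 and still returns all four distinct results. A real system would estimate the gain from synopses rather than computing it exactly, since exact computation already requires reading the partition.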

Details

Language(s): eng - English

Dates: Published Online: 2010; Date issued: 2010

Publication Status: Issued

Pages: -

Publishing info: -

Table of Contents: -

Rev. Type: -

Identifiers: eDoc: 536378
DOI: 10.1145/1871437.1871528
URI: http://doi.acm.org/10.1145/1871437.1871528
Other: Local-ID: C1256DBF005F876D-63EEA22E6EFA1620C12577840044D36B-AnandBBS_CIKM10

Degree: -

Event

Title: 19th ACM Conference on Information and Knowledge Management

Place of Event: Toronto, Canada

Start-/End Date: 2010-10-26 - 2010-10-30

Source 1

Title: Proceedings of the 19th ACM Conference on Information and Knowledge Management

Abbreviation: CIKM 2010

Source Genre: Proceedings

Creator(s):
Huang, Xiangji Jimmy¹, Editor
Jones, Gareth¹, Editor
Koudas, Nick¹, Editor
Wu, Xindong¹, Editor
Collins-Thompson, Kevyn¹, Editor

Affiliations:
1 External Organizations, ou_persistent22

Publ. Info: New York, NY : ACM

Pages: -
Volume / Issue: -
Sequence Number: -
Start / End Page: 699 - 708
Identifier: ISBN: 978-1-4503-0099-5