Efficient Temporal Keyword Queries over Versioned Text

Anand, Avishek; Bedathur, Srikanta; Berberich, Klaus; Schenkel, Ralf

doi:10.1145/1871437.1871528

Anand, A., Bedathur, S., Berberich, K., & Schenkel, R. (2010). Efficient Temporal Keyword Queries over Versioned Text. In X. J. Huang, G. Jones, N. Koudas, X. Wu, & K. Collins-Thompson (Eds.), Proceedings of the 19th ACM Conference on Information and Knowledge Management (pp. 699-708). New York, NY: ACM. doi:10.1145/1871437.1871528.

Item is public

Basic information

Item permalink: https://hdl.handle.net/11858/00-001M-0000-000F-14E4-B
Version permalink: https://hdl.handle.net/11858/00-001M-0000-0024-33F7-2

Genre: Conference paper

Creators

Creators:
Anand, Avishek¹, Author
Bedathur, Srikanta¹, Author
Berberich, Klaus¹, Author
Schenkel, Ralf¹, Author

Affiliations:
¹ Databases and Information Systems, MPI for Informatics, Max Planck Society, ou_24018

Content

Keywords: -

Abstract: Modern text analytics applications operate on large volumes of temporal text data such as Web archives, newspaper archives, blogs, wikis, and micro-blogs. In these settings, searching and mining need to use constraints on the time dimension in addition to keyword constraints. A natural approach to such queries is to use an inverted index whose entries are enriched with valid-time intervals. It has been shown that these indexes have to be partitioned along time in order to achieve efficiency. However, when the temporal predicate corresponds to a long time range, requiring the processing of multiple partitions, naive query processing incurs a high cost from reading redundant entries across partitions. We present a framework for efficient approximate processing of keyword queries over a temporally partitioned inverted index which minimizes this overhead, thus speeding up query processing. By using a small synopsis for each partition we identify partitions that maximize the number of final non-redundant results, and schedule them for processing early on. Our approach aims to balance the estimated gains in the final result recall against the cost of index reading required. We present practical algorithms for the resulting optimization problem of index partition selection. Our experiments with three diverse, large-scale text archives reveal that our proposed approach can provide close to 80% result recall even when only about half the index is allowed to be read.
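
The partition selection problem described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it is a generic greedy heuristic for budgeted coverage, and it assumes that exact sets of document ids stand in for the per-partition synopses, whose form the abstract does not specify. The function name `select_partitions`, the partition labels, and the budget measured in index entries are all hypothetical.

```python
from typing import Dict, List, Set, Tuple


def select_partitions(
    partitions: Dict[str, Set[int]], budget: int
) -> Tuple[List[str], Set[int]]:
    """Greedily pick partitions by new results gained per entry read.

    Assumptions (not from the paper): each partition is an exact set of
    document ids, reading a partition costs len(partition) entries, and
    `budget` caps the total number of entries read.
    """
    covered: Set[int] = set()   # distinct results collected so far
    chosen: List[str] = []      # partitions in the order they are scheduled
    remaining = dict(partitions)
    spent = 0

    while remaining:
        best, best_ratio = None, 0.0
        for name, entries in remaining.items():
            cost = len(entries)
            if cost == 0 or spent + cost > budget:
                continue  # empty, or would exceed the read budget
            gain = len(entries - covered)  # non-redundant results only
            ratio = gain / cost
            if ratio > best_ratio:  # ties keep the earlier partition
                best, best_ratio = name, ratio
        if best is None:
            break  # nothing affordable adds any new result
        entries = remaining.pop(best)
        covered |= entries
        spent += len(entries)
        chosen.append(best)

    return chosen, covered


if __name__ == "__main__":
    # Hypothetical overlapping partitions for one query term.
    parts = {
        "p1": {1, 2, 3, 4},
        "p2": {3, 4, 5},
        "p3": {5, 6},
        "p4": {1, 2, 3, 4, 5, 6},
    }
    order, results = select_partitions(parts, budget=6)
    print(order, sorted(results))
```

In this toy input the four partitions hold 15 entries in total but only 6 distinct results, so a budget of 6 entries is enough to reach full recall if redundant partitions are skipped. With a real synopsis the gain term would be an estimate rather than an exact set difference.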

Details

Language: eng - English

Dates: Published online: 2010; Published: 2010

Publication status: Published

Pages: -

Publishing info: -

Table of contents: -

Peer review: -

Identifiers (DOI, ISBN, etc.): eDoc: 536378
DOI: 10.1145/1871437.1871528
URI: http://doi.acm.org/10.1145/1871437.1871528
Other: Local-ID: C1256DBF005F876D-63EEA22E6EFA1620C12577840044D36B-AnandBBS_CIKM10

Degree: -

Source 1

Title: Proceedings of the 19th ACM Conference on Information and Knowledge Management

Abbreviation: CIKM 2010

Genre: Proceedings

Authors and editors:
Huang, Xiangji Jimmy¹, Editor
Jones, Gareth¹, Editor
Koudas, Nick¹, Editor
Wu, Xindong¹, Editor
Collins-Thompson, Kevyn¹, Editor

Affiliations:
¹ External Organizations, ou_persistent22

Publisher, place: New York, NY: ACM

Pages: -
Volume / Issue: -
Sequence number: -
Start / end page: 699 - 708
Identifiers (ISBN, ISSN, DOI, etc.): ISBN: 978-1-4503-0099-5