Efficient and Self-Tuning Incremental Query Expansion for Top-k Query Processing

Theobald, Martin; Schenkel, Ralf; Weikum, Gerhard (authors); Baeza-Yates, Ricardo A.; Ziviani, Nivio; Marchionini, Gary; Moffat, Alistair; Tait, John (editors)

Theobald, M., Schenkel, R., & Weikum, G. (2005). Efficient and Self-Tuning Incremental Query Expansion for Top-k Query Processing. In 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2005) (pp. 242-249). New York, USA: ACM.

Item is Released

Basic

Item Permalink: https://hdl.handle.net/11858/00-001M-0000-000F-2659-5
Version Permalink: https://hdl.handle.net/11858/00-001M-0000-000F-265A-3

Genre: Conference Paper

Files

TheobaldSTW.pdf (Any fulltext), 521KB

File Permalink:
-

Name:
TheobaldSTW.pdf

Description:
-

OA-Status:

Visibility:
Private

MIME-Type / Checksum:
application/pdf

Technical Metadata:

Copyright Date:
-

Copyright Info:
-

License:
-

Creators

Creators:
Theobald, Martin¹, Author
Schenkel, Ralf¹, Author
Weikum, Gerhard¹, Author
Baeza-Yates, Ricardo A., Editor
Ziviani, Nivio, Editor
Marchionini, Gary, Editor
Moffat, Alistair, Editor
Tait, John, Editor

Affiliations:
¹ Databases and Information Systems, MPI for Informatics, Max Planck Society, ou_24018

Content

Free keywords: -

Abstract: We present a novel approach for efficient and self-tuning query expansion that is embedded into a top-k query processor with candidate pruning. Traditional query expansion methods select expansion terms whose thematic similarity to the original query terms is above some specified threshold, thus generating a disjunctive query with much higher dimensionality. This poses three major problems: 1) the need for hand-tuning the expansion threshold, 2) the potential topic dilution with overly aggressive expansion, and 3) the drastically increased execution cost of a high-dimensional query. The method developed in this paper addresses all three problems by dynamically and incrementally merging the inverted lists for the potential expansion terms with the lists for the original query terms. A priority queue is used for maintaining result candidates, the pruning of candidates is based on Fagin's family of top-k algorithms, and optionally probabilistic estimators of candidate scores can be used for additional pruning. Experiments on the TREC collections for the 2004 Robust and Terabyte tracks demonstrate the increased efficiency, effectiveness, and scalability of our approach.
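The abstract's idea of incrementally merging the inverted lists of expansion terms can be illustrated with a small sketch. The code below is not taken from the paper. It assumes that each term has an inverted list sorted by descending score, that scores are normalised to [0, 1], and that each expansion term carries a similarity weight relative to the original query term. The names `incremental_merge`, `expansions`, and `index` are hypothetical. A heap (priority queue) yields postings in descending order of similarity times score, and a list is only opened once its upper bound could beat the best posting already available, so low-similarity lists may never be read if the consumer stops early.

```python
import heapq
from typing import Dict, Iterator, List, Tuple

Posting = Tuple[str, float]  # (doc_id, score); each list sorted by descending score


def incremental_merge(
    expansions: List[Tuple[str, float]],  # (term, similarity), hypothetical input
    index: Dict[str, List[Posting]],      # term -> inverted list
) -> Iterator[Tuple[str, float, str]]:
    """Yield (doc_id, similarity * score, term) in descending weighted order.

    Assumes scores lie in [0, 1], so the similarity itself is an upper
    bound on every weighted score in a list that has not been opened yet.
    """
    pending = sorted(expansions, key=lambda ts: -ts[1])  # highest similarity first
    heap: List[Tuple[float, str, int, float]] = []  # (-weighted, term, pos, sim)
    nxt = 0
    while heap or nxt < len(pending):
        # Open further lists only while their upper bound could still
        # match or exceed the best posting among the open lists.
        while nxt < len(pending) and (not heap or pending[nxt][1] >= -heap[0][0]):
            term, sim = pending[nxt]
            nxt += 1
            postings = index.get(term, [])
            if postings:
                heapq.heappush(heap, (-sim * postings[0][1], term, 0, sim))
        if not heap:
            continue  # all remaining lists were empty; the outer loop then ends
        neg_weighted, term, pos, sim = heapq.heappop(heap)
        yield index[term][pos][0], -neg_weighted, term
        if pos + 1 < len(index[term]):
            # Advance within the same list and re-insert its next posting.
            next_score = index[term][pos + 1][1]
            heapq.heappush(heap, (-sim * next_score, term, pos + 1, sim))


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    toy_index = {
        "car": [("d1", 0.9), ("d2", 0.5)],
        "automobile": [("d3", 0.8), ("d1", 0.4)],
        "vehicle": [("d4", 0.6)],
    }
    toy_expansions = [("car", 1.0), ("automobile", 0.7), ("vehicle", 0.3)]
    for doc, weighted, term in incremental_merge(toy_expansions, toy_index):
        print(doc, round(weighted, 2), term)
```

Because the function is a generator, a top-k processor built on it could stop consuming as soon as its own pruning condition holds. This is how such a sketch would avoid the cost of scanning every expansion list in full.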

Details

Language(s): eng - English

Dates: Modified: 2006-04-14; Date issued: 2005

Publication Status: Issued

Pages: -

Publishing info: New York, USA : ACM

Table of Contents: -

Rev. Type: -

Identifiers: eDoc: 278885
Other: Local-ID: C1256DBF005F876D-5AD4A59C97D8DFFAC1256FE0002DD302-TheobaldSW05

Degree: -

Event

Title: Untitled Event

Place of Event: Salvador, Brazil

Start-/End Date: 2005-08-15

Source 1

Title: 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2005)

Source Genre: Proceedings

Creator(s):

Affiliations:

Publ. Info: New York, USA : ACM

Pages: -
Volume / Issue: -
Sequence Number: -
Start / End Page: 242 - 249
Identifier: ISBN: 1-59593-034-5