非表示:
キーワード:
-
要旨:
We present a novel approach for efficient and self-tuning query expansion that
is embedded into a top-k query processor with candidate pruning. Traditional
query expansion methods select expansion terms whose thematic similarity to the
original query terms is above some specified threshold, thus generating a
disjunctive query with much higher dimensionality. This poses three major
problems: 1) the need for hand-tuning the expansion threshold, 2) the potential
topic dilution with overly aggressive expansion, and 3) the drastically
increased execution cost of a high-dimensional query. The method developed in
this paper addresses all three problems by dynamically and incrementally
merging the inverted lists for the potential expansion terms with the lists for
the original query terms. A priority queue is used for maintaining result
candidates, the pruning of candidates is based on Fagin's family of top-k
algorithms, and optionally probabilistic estimators of candidate scores can be
used for additional pruning. Experiments on the TREC collections for the 2004
Robust and Terabyte tracks demonstrate the increased efficiency, effectiveness,
and scalability of our approach.