hide
Free keywords:
-
Abstract:
We consider the following autocompletion search scenario: imagine a user of a
search engine typing a query; then with every keystroke display those
completions of the last query word that would lead to the best hits, and also
display the best such hits. The following problem is at the core of this
feature: for a fixed document collection, given a set D of documents, and an
alphabetical range W of words, compute the set of all word-in-document pairs (w
,d) from the collection such that w ∈W and d∈D. We present a new data structure
with the help of which such autocompletion queries can be processed, on the
average, in time linear in the input plus output size, independent of the size
of the underlying document collection. At the same time, our data structure
uses no more space than an inverted index. Actual query processing times on a
large test collection correlate almost perfectly with our theoretical bound.