English
 
Help Privacy Policy Disclaimer
  Advanced SearchBrowse

Item

ITEM ACTIONSEXPORT

Released

Conference Paper

Efficient Text Proximity Search

MPS-Authors
/persons/resource/persons45380

Schenkel,  Ralf
Databases and Information Systems, MPI for Informatics, Max Planck Society;

/persons/resource/persons44188

Broschart,  Andreas
Databases and Information Systems, MPI for Informatics, Max Planck Society;
International Max Planck Research School, MPI for Informatics, Max Planck Society;

/persons/resource/persons44664

Hwang,  Seungwon
Databases and Information Systems, MPI for Informatics, Max Planck Society;

/persons/resource/persons45609

Theobald,  Martin
Databases and Information Systems, MPI for Informatics, Max Planck Society;

/persons/resource/persons45720

Weikum,  Gerhard
Databases and Information Systems, MPI for Informatics, Max Planck Society;

External Resource
No external resources are shared
Fulltext (restricted access)
There are currently no full texts shared for your IP range.
Fulltext (public)
There are no public fulltexts stored in PuRe
Supplementary Material (public)
There is no public supplementary material available
Citation

Schenkel, R., Broschart, A., Hwang, S., Theobald, M., & Weikum, G. (2007). Efficient Text Proximity Search. In N. Ziviani, & R. A. Baeza-Yates (Eds.), String Processing and Information Retrieval: 14th International Symposium, SPIRE 2007 (pp. 287-299). Berlin, Germany: Springer.


Cite as: https://hdl.handle.net/11858/00-001M-0000-000F-1F05-B
Abstract
In addition to purely occurrence-based relevance models, term proximity has been frequently used to enhance retrieval quality of keyword-oriented retrieval systems. While there have been approaches on effective scoring functions that incorporate proximity, there has not been much work on algorithms or access methods for their efficient evaluation. This paper presents an efficient evaluation framework including a proximity scoring function integrated within a top-k query engine for text retrieval. We propose precomputed and materialized index structures that boost performance. The increased retrieval effectiveness and efficiency of our framework are demonstrated through extensive experiments on a very large text benchmark collection. In combination with static index pruning for the proximity lists, our algorithm achieves an improvement of two orders of magnitude compared to a term-based top-k evaluation, with a significantly improved result quality.