IO-Top-k: index-access optimized top-k query processing

Bast, Holger; Majumdar, Debapriyo; Schenkel, Ralf; Theobalt, Christian; Weikum, Gerhard

Bast, H., Majumdar, D., Schenkel, R., Theobalt, C., & Weikum, G. (2006). IO-Top-k: index-access optimized top-k query processing (MPI-I-2006-5-002). Saarbrücken: Max-Planck-Institut für Informatik.

Item is Released

Basic data

Record permalink: https://hdl.handle.net/11858/00-001M-0000-0014-6716-E
Version permalink: https://hdl.handle.net/11858/00-001M-0000-0014-7892-F

Genre: Report

Files

MPI-I-2006-5-002.pdf (any full text), 335 KB

File permalink:
https://hdl.handle.net/11858/00-001M-0000-0014-6718-A

Name:
MPI-I-2006-5-002.pdf

Description:
-

OA-Status:

Visibility:
Public

MIME type / checksum:
application/pdf / [MD5]

Copyright date:
-

Copyright info:
-

License:
-

External references

Creators

Creators:
Bast, Holger¹, Author
Majumdar, Debapriyo¹, Author
Schenkel, Ralf², Author
Theobalt, Christian³, Author
Weikum, Gerhard², Author

Affiliations:
¹ Algorithms and Complexity, MPI for Informatics, Max Planck Society, ou_24019
² Databases and Information Systems, MPI for Informatics, Max Planck Society, ou_24018
³ Computer Graphics, MPI for Informatics, Max Planck Society, ou_40047

Content

Keywords: -

Abstract: Top-k query processing is an important building block for ranked retrieval, with applications ranging from text and data integration to distributed aggregation of network logs and sensor data. Top-k queries operate on index lists for a query's elementary conditions and aggregate scores for result candidates. One of the best implementation methods in this setting is the family of threshold algorithms, which aim to terminate the index scans as early as possible based on lower and upper bounds for the final scores of result candidates. This procedure performs sequential disk accesses for sorted index scans, but also has the option of performing random accesses to resolve score uncertainty. This entails scheduling for the two kinds of accesses: 1) the prioritization of different index lists in the sequential accesses, and 2) the decision on when to perform random accesses and for which candidates. The prior literature has studied some of these scheduling issues, but only for each of the two access types in isolation. The current paper takes an integrated view of the scheduling issues and develops novel strategies that outperform prior proposals by a large margin. Our main contributions are new, principled scheduling methods based on a Knapsack-related optimization for sequential accesses and a cost model for random accesses. The methods can be further boosted by harnessing probabilistic estimators for scores, selectivities, and index list correlations. We also discuss efficient implementation techniques for the underlying data structures. In performance experiments with three different datasets (TREC Terabyte, HTTP server logs, and IMDB), our methods achieved significant performance gains compared to the best previously known methods: a factor of up to 3 in terms of execution costs, and a factor of 5 in terms of absolute run-times of our implementation. Our best techniques are close to a lower bound for the execution cost of the considered class of threshold algorithms.
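
To make the idea of early termination from score bounds concrete, the following is a minimal Python sketch of a generic threshold-style top-k scan. It is not the report's IO-Top-k method: it uses plain round-robin sorted accesses, no random accesses, and no Knapsack-based scheduling. The function name nra_top_k, the summation of non-negative per-list scores, and the in-memory list layout are all assumptions made for illustration.

```python
def nra_top_k(index_lists, k):
    """Threshold-style top-k using sorted accesses only (illustrative sketch).

    index_lists: one list per query condition, each holding (doc_id, score)
    pairs sorted by descending score. Scores are assumed non-negative and
    aggregated by summation; a document missing from a list scores 0 there.
    Returns ([(doc_id, lower_bound, upper_bound), ...], sorted_access_count).
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    m = len(index_lists)
    pos = [0] * m                                        # scan position per list
    high = [lst[0][1] if lst else 0 for lst in index_lists]  # bound on unseen scores
    seen = {}                                            # doc_id -> {list index: score}
    accesses = 0

    while True:
        progressed = False
        for i, lst in enumerate(index_lists):            # one round-robin step
            if pos[i] < len(lst):
                doc, score = lst[pos[i]]
                pos[i] += 1
                accesses += 1
                seen.setdefault(doc, {})[i] = score
                high[i] = score
                progressed = True
            if pos[i] >= len(lst):
                high[i] = 0                              # exhausted: nothing unseen remains

        # Lower bound: scores seen so far. Upper bound: add the current
        # high[i] for every list in which the document has not been seen.
        worst = {d: sum(s.values()) for d, s in seen.items()}
        best = {d: worst[d] + sum(high[i] for i in range(m) if i not in s)
                for d, s in seen.items()}
        ranked = sorted(seen, key=lambda d: (-worst[d], d))
        top, rest = ranked[:k], ranked[k:]

        if not progressed:                               # every list is exhausted
            break
        if len(top) == k:
            threshold = worst[top[-1]]                   # k-th best lower bound
            best_rest = max((best[d] for d in rest), default=0)
            # Stop once no other candidate and no unseen document can
            # still exceed the current k-th lower bound.
            if threshold >= best_rest and threshold >= sum(high):
                break

    return [(d, worst[d], best[d]) for d in top], accesses


lists = [
    [("a", 9), ("b", 8), ("c", 1), ("d", 0)],
    [("b", 9), ("a", 7), ("d", 2), ("c", 1)],
]
print(nra_top_k(lists, 1))
```

On this toy input the scan stops after 4 of the 8 possible sorted accesses and returns ([('b', 17, 17)], 4). The scheduling questions raised in the abstract concern exactly the two choices this sketch hard-codes: which list to read next, and whether a random access would resolve a candidate's remaining uncertainty more cheaply.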

Details

Language(s): eng - English

Date: Published: 2006

Publication status: Published

Pages: 49 p.

Place, publisher, edition: Saarbrücken : Max-Planck-Institut für Informatik

Table of contents: -

Review type: -

Identifiers: URI: http://domino.mpi-inf.mpg.de/internet/reports.nsf/NumberView/2006-5-002
Report no.: MPI-I-2006-5-002
BibTex Citekey: BastMajumdarSchenkelTheobaldWeikum2006

Degree type: -

Source 1

Title: Research Report / Max-Planck-Institut für Informatik

Source genre: Series

Creators:

Affiliations:

Place, publisher, edition: -

Pages: -
Volume / Issue: -
Article number: -
Start / end page: -
Identifier: -
