Deutsch
 
Hilfe Datenschutzhinweis Impressum
  DetailsucheBrowse

Datensatz

 
 
DownloadE-Mail
  IO-Top-k: Index-Access Optimized Top-k Query Processing

Bast, H., Majumdar, D., Schenkel, R., Theobald, M., & Weikum, G. (2006). IO-Top-k: Index-Access Optimized Top-k Query Processing. In Proceedings of the 32nd International Conference on Very Large Data Bases, VLDB 2006 (pp. 475-486). New York, USA: ACM.

Item is

Dateien

einblenden: Dateien
ausblenden: Dateien
:
BastMSTW2006a.pdf (beliebiger Volltext), 655KB
 
Datei-Permalink:
-
Name:
BastMSTW2006a.pdf
Beschreibung:
-
OA-Status:
Sichtbarkeit:
Privat
MIME-Typ / Prüfsumme:
application/pdf
Technische Metadaten:
Copyright Datum:
-
Copyright Info:
-
Lizenz:
-

Externe Referenzen

einblenden:

Urheber

einblenden:
ausblenden:
 Urheber:
Bast, Holger1, Autor           
Majumdar, Debapriyo1, Autor           
Schenkel, Ralf2, Autor           
Theobald, Martin2, Autor           
Weikum, Gerhard2, Autor           
Dayal, Umeshwar, Herausgeber
Whang, Kyu-Young, Herausgeber
Lomet, David B., Herausgeber
Alonso, Gustavo, Herausgeber
Lohman, Guy M., Herausgeber
Kersten, Martin L., Herausgeber
Cha, Sang Kyun, Herausgeber
Kim, Young-Kuk, Herausgeber
Affiliations:
1Algorithms and Complexity, MPI for Informatics, Max Planck Society, ou_24019              
2Databases and Information Systems, MPI for Informatics, Max Planck Society, ou_24018              

Inhalt

einblenden:
ausblenden:
Schlagwörter: -
 Zusammenfassung: Top-$k$ query processing is an important building block for ranked retrieval, with applications ranging from text and data integration to distributed aggregation of network logs and sensor data. Top-$k$ queries operate on index lists for a query's elementary conditions and aggregate scores for result candidates. One of the best implementation methods in this setting is the family of threshold algorithms, which aim to terminate the index scans as early as possible based on lower and upper bounds for the final scores of result candidates. This procedure performs sequential disk accesses for sorted index scans, but also has the option of performing random accesses to resolve score uncertainty. This entails scheduling for the two kinds of accesses: 1) the prioritization of different index lists in the sequential accesses, and 2) the decision on when to perform random accesses and for which candidates. The prior literature has studied some of these scheduling issues, but only for each of the two access types in isolation. The current paper takes an integrated view of the scheduling issues and develops novel strategies that outperform prior proposals by a large margin. Our main contributions are new, principled, scheduling methods based on a Knapsack-related optimization for sequential accesses and a cost model for random accesses. The methods can be further boosted by harnessing probabilistic estimators for scores, selectivities, and index list correlations. In performance experiments with three different datasets (TREC Terabyte, HTTP server logs, and IMDB), our methods achieved significant performance gains compared to the best previously known methods.

Details

einblenden:
ausblenden:
Sprache(n): eng - English
 Datum: 2007-04-162006
 Publikationsstatus: Erschienen
 Seiten: -
 Ort, Verlag, Ausgabe: New York, USA : ACM
 Inhaltsverzeichnis: -
 Art der Begutachtung: -
 Identifikatoren: eDoc: 314405
Anderer: Local-ID: C1256428004B93B8-A4369BF160083CD6C1257188004AA310-BastMSTW2006a
 Art des Abschluß: -

Veranstaltung

einblenden:
ausblenden:
Titel: Untitled Event
Veranstaltungsort: Seoul, Korea
Start-/Enddatum: 2006-09-12

Entscheidung

einblenden:

Projektinformation

einblenden:

Quelle 1

einblenden:
ausblenden:
Titel: Proceedings of the 32nd International Conference on Very Large Data Bases, VLDB 2006
Genre der Quelle: Konferenzband
 Urheber:
Affiliations:
Ort, Verlag, Ausgabe: New York, USA : ACM
Seiten: - Band / Heft: - Artikelnummer: - Start- / Endseite: 475 - 486 Identifikator: ISBN: 1-59593-385-9