Global Document Frequency Estimation in Peer-to-Peer Web Search

Bender, Matthias; Michel, Sebastian; Triantafillou, Peter; Weikum, Gerhard; Zhou, Dayou

Bender, M., Michel, S., Triantafillou, P., & Weikum, G. (2006). Global Document Frequency Estimation in Peer-to-Peer Web Search. In 9th International Workshop on the Web and Databases (WebDB 2006) @ SIGMOD2006 (pp. 69-74).

Item is Released

Basic data

Item permalink: https://hdl.handle.net/11858/00-001M-0000-000F-230A-D
Version permalink: https://hdl.handle.net/11858/00-001M-0000-000F-230B-B

Genre: Conference paper

Files

WebDB06.pdf (any fulltext), 216KB

File permalink:
-

Name:
WebDB06.pdf

Description:
-

OA status:

Visibility:
Private

MIME type / Checksum:
application/pdf

Technical metadata:

Copyright date:
-

Copyright info:
-

License:
-

External references

Creators

Creators:
Bender, Matthias¹, Author
Michel, Sebastian¹, Author
Triantafillou, Peter¹, Author
Weikum, Gerhard¹, Author
Zhou, Dayou, Editor

Affiliations:
¹ Databases and Information Systems, MPI for Informatics, Max Planck Society, ou_24018

Content

Keywords: -

Abstract: Information retrieval (IR) in peer-to-peer (P2P) networks, where the corpus is spread across many loosely coupled peers, has recently gained importance. In contrast to IR systems on a centralized server or server farm, P2P IR faces the additional challenge of either being oblivious to global corpus statistics or having to compute the global measures from local statistics at the individual peers in an efficient, distributed manner. One specific measure of interest is the global document frequency for different terms, which would be very beneficial as term-specific weights in the scoring and ranking of merged search results that have been obtained from different peers. This paper presents an efficient solution for the problem of estimating global document frequencies in a large-scale P2P network with very high dynamics where peers can join and leave the network on short notice. In particular, the developed method takes into account the fact that the local document collections of autonomous peers may arbitrarily overlap, so that global counting needs to be duplicate-insensitive. The method is based on hash sketches as a technique for compact data synopses. Experimental studies demonstrate the estimator's accuracy, scalability, and ability to cope with high dynamics. Moreover, the benefit for ranking P2P search results is shown by experiments with real-world Web data and queries.
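
The abstract names hash sketches as the basis for duplicate-insensitive counting, but this record does not reproduce the construction. As a rough illustration only, the following Python sketch shows one common form of hash sketch (Flajolet-Martin style probabilistic counting with several bitmaps). The function names, the bitmap count, the use of SHA-1, and the document identifiers are all assumptions made for this example and are not taken from the paper. The point it illustrates is that each peer can summarize the documents containing a term in a small bitmap array, and that merging two peers' arrays with bitwise OR yields the same synopsis as sketching the union of their collections, so overlapping documents are not counted twice.

```python
import hashlib

NUM_BITMAPS = 64   # assumed number of bitmaps (more bitmaps, lower variance)
BITMAP_BITS = 32   # assumed width of each bitmap
PHI = 0.77351      # correction constant from Flajolet-Martin probabilistic counting


def _hash(doc_id: str) -> int:
    """Map a document identifier to a 64-bit integer (SHA-1 is an arbitrary choice)."""
    digest = hashlib.sha1(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def _lowest_set_bit(x: int) -> int:
    """Index of the least significant 1-bit, capped at the bitmap width."""
    if x == 0:
        return BITMAP_BITS - 1
    return min((x & -x).bit_length() - 1, BITMAP_BITS - 1)


def make_sketch(doc_ids):
    """Build a hash sketch for the documents that contain some term."""
    bitmaps = [0] * NUM_BITMAPS
    for doc_id in doc_ids:
        h = _hash(doc_id)
        index = h % NUM_BITMAPS   # which bitmap this document updates
        rest = h // NUM_BITMAPS   # remaining hash bits pick the bit position
        bitmaps[index] |= 1 << _lowest_set_bit(rest)
    return bitmaps


def merge_sketches(a, b):
    """Union of two sketches: bitwise OR, so duplicates have no effect."""
    return [x | y for x, y in zip(a, b)]


def estimate_count(bitmaps):
    """Estimate the number of distinct documents summarized by the sketch."""
    total = 0
    for bitmap in bitmaps:
        r = 0
        while bitmap & (1 << r):  # position of the lowest zero bit
            r += 1
        total += r
    mean_r = total / len(bitmaps)
    return len(bitmaps) / PHI * 2 ** mean_r


if __name__ == "__main__":
    # Two hypothetical peers whose collections overlap in 2000 documents.
    peer_a = [f"doc{i}" for i in range(0, 6000)]
    peer_b = [f"doc{i}" for i in range(4000, 10000)]
    merged = merge_sketches(make_sketch(peer_a), make_sketch(peer_b))
    print("estimated global document frequency:", round(estimate_count(merged)))
```

Because the merge is a plain bitwise OR, it is commutative and idempotent: sketches can be combined in any order and the same peer can contribute repeatedly without inflating the estimate. The estimate itself is approximate, and this simple estimator is known to be biased when there are only a few documents per bitmap.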

Details

Language(s): eng - English

Date: Modified: 2007-04-27, Published: 2006

Publication status: Published

Pages: -

Place, publisher, edition: n/a : n/a

Table of contents: -

Review type: -

Identifiers: eDoc: 314463
Other: Local-ID: C1256DBF005F876D-1074E8517E3FAF65C12571B8004C7FA1-WebDB06

Degree type: -

Event

Title: Untitled Event

Place of event: Chicago, USA

Start/end date: 2006-05-30

Source 1

Title: 9th International Workshop on the Web and Databases (WebDB 2006) @ SIGMOD2006

Source genre: Proceedings

Creators:

Affiliations:

Place, publisher, edition: n/a : n/a

Pages: -; Volume / Issue: -; Article number: -; Start / end page: 69 - 74; Identifier: -
