Efficient Time-Travel on Versioned Text Collections

Berberich, Klaus; Bedathur, Srikanta; Weikum, Gerhard

Conference Paper

MPG Authors

Berberich, Klaus
Databases and Information Systems, MPI for Informatics, Max Planck Society

Bedathur, Srikanta
Databases and Information Systems, MPI for Informatics, Max Planck Society

Weikum, Gerhard
Databases and Information Systems, MPI for Informatics, Max Planck Society

Citation

Berberich, K., Bedathur, S., & Weikum, G. (2007). Efficient Time-Travel on Versioned Text Collections. In A. Kemper, H. Schöning, T. Rose, M. Jarke, T. Seidl, C. Quix, et al. (Eds.), Datenbanksysteme in Business, Technologie und Web (BTW): 12. Fachtagung des GI-Fachbereichs "Datenbanken und Informationssysteme" (pp. 44-63). Bonn, Germany: Gesellschaft für Informatik.

Citation link: https://hdl.handle.net/11858/00-001M-0000-000F-1F09-3

Abstract

The availability of versioned text collections such as the Internet Archive opens up opportunities for time-aware exploration of their contents. In this paper, we propose time-travel retrieval and ranking that extends traditional keyword queries with a temporal context in which the query should be evaluated. More precisely, the query is evaluated over all states of the collection that existed during the temporal context. In order to support these queries, we make key contributions in (i) defining extensions to well-known relevance models that take into account the temporal context of the query and the version history of documents, (ii) designing an immortal index over the full versioned text collection that avoids a blowup in index size, and (iii) making the popular NRA algorithm for top-k query processing aware of the temporal context. We present preliminary experimental analysis over the English Wikipedia revision history showing that the proposed techniques are both effective and efficient.
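To make the query semantics described in the abstract concrete, the following Python fragment is a minimal sketch of evaluating a keyword query over every document version that existed during a temporal context. It is an illustration only and not the paper's method: it scans versions linearly instead of using the immortal index, scores with a plain term count instead of the extended relevance models, and does not implement NRA. The names Version, overlaps, and time_travel_query, the half-open validity interval [begin, end), and the choice to keep each document's best-scoring version are all assumptions made for this example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    """One version of a document (hypothetical structure for illustration)."""
    doc_id: str
    begin: int  # first time point at which this version is valid (inclusive)
    end: int    # time point at which it is superseded (exclusive)
    text: str


def overlaps(version, t_begin, t_end):
    """True if the validity interval [begin, end) intersects [t_begin, t_end]."""
    return version.begin <= t_end and version.end > t_begin


def time_travel_query(versions, terms, t_begin, t_end, k=10):
    """Return the top-k (doc_id, score) pairs for the temporal context.

    Assumption: a version's score is the summed term count, and a document
    is represented by its best-scoring version inside the context.
    """
    wanted = [t.lower() for t in terms]
    scores = defaultdict(int)
    for version in versions:
        if not overlaps(version, t_begin, t_end):
            continue  # this state did not exist during the temporal context
        tokens = version.text.lower().split()
        score = sum(tokens.count(t) for t in wanted)
        if score > 0:
            scores[version.doc_id] = max(scores[version.doc_id], score)
    # Highest score first; ties are broken by document id for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


versions = [
    Version("a", 0, 5, "time travel search"),
    Version("a", 5, 10, "time travel time index"),
    Version("b", 3, 8, "index size"),
    Version("c", 12, 20, "time machine"),
]
print(time_travel_query(versions, ["time", "index"], 4, 9))
```

In this toy collection, document "c" only comes into existence at time 12, so a query with temporal context [4, 9] never sees it, while both versions of document "a" fall inside the context and contribute candidates.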