  Temporal Shingling for Version Identification in Web Archives

Schenkel, R. (2010). Temporal Shingling for Version Identification in Web Archives. In C. Gurrin, Y. He, G. Kazai, U. Kruschwitz, S. Little, T. Roelleke, et al. (Eds.), Advances in Information Retrieval (pp. 508-519). Berlin: Springer. doi:10.1007/978-3-642-12275-0_44.

Creators

Creators:
Schenkel, Ralf (1), Author
Affiliations:
(1) Databases and Information Systems, MPI for Informatics, Max Planck Society, ou_24018

Content

Free keywords: -
 Abstract: Building and preserving archives of the evolving Web has been an important problem in research. Given the huge volume of content that is added or updated daily, identifying the right versions of pages to store in the archive is an important building block of any large-scale archival system. This paper presents temporal shingling, an extension of the well-established shingling technique for measuring how similar two snapshots of a page are. This novel method considers the lifespan of shingles to differentiate between important updates that should be archived and transient changes that may be ignored. Extensive experiments demonstrate the tradeoff between archive size and version coverage, and show that the novel method yields better archive coverage at smaller sizes than existing techniques.
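
The abstract gives no algorithmic detail, so the following is only a minimal sketch of the general idea, not the method from the paper. It shows classic word-level shingling with Jaccard resemblance between two snapshots, plus a hypothetical lifespan filter that ignores shingles appearing in too few snapshots. The function names, the parameters `w` and `min_lifespan`, and the definition of lifespan as "number of snapshots containing the shingle" are all assumptions made for illustration.

```python
from collections import Counter


def shingles(text, w=3):
    """Return the set of w-word shingles (contiguous word windows) of text."""
    words = text.lower().split()
    if len(words) < w:
        # Short texts yield a single shingle holding all words (or nothing).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + w]) for i in range(len(words) - w + 1)}


def jaccard(a, b):
    """Jaccard resemblance of two shingle sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def stable_shingles(snapshots, w=3, min_lifespan=2):
    """Hypothetical lifespan filter (an assumption, not the paper's definition).

    A shingle's lifespan is taken here to be the number of snapshots that
    contain it. Shingles with a lifespan below min_lifespan are treated as
    transient and dropped from every snapshot's shingle set.
    """
    per_snapshot = [shingles(s, w) for s in snapshots]
    counts = Counter()
    for sh in per_snapshot:
        counts.update(sh)  # each set contributes at most 1 per shingle
    stable = {s for s, c in counts.items() if c >= min_lifespan}
    return [sh & stable for sh in per_snapshot]
```

Under these assumptions, two snapshots that differ only in short-lived shingles compare as identical after filtering, which mirrors the distinction the abstract draws between transient changes and important updates.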

Details

Language(s): eng - English
Dates: 2010
 Publication Status: Issued
 Pages: -
 Publishing info: -
 Table of Contents: -
 Rev. Type: -
 Identifiers: eDoc: 536353
DOI: 10.1007/978-3-642-12275-0_44
URI: http://dx.doi.org/10.1007/978-3-642-12275-0_44
Other: Local-ID: C1256DBF005F876D-9D56FBDBC4384840C1257678001FDD44-SchenkelECIR2010
 Degree: -

Event

Title: 32nd European Conference on IR Research
Place of Event: Milton Keynes, UK
Start-/End Date: 2010-03-28 - 2010-03-31

Source 1

Title: Advances in Information Retrieval
  Subtitle : 32nd European Conference on IR Research, ECIR 2010
  Abbreviation : ECIR 2010
Source Genre: Proceedings
Creator(s):
Gurrin, Cathal (1), Editor
He, Yulan (1), Editor
Kazai, Gabriella (1), Editor
Kruschwitz, Udo (1), Editor
Little, Suzanne (1), Editor
Roelleke, Thomas (1), Editor
Rüger, Stefan (1), Editor
van Rijsbergen, Keith (1), Editor
Affiliations:
(1) External Organizations, ou_persistent22
Publ. Info: Berlin : Springer
Pages: -
Volume / Issue: -
Sequence Number: -
Start / End Page: 508 - 519
Identifier: ISBN: 978-3-642-12274-3

Source 2

Title: Lecture Notes in Computer Science
  Abbreviation : LNCS
Source Genre: Series
 Creator(s):
Affiliations:
Publ. Info: -
Pages: -
Volume / Issue: 5993
Sequence Number: -
Start / End Page: -
Identifier: -