Conference Paper

EverLast: A Distributed Architecture for Preserving the Web

MPG Authors

Anand,  Avishek
Databases and Information Systems, MPI for Informatics, Max Planck Society;

Bedathur,  Srikanta
Databases and Information Systems, MPI for Informatics, Max Planck Society;

Berberich,  Klaus
Databases and Information Systems, MPI for Informatics, Max Planck Society;

Schenkel,  Ralf
Databases and Information Systems, MPI for Informatics, Max Planck Society;

Tryfonopoulos,  Christos
Databases and Information Systems, MPI for Informatics, Max Planck Society;

Citation

Anand, A., Bedathur, S., Berberich, K., Schenkel, R., & Tryfonopoulos, C. (2009). EverLast: A Distributed Architecture for Preserving the Web. In Proceedings of the Joint Conference on Digital Libraries (pp. 331-340). New York, NY: ACM.


Citation link: https://hdl.handle.net/11858/00-001M-0000-000F-1910-0
Abstract
The World Wide Web has become a key source of knowledge pertaining to almost every walk of life. Unfortunately, much of the data on the Web is highly ephemeral in nature, with an estimated 50-80% of content changing within a short time. Continuing the pioneering efforts of many national (digital) libraries, organizations such as the International Internet Preservation Consortium (IIPC), the Internet Archive (IA) and the European Archive (EA) have been working tirelessly towards preserving the ever-changing Web. However, while these web archiving efforts have paid significant attention to the long-term preservation of Web data, they have paid little attention to developing a global-scale infrastructure for collecting, archiving, and performing historical analyses on the collected data. Based on insights from our recent work on building text analytics for Web archives, we propose EverLast, a scalable distributed framework for next-generation Web archival and temporal text analytics over the archive. Our system is built on a loosely coupled distributed architecture that can be deployed over large-scale peer-to-peer networks. In this way, we allow the integration of many archival efforts, which are mainly undertaken at a national level by national digital libraries. Key features of EverLast include support for time-based text search and analysis and the use of human-assisted archive gathering. In this paper, we outline the overall architecture of EverLast and present some promising preliminary results.