Conference Paper

Design Alternatives for Large-Scale Web Search: Alexander was Great, Aeneas a Pioneer, and Anakin has the Force

MPS-Authors
Bender,  Matthias
Databases and Information Systems, MPI for Informatics, Max Planck Society;

Michel,  Sebastian
Databases and Information Systems, MPI for Informatics, Max Planck Society;

Triantafillou,  Peter
Databases and Information Systems, MPI for Informatics, Max Planck Society;

Weikum,  Gerhard
Databases and Information Systems, MPI for Informatics, Max Planck Society;

Citation

Bender, M., Michel, S., Triantafillou, P., & Weikum, G. (2007). Design Alternatives for Large-Scale Web Search: Alexander was Great, Aeneas a Pioneer, and Anakin has the Force. In LSDS-IR: 1st Workshop on Large-Scale Distributed (pp. 16-22).


Cite as: https://hdl.handle.net/11858/00-001M-0000-000F-1ED9-8
Abstract
Indexing the Web and meeting the throughput, response-time, and failure-resilience requirements of a search engine requires massive storage and computational resources and a careful system design for scalability. This is exemplified by the big data centers of the leading commercial search engines. Various proposals and debates have appeared in the literature as to whether Web indexes can be implemented in a fully distributed or even peer-to-peer manner without impeding scalability, and different partitioning strategies have been worked out. In this paper, we resume this ongoing discussion by analyzing the design space for distributed Web indexing, considering the influence of partitioning strategies as well as different storage technologies including Flash-RAM. We outline and discuss the pros and cons of three fundamental alternatives, and characterize their total costs for meeting all performance and availability requirements. We give arguments in favor of a system design based on term partitioning over a DHT-based peer-to-peer network with modern top-k query processing and a judiciously designed combination of disk and Flash-RAM storage, and we show that this design has intriguing properties and a very attractive cost/performance ratio.
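
The contrast between partitioning strategies mentioned in the abstract can be illustrated with a small sketch. This is not the authors' system: `NUM_NODES`, `node_for`, and the other names are hypothetical, a plain hash modulo the node count stands in for a real DHT lookup, and neither top-k processing nor the storage layer is modeled. The sketch only shows which nodes a query has to contact under term partitioning versus document partitioning.

```python
import hashlib
from collections import defaultdict

NUM_NODES = 8  # hypothetical network size, chosen for illustration only


def node_for(key: str, num_nodes: int = NUM_NODES) -> int:
    """Map a key to a node id with a stable hash (stand-in for a DHT lookup)."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def build_term_partitioned(docs):
    """Term partitioning: the node owning a term stores its whole posting list."""
    index = defaultdict(lambda: defaultdict(set))  # node -> term -> doc ids
    for doc_id, text in docs.items():
        for term in set(text.lower().split()):
            index[node_for(term)][term].add(doc_id)
    return index


def build_doc_partitioned(docs):
    """Document partitioning: each node indexes only the documents assigned to it."""
    index = defaultdict(lambda: defaultdict(set))  # node -> term -> doc ids
    for doc_id, text in docs.items():
        node = node_for(doc_id)
        for term in set(text.lower().split()):
            index[node][term].add(doc_id)
    return index


def nodes_contacted_term(query: str) -> set:
    """Under term partitioning, only the nodes owning the query terms are contacted."""
    return {node_for(t) for t in query.lower().split()}


def nodes_contacted_doc(num_nodes: int = NUM_NODES) -> set:
    """Under document partitioning, every node may hold matches, so all are contacted."""
    return set(range(num_nodes))


def search_term_partitioned(index, query: str) -> set:
    """Conjunctive query: intersect the posting lists fetched from each term's node."""
    postings = [index[node_for(t)].get(t, set()) for t in query.lower().split()]
    return set.intersection(*postings) if postings else set()


def search_doc_partitioned(index, query: str) -> set:
    """Conjunctive query: each node intersects locally, results are unioned."""
    terms = query.lower().split()
    hits = set()
    for node in nodes_contacted_doc():
        postings = [index[node].get(t, set()) for t in terms]
        if postings:
            hits |= set.intersection(*postings)
    return hits


docs = {
    "d1": "web search index",
    "d2": "peer to peer web index",
    "d3": "flash storage",
}
term_index = build_term_partitioned(docs)
doc_index = build_doc_partitioned(docs)
```

Both layouts return the same documents for a query such as `"web index"`. The difference is the fan-out: the term-partitioned lookup touches at most one node per query term, while the document-partitioned lookup touches every node.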