Abstract:
With the growing popularity of information retrieval (IR)
in distributed systems and in particular P2P Web search, a
large number of protocols and prototypes have been introduced
in the literature. However, nearly every paper uses
a different benchmark for its experimental evaluation,
making it practically impossible to compare systems against
each other or to quantify performance improvements.
We present a standardized, general-purpose benchmark
for P2P IR systems that finally makes this possible. We
begin with a detailed requirements analysis for such
a standardized benchmark framework, one that allows for reproducible
and comparable experimental setups without sacrificing the
flexibility to suit different system models. We further
suggest Wikipedia as a publicly available, all-purpose
document corpus and introduce a simple yet flexible
clustering strategy that assigns the Wikipedia articles as
documents to an arbitrary number of peers. After proposing
a standardized, real-world query set as the benchmark
workload, we review the metrics for evaluating the benchmark
results and present an example benchmark run for our fully
implemented P2P Web search prototype MINERVA.
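
The abstract does not spell out the clustering strategy itself; as a hedged illustration only, the sketch below shows one plausible way to deterministically partition Wikipedia article titles across an arbitrary number of peers. The hash-based assignment, the function name assign_to_peer, and the sample titles are assumptions for illustration, not the paper's actual method.

    import hashlib

    def assign_to_peer(article_title: str, num_peers: int) -> int:
        """Deterministically map a Wikipedia article to one of num_peers peers.

        Hypothetical sketch: the paper's actual clustering strategy is not
        described in this abstract. Hashing the title is merely one simple,
        reproducible way to partition documents across an arbitrary number
        of peers, as the benchmark setup requires.
        """
        digest = hashlib.sha1(article_title.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % num_peers

    # Example: distribute a few articles across 4 peers.
    if __name__ == "__main__":
        for title in ["Information retrieval", "Peer-to-peer", "Wikipedia"]:
            print(title, "-> peer", assign_to_peer(title, 4))

A deterministic mapping like this keeps experimental setups reproducible: rerunning the benchmark with the same corpus and peer count yields the same document-to-peer assignment.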