Keywords:
-
Abstract:
A compelling application of peer-to-peer (P2P) system
technology would be distributed Web search, where each
peer autonomously runs a search engine on a personalized
local corpus (e.g., built from a thematically focused
Web crawl) and peers collaborate by routing queries to remote
peers that can contribute many or particularly good
results for these specific queries. Such systems typically
rely on a decentralized directory, e.g., built on top of a
distributed hash table (DHT), that holds compact, aggregated
statistical metadata about the peers which is used
to identify promising peers for a particular query. To support
an a priori unlimited number of peers, it is crucial to
keep the load on the distributed directory low. Moreover,
each peer should ideally tailor its postings to the directory
to reflect its particular strengths, such as rich information
about specialized topics that few or no other
peers cover. This paper addresses this problem
by proposing strategies for peers that identify suitable
subsets of the most beneficial statistical metadata. We argue
that posting a carefully selected subset of metadata can
achieve almost the same result quality as a complete metadata
directory, because only the most relevant peers are eventually
involved in the execution of a given query. Additionally,
asking only relevant peers will result in higher precision, as
the noise introduced by poor peers is reduced. We have implemented
these strategies in our fully operational P2P Web
search prototype Minerva, and present experimental results
on real-world Web data that show the viability of the strategies
and their gains in terms of high search result quality at
low networking costs.
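
The idea of posting only a selected subset of metadata can be illustrated with a minimal sketch. It is not the selection strategy from the paper, which the abstract does not specify: ranking terms by local document frequency is an assumption made here for illustration, a plain dictionary stands in for the DHT-based directory, and the names select_postings, build_directory, and route_query are hypothetical.

```python
from collections import Counter

def select_postings(local_docs, budget):
    """Pick the `budget` terms a peer posts to the directory.

    Assumption: terms are ranked by local document frequency; the
    paper's actual strategies may use different criteria.
    """
    df = Counter()
    for doc in local_docs:
        # Count each term once per document (document frequency).
        df.update(set(doc.lower().split()))
    # Highest frequency first; ties broken alphabetically for determinism.
    ranked = sorted(df.items(), key=lambda kv: (-kv[1], kv[0]))
    return dict(ranked[:budget])

def build_directory(peers, budget):
    """Build a term -> {peer_id: df} map (an in-memory stand-in for a DHT)."""
    directory = {}
    for peer_id, docs in peers.items():
        for term, freq in select_postings(docs, budget).items():
            directory.setdefault(term, {})[peer_id] = freq
    return directory

def route_query(directory, query_terms, max_peers):
    """Return up to `max_peers` peers, ranked by summed posted frequencies."""
    scores = Counter()
    for term in query_terms:
        for peer_id, freq in directory.get(term, {}).items():
            scores[peer_id] += freq
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [peer_id for peer_id, _ in ranked[:max_peers]]

# Toy corpora: peer A specializes in jazz, peer B in football.
peers = {
    "A": ["jazz music history", "jazz piano", "jazz clubs"],
    "B": ["football scores", "football transfer news", "jazz festival"],
}
directory = build_directory(peers, budget=1)
selected = route_query(directory, ["jazz"], max_peers=1)
```

With a budget of one term per peer, peer B posts only its strongest term and never advertises its single jazz document, so a jazz query is routed to peer A alone. This mirrors the abstract's argument that a small, well-chosen subset keeps directory load low while still reaching the most relevant peers.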