非表示:
キーワード:
-
要旨:
In this paper we address the query routing problem in peer-to-peer ({P2P})
information retrieval. Our system builds up on the idea of a {S}emantic
{O}verlay {N}etwork ({SON}), in which each peer becomes neighbor of a small
number of peers, chosen among those that are most similar to it. Peers in the
network are represented by a statistical Language Model derived from their
local data collections but, instead of using the non-metric Kullback-Leibler
divergence to compute the similarity between them, we use a symmetrized and
"metricized" related measure, the square root of the Jensen-Shannon divergence,
which let us map the problem to a metric search problem. The search strategy
exploits the triangular inequality to efficiently prune the search space and
relies on a priority queue to visit the most promising peers first. To keep
communications costs low and to perform an efficient comparison between
Language Models, we devise a compression technique that builds on Bloom-filters
and histograms and we provide error bounds for the approximation and a cost
analysis for the algorithms used to build and maintain the {SON}.