Hidden:
Keywords:
-
Abstract:
We introduce the \emph{synonymy graph} as a new perspective on spectral
retrieval techniques, including latent semantic indexing (LSI) and its many
successors. The synonymy graph is defined for each pair of terms in the
collection, and our findings suggest that it is at the heart of what makes
spectral retrieval work in practice.
%
We show that LSI and many of its variants can be equivalently viewed as a
particular document expansion (not query expansion) process, where each term
triggers the insertion of some other term if and only if the synonymy graph for
that term pair has a certain characteristic shape. We provide a simple,
parameterless algorithm for detecting that shape.
%
We point out inherent problems with any algorithm that bases its expansion
decisions solely on individual values of the synonymy graph, as almost all
existing methods do. Our new algorithm overcomes these limitations, and it
consistently outperforms previous methods on a number of test collections.
%
Our synonymy graphs also shed light on the effectiveness of three fundamental
types of variations of the basic LSI scheme.