Abstract:
Query languages for XML such as XPath or XQuery support Boolean retrieval: a
query result is a (possibly restructured) subset of XML elements or entire
documents that satisfy the search conditions of the query. This search paradigm
works for highly schematic XML data collections such as electronic catalogs.
However, for searching information in open environments such as the Web or
intranets of large corporations, ranked retrieval is more appropriate: a query
result is a ranked list of XML elements in descending order of (estimated)
relevance. Web search engines, which are based on the ranked retrieval
paradigm, do not, however, consider the additional information and rich
annotations provided by the structure of XML documents and their element names.
This article presents the XXL search engine that supports relevance ranking on
XML data. XXL is particularly geared for path queries with wildcards that can
span multiple XML collections and contain both exact-match and
semantic-similarity search conditions. In addition, ontological information and
suitable index structures are used to improve the search efficiency and
effectiveness. XXL is fully implemented as a suite of Java classes and
servlets. Experiments in the context of the INEX benchmark demonstrate the
efficiency of the XXL search engine and underline its effectiveness for ranked
retrieval.