Abstract:
RDF is a data representation format for schema-free structured information
that is gaining momentum in the context of Semantic-Web corpora,
life sciences, and also Web 2.0 platforms.
The "pay-as-you-go" nature of RDF and the flexible pattern-matching
capabilities of its query language SPARQL entail efficiency and scalability
challenges for complex queries including long join paths.
This paper presents the RDF-3X engine, an implementation of SPARQL that
achieves excellent performance by pursuing a RISC-style design
with a streamlined architecture and carefully designed, puristic
data structures and operations.
The salient points of RDF-3X are: 1) a generic solution for storing and indexing
RDF triples that completely eliminates the need for physical-design tuning,
2) a powerful yet simple query processor that leverages fast merge joins to the
largest possible extent, and 3) a query optimizer for choosing optimal join
orders using a cost model based on statistical synopses for entire join paths.
The performance of RDF-3X, in comparison to the previously best systems,
has been measured on several large-scale datasets with more than 50 million RDF
triples and benchmark queries that include pattern matching and long join paths
in the underlying data graphs.
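
The first two salient points (tuning-free triple indexing and merge joins) can be illustrated with a small sketch. It is not the RDF-3X implementation: the abstract does not describe the index layout, so the sketch assumes one sorted in-memory list per ordering of (subject, predicate, object), and the names `build_indexes`, `scan`, `merge_join`, and the toy triples are hypothetical. The example evaluates the two-pattern query `?x knows ?y . ?y worksAt ?c` by scanning two indexes that are already sorted on `?y` and merging them without any extra sort.

```python
from bisect import bisect_left
from itertools import permutations

# Toy data, invented for illustration: (subject, predicate, object).
TRIPLES = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("alice", "worksAt", "acme"),
    ("carol", "worksAt", "acme"),
    ("bob", "worksAt", "initech"),
]


def build_indexes(triples):
    """Build one sorted list per ordering of (s, p, o), keyed e.g. 'pos'.

    Assumption: indexing every ordering stands in for "no physical-design
    tuning"; a real engine would use compressed on-disk structures.
    """
    indexes = {}
    for order in permutations(range(3)):
        name = "".join("spo"[i] for i in order)
        indexes[name] = sorted(tuple(t[i] for i in order) for t in triples)
    return indexes


def scan(index, prefix):
    """Yield all entries of a sorted index that start with `prefix`."""
    pos = bisect_left(index, prefix)
    while pos < len(index) and index[pos][: len(prefix)] == prefix:
        yield index[pos]
        pos += 1


def merge_join(left, right, key):
    """Join two inputs that are both sorted on `key`, handling duplicates."""
    left, right = list(left), list(right)
    i = j = 0
    out = []
    while i < len(left) and j < len(right):
        kl, kr = key(left[i]), key(right[j])
        if kl < kr:
            i += 1
        elif kl > kr:
            j += 1
        else:
            # Collect the run of equal keys on each side, then pair them up.
            i_end = i
            while i_end < len(left) and key(left[i_end]) == kl:
                i_end += 1
            j_end = j
            while j_end < len(right) and key(right[j_end]) == kl:
                j_end += 1
            out.extend((l, r) for l in left[i:i_end] for r in right[j:j_end])
            i, j = i_end, j_end
    return out


indexes = build_indexes(TRIPLES)

# ?x knows ?y   -> 'pos' entries are (p, o, s), sorted by o = ?y
# ?y worksAt ?c -> 'pso' entries are (p, s, o), sorted by s = ?y
knows = scan(indexes["pos"], ("knows",))
works = scan(indexes["pso"], ("worksAt",))

# Each result is (?x, ?y, ?c).
results = [
    (l[2], l[1], r[2]) for l, r in merge_join(knows, works, key=lambda e: e[1])
]
print(results)
```

Because every ordering is available, either pattern can be read in the order the join needs, which is what lets the merge step run without sorting its inputs first. The third point, choosing join orders from statistical synopses, is not covered by this sketch.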