Abstract:
This paper, which came out of the Bubba project at MCC, was the first to
address the physical database design problem for parallel database servers,
with particular focus on the partitioning and allocation of (relational) data
across multiple disks or processing nodes. These issues are key to good
performance tuning. To this end, the paper introduced the fundamental notion of
data heat as a measure of the disk access load attributed to a data unit or
collection of units, and the notion of temperature to normalize heat by the
consumed space. Based on these metrics, the paper developed an elegant
framework and heuristic algorithms for choosing which data should be placed on
which disk so as to balance the disk load, and which data should be cached in
memory so as to minimize the overall disk load.
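To make the two metrics concrete, here is a minimal sketch that assumes heat is an access rate and temperature is heat divided by size. The names `Fragment`, `choose_cached`, and `place_on_disks` are hypothetical, and the two greedy rules are generic illustrations of temperature-based caching and heat-based load balancing, not the algorithms from the paper itself.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    name: str
    heat: float  # assumed unit: disk accesses per second
    size: float  # assumed unit: megabytes of consumed space

    @property
    def temperature(self) -> float:
        # Temperature normalizes heat by the space the data consumes.
        return self.heat / self.size


def choose_cached(fragments, memory_budget):
    """Greedily cache the fragments with the highest temperature that still fit."""
    cached, used = [], 0.0
    for f in sorted(fragments, key=lambda f: f.temperature, reverse=True):
        if used + f.size <= memory_budget:
            cached.append(f)
            used += f.size
    return cached


def place_on_disks(fragments, num_disks):
    """Assign fragments, hottest first, to whichever disk is currently coolest."""
    disks = [[] for _ in range(num_disks)]
    load = [0.0] * num_disks
    for f in sorted(fragments, key=lambda f: f.heat, reverse=True):
        target = min(range(num_disks), key=lambda d: load[d])
        disks[target].append(f)
        load[target] += f.heat
    return disks, load


# Made-up example data, for illustration only.
fragments = [
    Fragment("A", heat=100.0, size=50.0),
    Fragment("B", heat=60.0, size=10.0),
    Fragment("C", heat=40.0, size=20.0),
    Fragment("D", heat=20.0, size=40.0),
]
cached = choose_cached(fragments, memory_budget=30.0)
disks, load = place_on_disks(fragments, num_disks=2)
```

Caching by temperature rather than by raw heat favors small, frequently accessed data, since each unit of memory then removes as much disk load as possible. Placement uses raw heat because the goal there is an even access load per disk, regardless of space.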
I had the great opportunity of spending a postdoc year in the Bubba group at
MCC where I could learn about this subject directly from the paper's authors.
Later, their work was my main inspiration when I started working on dynamic
data placement and migration in the early nineties. In that research,
the notions of heat and temperature proved to be extremely useful for reasoning
about load distribution and for developing algorithms that continuously adjust
the allocation of data based on online statistics about access patterns, for
example, to "cool down" hot disks. I have also seen fairly recent papers on the
caching of query results in data warehouses benefit greatly from the Bubba
tuning framework. The paper by Copeland et al. is a true landmark paper,
especially when you consider that this work was done before the industrial
advent of parallel database systems. The problem of automating the physical
database design for a cluster-based parallel data server, in the spirit of a
zero-admin, self-tuning solution, has still not been solved in a truly
comprehensive, industrial-strength manner, but this seminal paper is an
excellent starting point and absolutely mandatory reading for everybody working
on this highly relevant problem.