Review - Data Placement In Bubba

Weikum, Gerhard

Journal Article

MPS-Authors

Weikum, Gerhard
Databases and Information Systems, MPI for Informatics, Max Planck Society;

Citation

Weikum, G. (2000). Review - Data Placement In Bubba. ACM SIGMOD Digital Review, 2.

Cite as: https://hdl.handle.net/11858/00-001M-0000-000F-352A-0

Abstract

This paper, which came out of the Bubba project at MCC, was the first to address the physical database design problem for parallel database servers, with particular focus on the partitioning and allocation of (relational) data across multiple disks or processing nodes. These issues are key to good performance tuning. To this end, the paper introduced the fundamental notion of data heat as a measure of the disk access load attributed to a data unit or collection of units, and the notion of temperature, which normalizes heat by the space consumed. Based on these metrics, the paper developed an elegant framework and heuristic algorithms for choosing which data should be placed on which disk so as to balance the disk load, and which data should be cached in memory so as to minimize the overall disk load.

I had the great opportunity of spending a postdoc year in the Bubba group at MCC, where I could learn about this subject directly from the paper's authors. Later, their work was my main inspiration when I started working on dynamic data placement and migration in the early nineties. In that research, the notions of heat and temperature proved extremely useful for reasoning about load distribution and for developing algorithms that continuously adjust the allocation of data based on online statistics about access patterns, for example, to "cool down" hot disks. I have also seen fairly recent papers on the caching of query results in data warehouses benefit greatly from the Bubba tuning framework.

The paper by Copeland et al. is a true landmark paper, especially when you consider that this work was done before the industrial advent of parallel database systems. The problem of automating the physical database design for a cluster-based parallel data server, in the spirit of a zero-admin, self-tuning solution, has still not been solved in a truly comprehensive, industrial-strength manner, but this seminal paper is an excellent starting point and absolutely mandatory reading for everybody working on this highly relevant problem.
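
As a rough illustration of the two metrics described in the review, the following Python sketch treats heat as an access rate and temperature as heat divided by size. It then applies two generic greedy heuristics: the hottest data goes to the currently coolest disk, and the data with the highest temperature is cached first. This is not the algorithm from the paper by Copeland et al. The `Fragment` type, the function names, the units, and the sample numbers are all hypothetical and chosen only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    """A unit of data to be placed (hypothetical type for this sketch)."""
    name: str
    heat: float  # disk access load, assumed here to be accesses per second
    size: float  # space consumed, assumed here to be megabytes

    @property
    def temperature(self) -> float:
        # Temperature normalizes heat by the space consumed.
        return self.heat / self.size


def place_on_disks(fragments, num_disks):
    """Greedy load balancing: hottest fragment first, onto the coolest disk."""
    disks = [[] for _ in range(num_disks)]
    load = [0.0] * num_disks
    for frag in sorted(fragments, key=lambda f: f.heat, reverse=True):
        target = min(range(num_disks), key=lambda d: load[d])
        disks[target].append(frag.name)
        load[target] += frag.heat
    return disks, load


def choose_cached(fragments, memory_budget):
    """Greedy caching: highest temperature first, while the budget allows."""
    cached, used = [], 0.0
    for frag in sorted(fragments, key=lambda f: f.temperature, reverse=True):
        if used + frag.size <= memory_budget:
            cached.append(frag.name)
            used += frag.size
    return cached


# Made-up sample data, for illustration only.
fragments = [
    Fragment("orders_p1", heat=80.0, size=400.0),
    Fragment("orders_p2", heat=60.0, size=400.0),
    Fragment("customers", heat=50.0, size=100.0),
    Fragment("lineitems", heat=30.0, size=800.0),
    Fragment("lookup", heat=20.0, size=10.0),
]

disks, load = place_on_disks(fragments, num_disks=2)
cached = choose_cached(fragments, memory_budget=150.0)
print(disks, load, cached)
```

With this sample data the two disks end up with heat 130 and 110, and the small but frequently accessed `lookup` and `customers` fragments are the ones chosen for caching. Ranking cache candidates by temperature rather than by raw heat favors data that saves the most disk accesses per unit of memory.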