  Review - Data Placement In Bubba

Weikum, G. (2000). Review - Data Placement In Bubba. ACM SIGMOD Digital Review, 2.

Creators

Creators:
Weikum, Gerhard (1), Author
Affiliations:
(1) Databases and Information Systems, MPI for Informatics, Max Planck Society, ou_24018

Content

Free keywords: -
Abstract: This paper, which came out of the Bubba project at MCC, was the first to address the physical database design problem for parallel database servers, with particular focus on the partitioning and allocation of (relational) data across multiple disks or processing nodes. These issues are key to good performance tuning. To this end, the paper introduced the fundamental notion of data heat as a measure of the disk access load attributed to a data unit or collection of units, and the notion of temperature, which normalizes heat by the space consumed. Based on these metrics, the paper developed an elegant framework and heuristic algorithms for choosing which data should be placed on which disk so as to balance the disk load, and which data should be cached in memory so as to minimize the overall disk load. I had the great opportunity of spending a postdoc year in the Bubba group at MCC, where I could learn about this subject directly from the paper's authors. Later, their work was my main inspiration when I started working on dynamic data placement and migration in the early nineties. In that research, the notions of heat and temperature proved extremely useful for reasoning about load distribution and for developing algorithms that continuously adjust the allocation of data based on online statistics about access patterns, for example, to "cool down" hot disks. I have also seen fairly recent papers on the caching of query results in data warehouses benefit greatly from the Bubba tuning framework. The paper by Copeland et al. is a true landmark paper, especially when you consider that this work was done before the industrial advent of parallel database systems.
The problem of automating the physical database design for a cluster-based parallel data server, in the spirit of a zero-admin, self-tuning solution, has still not been solved in a truly comprehensive, industrial-strength manner, but this seminal paper is an excellent starting point and absolutely mandatory reading for everybody working on this highly relevant problem.
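
Illustration (not part of the original record): the heat and temperature metrics described in the abstract can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the algorithm from the paper by Copeland et al. The Fragment class, the units, and the two greedy functions are hypothetical names introduced here. Placement uses a simple "hottest fragment onto the currently coolest disk" heuristic, and cache selection picks fragments in order of decreasing temperature until a memory budget is used up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    """A unit of data to place (hypothetical; units are assumptions)."""
    name: str
    heat: float  # disk access load, e.g. accesses per second
    size: float  # consumed space, e.g. megabytes

    @property
    def temperature(self) -> float:
        # Temperature normalizes heat by the consumed space.
        return self.heat / self.size


def place_on_disks(fragments, num_disks):
    """Greedy balancing: assign the hottest fragments first,
    each to the disk with the lowest accumulated heat so far."""
    disks = [[] for _ in range(num_disks)]
    load = [0.0] * num_disks
    for frag in sorted(fragments, key=lambda f: f.heat, reverse=True):
        coolest = min(range(num_disks), key=lambda d: load[d])
        disks[coolest].append(frag)
        load[coolest] += frag.heat
    return disks, load


def choose_cached(fragments, memory_budget):
    """Greedy caching: take fragments by decreasing temperature
    as long as they still fit into the memory budget."""
    cached, used = [], 0.0
    for frag in sorted(fragments, key=lambda f: f.temperature, reverse=True):
        if used + frag.size <= memory_budget:
            cached.append(frag)
            used += frag.size
    return cached
```

Sorting by heat serves the load-balancing goal, while sorting by temperature serves the caching goal: a small, frequently accessed fragment removes more disk load per megabyte of memory than a large one with the same heat.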

Details

Language(s): eng - English
 Dates: 2006-03-27, 2000
 Publication Status: Issued
 Pages: -
 Publishing info: -
 Table of Contents: -
 Rev. Type: Peer
 Identifiers: eDoc: 520331
Other: Local-ID: C1256DBF005F876D-F09277A94B842FABC125713E004EA396-Weikum00
 Degree: -

Source 1

Title: ACM SIGMOD Digital Review
Source Genre: Journal
 Creator(s):
Affiliations:
Publ. Info: -
Pages: -
Volume / Issue: 2
Sequence Number: -
Start / End Page: -
Identifier: ISBN: -