hide
Free keywords:
-
Abstract:
For the design and analysis of algorithms that process huge data sets, a
machine model is needed that handles parallel disks. There seems to be
a dilemma between simple and flexible use of such a model and accurate
modelling of details of the hardware. This paper explains how many
aspects of this problem can be resolved. The programming model
implements one large logical disk allowing concurrent access to
arbitrary sets of variable size blocks. This model can
be implemented efficienctly on multiple
independent disks even if zones with different speed,
communication bottlenecks
and failed disks are allowed. These results not only provide useful
algorithmic tools but also imply a theoretical justification for
studying external memory algorithms using simple abstract models.
The algorithmic approach is random redundant placement of data and
optimal scheduling of accesses. The analysis generalizes a previous
analysis for simple abstract external memory models in several ways
(Higher efficiency, variable block sizes, more detailed disk model).
As a side effect, an apparently new Chernoff-Hoeffding bound for sums of
weighted 0-1 random variables is derived.