

Released

Thesis

Deterministische Simulation einer PRAM auf Gitterrechnern

MPS-Authors
http://pubman.mpdl.mpg.de/cone/persons/resource/persons45038

Meyer, Ulrich
Algorithms and Complexity, MPI for Informatics, Max Planck Society

Citation

Meyer, U. (1995). Deterministische Simulation einer PRAM auf Gitterrechnern. Master Thesis, Universität des Saarlandes, Saarbrücken.


Cite as: http://hdl.handle.net/11858/00-001M-0000-0014-AC76-A
Abstract
The Parallel Random Access Machine (PRAM) is the dominant theoretical model of parallel computation. It consists of a number of processing units (PUs) that operate synchronously and can access a shared memory in constant time. Unfortunately, this high-level model is hardly realizable in hardware using current technology. One possibility is to design algorithms directly for the distributed memory computer (DMC) under consideration, but this involves many intricate details and is not portable at all. A more practical approach is to develop, for every DMC, a library of basic algorithms (sorting, matrix multiplication, list ranking, ...), a PRAM simulator, and a 'compiler'. In this way, the programmer can work in a high-level language, which is essential for the acceptance of parallel computing. In this paper, we sketch our experiences with a deterministic PRAM simulator for two-dimensional meshes. The success of a PRAM simulator depends on the speed-up it achieves. We made considerable progress. Presently, simulating one step of a PRAM with $65536 = 16 \cdot 64^2$ PUs on a $64 \times 64$ mesh takes about 12000 routing steps if all 65536 PRAM PUs access memory simultaneously. The time depends on the precise access pattern, and CRCW steps are slightly more expensive than EREW steps. A larger speed-up is achieved if the mesh is larger or if more PRAM PUs are simulated on every mesh PU. It may appear disappointing that we do not achieve substantially better results. However, on an $n \times n$ mesh, we can never expect a speed-up larger than $n / (c + o(n))$ for some constant $c$; in our case, $c$ is about 12. Our techniques are directly applicable to higher-dimensional meshes, where the speed-up is larger. Furthermore, in our conception, the PRAM simulator should be used only for those operations that are not available in the library of DMC algorithms, so a moderate speed-up for them does not necessarily impair the speed-up of the whole program.
We developed a small PRAM language in which the basic PRAM algorithms presented in textbooks can be simulated almost directly. It was used to simulate a constant-time algorithm for computing the maximum [Ja, Sec. 2.6.1] and the standard logarithmic-time summation algorithm. The summation of 65536 numbers requires 33 PRAM steps, which can be simulated on a $36 \times 36$ mesh in 55158 routing steps with a maximal queue size of $126$. In order to analyze the details of the PRAM simulation, we performed it on a mesh simulator. In fact, we developed a complete programming environment containing many basic mesh operations: several variants of routing and sorting algorithms, ranking, etc. All these algorithms have been optimized to give competitive results for the moderate mesh sizes under consideration.
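
For readers unfamiliar with the two textbook algorithms named above, the following is a minimal sequential sketch of their usual formulations, written in Python rather than in the thesis's PRAM language (which is not shown in this record). The function names `pram_sum` and `crcw_max` are hypothetical, each loop nest merely models one synchronous parallel round, and the round count returned by `pram_sum` is not the same measure as the 33 PRAM steps reported in the abstract, which presumably depends on how the PRAM language splits work into steps.

```python
from typing import List, Tuple


def pram_sum(values: List[int]) -> Tuple[int, int]:
    """Balanced-tree summation; returns (total, number of parallel rounds).

    Illustrative only: each pass of the while loop models one synchronous
    round. All additions inside a round touch disjoint cells, so an EREW
    PRAM could perform them simultaneously.
    """
    cells = list(values)  # models the shared memory
    n = len(cells)
    rounds = 0
    stride = 1
    while stride < n:
        # Virtual PU at index i adds the partial sum held `stride` cells away.
        for i in range(0, n - stride, 2 * stride):
            cells[i] += cells[i + stride]
        stride *= 2
        rounds += 1
    return (cells[0] if cells else 0), rounds


def crcw_max(values: List[int]) -> int:
    """Constant-depth maximum using one virtual PU per pair (i, j).

    Illustrative only: assumes a non-empty input and a CRCW model in which
    several PUs may write the same value to one cell in the same round.
    """
    n = len(values)
    loser = [False] * n
    # Round 1: PU (i, j) marks i as a loser if values[i] is smaller.
    for i in range(n):
        for j in range(n):
            if values[i] < values[j]:
                loser[i] = True
    # Round 2: any element never marked as a loser is a maximum.
    return next(values[i] for i in range(n) if not loser[i])


if __name__ == "__main__":
    total, rounds = pram_sum(list(range(65536)))
    print(total, rounds)
    print(crcw_max([3, 9, 4, 9, 1]))
```

In this sketch, summing 65536 values takes $\log_2 65536 = 16$ rounds, while the maximum uses a quadratic number of virtual PUs to finish in a constant number of rounds. This trade-off is why the two algorithms exercise a simulator differently.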