Deutsch
 
Hilfe Datenschutzhinweis Impressum
  DetailsucheBrowse

Datensatz

DATENSATZ AKTIONENEXPORT

Freigegeben

Konferenzbeitrag

Cache Oblivious Parallelograms in Iterative Stencil Computations

MPG-Autoren
/persons/resource/persons45566

Strzodka,  Robert
Computer Graphics, MPI for Informatics, Max Planck Society;
Graphics - Optics - Vision, MPI for Informatics, Max Planck Society;

/persons/resource/persons45463

Shaheen,  Mohammed
Computer Graphics, MPI for Informatics, Max Planck Society;
International Max Planck Research School, MPI for Informatics, Max Planck Society;

/persons/resource/persons45154

Pajak,  Dawid
Computer Graphics, MPI for Informatics, Max Planck Society;

/persons/resource/persons45449

Seidel,  Hans-Peter
Computer Graphics, MPI for Informatics, Max Planck Society;

Externe Ressourcen
Es sind keine externen Ressourcen hinterlegt
Volltexte (beschränkter Zugriff)
Für Ihren IP-Bereich sind aktuell keine Volltexte freigegeben.
Volltexte (frei zugänglich)
Es sind keine frei zugänglichen Volltexte in PuRe verfügbar
Ergänzendes Material (frei zugänglich)
Es sind keine frei zugänglichen Ergänzenden Materialien verfügbar
Zitation

Strzodka, R., Shaheen, M., Pajak, D., & Seidel, H.-P. (2010). Cache Oblivious Parallelograms in Iterative Stencil Computations. In ICS'10 (pp. 49-59). New York, NY: ACM. doi:10.1145/1810085.1810096.


Zitierlink: https://hdl.handle.net/11858/00-001M-0000-000F-1742-0
Zusammenfassung
We present a new cache oblivious scheme for iterative stencil computations that performs beyond system bandwidth limitations as though gigabytes of data could reside in an enormous on-chip cache. We compare execution times for 2D and 3D spatial domains with up to 128 million double precision elements for constant and variable stencils against hand-optimized naive code and the automatic polyhedral parallelizer and locality optimizer PluTo and demonstrate the clear superiority of our results. The performance benefits stem from a tiling structure that caters for data locality, parallelism and vectorization simultaneously. Rather than tiling the iteration space from inside, we take an exterior approach with a predefined hierarchy, simple regular parallelogram tiles and a locality preserving parallelization. These advantages come at the cost of an irregular work-load distribution but a tightly integrated load-balancer ensures a high utilization of all resources.