
Item Details


Released

Journal Article

A dual communicator and dual grid-resolution algorithm for petascale simulations of turbulent mixing at high Schmidt number

MPS-Authors
/persons/resource/persons209092

Buaria,  Dhawal
Laboratory for Fluid Dynamics, Pattern Formation and Biocomplexity, Max Planck Institute for Dynamics and Self-Organization, Max Planck Society;

Citation

Clay, M. P., Buaria, D., Gotoh, T., & Yeung, P. K. (2017). A dual communicator and dual grid-resolution algorithm for petascale simulations of turbulent mixing at high Schmidt number. Computer Physics Communications, 219, 313-328. doi:10.1016/j.cpc.2017.06.009.


Cite as: https://hdl.handle.net/11858/00-001M-0000-002D-E195-8
Abstract
A new dual-communicator algorithm with very favorable performance characteristics has been developed for direct numerical simulation (DNS) of turbulent mixing of a passive scalar governed by an advection-diffusion equation. We focus on the regime of high Schmidt number (Sc), where because of low molecular diffusivity the grid-resolution requirements for the scalar field are stricter than those for the velocity field by a factor of √Sc. Computational throughput is improved by simulating the velocity field on a coarse grid of N_v^3 points with a Fourier pseudo-spectral (FPS) method, while the passive scalar is simulated on a fine grid of N_θ^3 points with a combined compact finite difference (CCD) scheme which computes first and second derivatives at eighth-order accuracy. A static three-dimensional domain decomposition and a parallel solution algorithm for the CCD scheme are used to avoid the heavy communication cost of memory transposes. A kernel is used to evaluate several approaches to optimize the performance of the CCD routines, which account for 60% of the overall simulation cost. On the petascale supercomputer Blue Waters at the University of Illinois, Urbana-Champaign, scalability is improved substantially with a hybrid MPI-OpenMP approach in which a dedicated thread per NUMA domain overlaps communication calls with computational tasks performed by a separate team of threads spawned using OpenMP nested parallelism. At a target production problem size of 8192^3 (0.5 trillion) grid points on 262,144 cores, CCD timings are reduced by 34% compared to a pure-MPI implementation. Timings for 16384^3 (4 trillion) grid points on 524,288 cores encouragingly maintain scalability greater than 90%, although the wall clock time is too high for production runs at this size. Performance monitoring with CrayPat for problem sizes up to 4096^3 shows that the CCD routines can achieve nearly 6% of the peak flop rate. The new DNS code is built upon two existing FPS and CCD codes. With the grid ratio N_θ/N_v = 8, the disparity in the computational requirements for the velocity and scalar problems is addressed by splitting the global communicator MPI_COMM_WORLD into disjoint communicators for the velocity and scalar fields, respectively. Inter-communicator transfer of the velocity field from the velocity communicator to the scalar communicator is handled with discrete send and non-blocking receive calls, which are overlapped with other operations on the scalar communicator. For production simulations at N_θ = 8192 and N_v = 1024 on 262,144 cores for the scalar field, the DNS code achieves 94% strong scaling relative to 65,536 cores and 92% weak scaling relative to N_θ = 1024 and N_v = 128 on 512 cores.
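To illustrate the communicator-splitting idea described in the abstract, the following minimal C/MPI sketch splits MPI_COMM_WORLD into disjoint velocity and scalar groups and overlaps a non-blocking receive of velocity data with scalar-side work. This is not the authors' code: the group size NV_RANKS, the buffer size CHUNK, the rank pairing, and the do_velocity_work()/do_scalar_work() helpers are hypothetical placeholders, and the actual DNS code interleaves these transfers with its own coarse-to-fine velocity handling.

/* Illustrative sketch only: dual communicators via MPI_Comm_split, with
   plain sends on the velocity side matched by non-blocking receives on the
   scalar side so the transfer can overlap scalar-communicator work. */
#include <mpi.h>
#include <stdlib.h>

#define NV_RANKS 32          /* assumed number of ranks in the velocity group */
#define CHUNK    (1 << 20)   /* assumed per-message transfer size (doubles)   */

/* Hypothetical placeholders for the velocity and scalar solver steps. */
static void do_velocity_work(double *u)  { (void)u;   }
static void do_scalar_work(double *phi)  { (void)phi; }

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int world_rank, world_size;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    /* Split MPI_COMM_WORLD into two disjoint communicators:
       color 0 = velocity group, color 1 = scalar group. */
    int color = (world_rank < NV_RANKS) ? 0 : 1;
    MPI_Comm sub_comm;
    MPI_Comm_split(MPI_COMM_WORLD, color, world_rank, &sub_comm);

    int local_rank;
    MPI_Comm_rank(sub_comm, &local_rank);   /* rank within its own group */

    double *buf = calloc(CHUNK, sizeof *buf);

    if (color == 0) {
        /* Velocity side: advance the velocity field, then send a copy of the
           data to every scalar rank paired with this velocity rank. */
        do_velocity_work(buf);
        for (int dest = NV_RANKS + world_rank; dest < world_size; dest += NV_RANKS)
            MPI_Send(buf, CHUNK, MPI_DOUBLE, dest, 0, MPI_COMM_WORLD);
    } else {
        /* Scalar side: post a non-blocking receive for the velocity data and
           overlap it with scalar work that does not need the new velocity. */
        MPI_Request req;
        int src = (world_rank - NV_RANKS) % NV_RANKS;   /* assumed pairing */
        MPI_Irecv(buf, CHUNK, MPI_DOUBLE, src, 0, MPI_COMM_WORLD, &req);
        do_scalar_work(buf);
        MPI_Wait(&req, MPI_STATUS_IGNORE);
        /* buf now holds the velocity data needed for the scalar advection term */
    }

    free(buf);
    MPI_Comm_free(&sub_comm);
    MPI_Finalize();
    return 0;
}

The key design point reflected here, as stated in the abstract, is that each group runs its own solver on its own communicator while the inter-group transfer uses discrete sends matched with non-blocking receives, so the scalar ranks keep computing instead of waiting for the coarse-grid velocity data to arrive.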