Optimizing multi-dimensional MPI communications on multi-core architectures
Karlsson, Christer
Date Issued
2012
Date Submitted
2012
Abstract
In today's high-performance computing, many Message Passing Interface (MPI) programs (e.g., ScaLAPACK applications, the High Performance Linpack benchmark (HPL), and most PDE solvers based on domain decomposition methods) organize their computational processes as multidimensional Cartesian grids. Applications often need to communicate in every dimension of the Cartesian grid. While extensive optimizations have been performed on single-dimensional communications such as the standard MPI collective communications, little work has been done to optimize multidimensional communications. We study the impact of the MPI process-to-core mapping on the performance of multidimensional MPI communications on Cartesian grids. While the default process-to-core mappings in today's state-of-the-art MPI implementations are often optimal for single-dimensional communications, we show that they are often sub-optimal for multidimensional communications. We propose an application-level, multicore-aware process-to-core re-mapping scheme that is capable of achieving optimal performance for multidimensional communication operations. This application-level solution does not require any changes to the MPI implementation; the optimization occurs entirely in the application layer. Experiments demonstrate that the multicore-aware process-to-core re-mapping scheme improves the performance of multidimensional MPI communications by up to 80% over the default mapping scheme on the world's current third-fastest supercomputer, Jaguar, located at Oak Ridge National Laboratory.
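To make the re-mapping idea concrete, the following minimal C/MPI sketch illustrates an application-level process-to-core re-mapping, written against the abstract's description rather than the thesis's exact algorithm. The node size (CORES_PER_NODE = 8), the assumption of a block default mapping (consecutive ranks packed onto each node), and the round-robin permutation are all illustrative assumptions. The point is that ranks are re-ordered with MPI_Comm_split before MPI_Cart_create is called with reorder disabled, so no change to the MPI library itself is needed.

/* Hypothetical sketch of application-level process-to-core re-mapping.
 * CORES_PER_NODE, the block-mapping assumption, and the round-robin
 * permutation are illustrative, not the thesis's exact scheme. */
#include <mpi.h>

#define CORES_PER_NODE 8  /* assumed number of cores per node */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int world_rank, world_size;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    /* Assume the default mapping packs consecutive ranks onto each node
     * (block mapping), so rank -> (node, core) is: */
    int node    = world_rank / CORES_PER_NODE;
    int core    = world_rank % CORES_PER_NODE;
    int n_nodes = (world_size + CORES_PER_NODE - 1) / CORES_PER_NODE;

    /* Illustrative permutation: round-robin ranks across nodes so that a
     * dimension's sub-communicator keeps some of its peers on-node.  A
     * real scheme would derive the permutation from the grid shape and
     * the machine's node size. */
    int new_rank = core * n_nodes + node;

    /* Re-order ranks purely at the application level. */
    MPI_Comm remap_comm;
    MPI_Comm_split(MPI_COMM_WORLD, 0, new_rank, &remap_comm);

    /* Build the 2D Cartesian grid on the re-mapped communicator with
     * reorder = 0, so the application's mapping is preserved. */
    int dims[2] = {0, 0}, periods[2] = {0, 0};
    MPI_Dims_create(world_size, 2, dims);
    MPI_Comm cart_comm;
    MPI_Cart_create(remap_comm, 2, dims, periods, 0, &cart_comm);

    /* Per-dimension (row and column) communicators, as used by
     * ScaLAPACK-style applications for dimension-wise collectives. */
    int keep_row[2] = {0, 1};  /* vary along the second dimension */
    int keep_col[2] = {1, 0};  /* vary along the first dimension  */
    MPI_Comm row_comm, col_comm;
    MPI_Cart_sub(cart_comm, keep_row, &row_comm);
    MPI_Cart_sub(cart_comm, keep_col, &col_comm);

    /* ... dimension-wise communication on row_comm / col_comm ... */

    MPI_Comm_free(&row_comm);
    MPI_Comm_free(&col_comm);
    MPI_Comm_free(&cart_comm);
    MPI_Comm_free(&remap_comm);
    MPI_Finalize();
    return 0;
}

With the Cartesian communicator built on the re-mapped ranks, each per-dimension sub-communicator can keep more of its peers on the same node, which is the effect the abstract attributes to the multicore-aware scheme.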
Rights
Copyright of the original work is retained by the author.