Optimizing multi-dimensional MPI communications on multi-core architectures
dc.contributor.advisor | Chen, Zizhong | |
dc.contributor.author | Karlsson, Christer | |
dc.date.accessioned | 2007-01-03T08:21:16Z | |
dc.date.accessioned | 2022-02-03T11:54:23Z | |
dc.date.available | 2007-01-03T08:21:16Z | |
dc.date.available | 2022-02-03T11:54:23Z | |
dc.date.issued | 2012 | |
dc.date.submitted | 2012 | |
dc.identifier | T 7093 | |
dc.identifier.uri | https://hdl.handle.net/11124/70678 | |
dc.description | 2012 Fall. | |
dc.description | Includes illustrations (some color). | |
dc.description | Includes bibliographical references. | |
dc.description.abstract | In today's high performance computing, many Message Passing Interface (MPI) programs (e.g., ScaLAPACK applications, the High Performance Linpack Benchmark (HPL), and most PDE solvers based on domain decomposition methods) organize their computational processes as multidimensional Cartesian grids. Applications often need to communicate in every dimension of the Cartesian grid. While extensive optimizations have been performed on single-dimensional communications such as the standard MPI collective communications, little work has been done to optimize multidimensional communications. We study the impact of the MPI process-to-core mapping on the performance of multidimensional MPI communications on Cartesian grids. While the default process-to-core mappings in today's state-of-the-art MPI implementations are often optimal for single-dimensional communications, we show that they are often sub-optimal for multidimensional communications. We propose an application-level, multicore-aware process-to-core re-mapping scheme that is capable of achieving optimal performance for multidimensional communication operations. The application-level solution does not require any changes to the MPI implementation; the optimization occurs entirely in the application layer. Experiments demonstrate that the multicore-aware process-to-core re-mapping scheme improves the performance of multidimensional MPI communications by up to 80% over the default mapping scheme on the world's current third-fastest supercomputer, Jaguar, located at the Oak Ridge National Laboratory. | |
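The abstract describes an application-level re-mapping of ranks on a Cartesian grid so that neighbors in more than one dimension land on the same multi-core node. As a rough illustration of the idea (not the dissertation's actual scheme), the sketch below compares a default row-major placement, where only row neighbors share a node, with a blocked placement that packs a b-by-b tile of the grid onto each node; it assumes each node has b*b cores and grid dimensions divisible by b.

```c
/* Hypothetical sketch of a blocked process-to-core re-mapping for a
 * 2-D Cartesian grid; "remapped_rank" and its parameters are
 * illustrative names, not the dissertation's API. */
#include <assert.h>

/* Default row-major rank of grid coordinate (i, j): consecutive
 * ranks fill a node, so only row neighbors are intra-node. */
int default_rank(int i, int j, int cols)
{
    return i * cols + j;
}

/* Blocked rank: each b x b tile of the grid maps to one node of
 * b*b cores, so many neighbors in BOTH dimensions are intra-node. */
int remapped_rank(int i, int j, int cols, int b)
{
    int cores_per_node = b * b;
    int nodes_per_row  = cols / b;      /* tiles per grid row      */
    int node = (i / b) * nodes_per_row + (j / b);
    int core = (i % b) * b + (j % b);   /* position inside the tile */
    return node * cores_per_node + core;
}
```

In practice the new order would be used to build the communicator, e.g. by passing the remapped ranks to `MPI_Comm_split` or `MPI_Group_incl` before calling `MPI_Cart_create` with `reorder` disabled, so the optimization stays in the application layer as the abstract states.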
dc.format.medium | born digital | |
dc.format.medium | doctoral dissertations | |
dc.language | English | |
dc.language.iso | eng | |
dc.publisher | Colorado School of Mines. Arthur Lakes Library | |
dc.relation.ispartof | 2012 - Mines Theses & Dissertations | |
dc.rights | Copyright of the original work is retained by the author. | |
dc.subject | process-to-core mapping | |
dc.subject | multicore | |
dc.subject | Cartesian topology | |
dc.subject | cluster | |
dc.subject | collective communication | |
dc.subject | Message Passing Interface (MPI) | |
dc.title | Optimizing multi-dimensional MPI communications on multi-core architectures | |
dc.type | Text | |
dc.contributor.committeemember | Han, Qi | |
dc.contributor.committeemember | Mehta, Dinesh P. | |
dc.contributor.committeemember | Munoz, David (David R.) | |
dc.contributor.committeemember | Skokan, C. K. | |
thesis.degree.name | Doctor of Philosophy (Ph.D.) | |
thesis.degree.level | Doctoral | |
thesis.degree.discipline | Electrical Engineering and Computer Science | |
thesis.degree.grantor | Colorado School of Mines |