Using node and socket information to implement MPI Cartesian topologies

WD Gropp - Parallel Computing, 2019 - Elsevier
… However, intranode communication performance is much greater than internode communication
performance. In this paper, we show a simple approach that takes into account only …

Analyzing Clustered Latent Dirichlet Allocation

C Gropp - 2016 - search.proquest.com
… Low-power low-complexity wireless loop technology in small base units can be integrated …
personal communications services (PCS) to small, lightweight, low-power personal voice and/…

Using node information to implement MPI Cartesian topologies

WD Gropp - Proceedings of the 25th European MPI Users' Group …, 2018 - dl.acm.org
… of this can be seen in Figure 1, which shows how the common assignment of MPI processes
consecutively on a node can lead to significantly greater internode communication than a …

Reducing communication in algebraic multigrid with multi-step node aware communication

A Bienz, WD Gropp, LN Olson - The International Journal of …, 2020 - journals.sagepub.com
William D Gropp is the director and chief scientist of the National Center for Supercomputing
Applications and holds the Thomas M. Siebel Chair in the Department of Computer Science …

MPI 3 and beyond: why MPI is successful and what challenges it faces

W Gropp - European MPI Users' Group Meeting, 2012 - Springer
… describing the data to be moved, even if noncontiguous, in the MPI communication
routines. This can eliminate an extra copy performed by the user into a separate buffer (unfortunately, …

Improving performance models for irregular point-to-point communication

A Bienz, WD Gropp, LN Olson - Proceedings of the 25th European MPI …, 2018 - dl.acm.org
can accurately estimate the cost of communication, but at a significantly increased cost [12,
16, 17]. Network contention has been previously modeled for collective communication, …

Modeling MPI communication performance on SMP nodes: Is it time to retire the ping pong test

W Gropp, LN Olson, P Samfass - Proceedings of the 23rd European MPI …, 2016 - dl.acm.org
The "postal" model of communication [3, 8] T = α + βn, for sending n bytes of data between
two processes with latency α and bandwidth 1/β, is perhaps the most commonly used …

Collective algorithms for multiported torus networks

P Sack, W Gropp - ACM Transactions on Parallel Computing (TOPC), 2015 - dl.acm.org
can send six messages and receive six messages at the same time. Communication algorithms
that take advantage of this can be up to six-fold quicker than its generic counterpart. …

Modeling the performance of an algebraic multigrid cycle on HPC platforms

…, M Schulz, UM Yang, KE Jordan, W Gropp - Proceedings of the …, 2011 - dl.acm.org
Now that the performance of individual cores has plateaued, future supercomputers will depend
upon increasing parallelism for performance. Processor counts are now in the hundreds …

Exploring the feasibility of lossy compression for pde simulations

…, LN Olson, M Snir, WD Gropp - … Journal of High …, 2019 - journals.sagepub.com
William D Gropp is a director and chief scientist of the National Center for Supercomputing
Applications and holds the Thomas M. Siebel Chair in the Department of Computer Science …