Using node and socket information to implement MPI Cartesian topologies
WD Gropp - Parallel Computing, 2019 - Elsevier
… However, intranode communication performance is much greater than internode communication
performance. In this paper, we show a simple approach that takes into account only …
Analyzing Clustered Latent Dirichlet Allocation
C Gropp - 2016 - search.proquest.com
… Low-power low-complexity wireless loop technology in small base units can be integrated …
personal communications services (PCS) to small, lightweight, low-power personal voice and/…
Using node information to implement MPI Cartesian topologies
WD Gropp - Proceedings of the 25th European MPI Users' Group …, 2018 - dl.acm.org
… of this can be seen in Figure 1, which shows how the common assignment of MPI processes
consecutively on a node can lead to significantly greater internode communication than a …
Reducing communication in algebraic multigrid with multi-step node aware communication
… William D Gropp is the director and chief scientist of the National Center for Supercomputing
Applications and holds the Thomas M. Siebel Chair in the Department of Computer Science …
MPI 3 and beyond: why MPI is successful and what challenges it faces
W Gropp - European MPI Users' Group Meeting, 2012 - Springer
… describing the data to be moved, even if noncontiguous, in the MPI communication
routines. This can eliminate an extra copy performed by the user into a separate buffer (unfortunately, …
Improving performance models for irregular point-to-point communication
… can accurately estimate the cost of communication, but at a significantly increased cost [12,
16, 17]. Network contention has been previously modeled for collective communication, …
Modeling MPI communication performance on SMP nodes: Is it time to retire the ping pong test
The "postal" model of communication [3, 8] T = α + βn, for sending n bytes of data between
two processes with latency α and bandwidth 1/β, is perhaps the most commonly used …
Collective algorithms for multiported torus networks
P Sack, W Gropp - ACM Transactions on Parallel Computing (TOPC), 2015 - dl.acm.org
… can send six messages and receive six messages at the same time. Communication algorithms
that take advantage of this can … can be up to six-fold quicker than its generic counterpart. …
Modeling the performance of an algebraic multigrid cycle on HPC platforms
Now that the performance of individual cores has plateaued, future supercomputers will depend
upon increasing parallelism for performance. Processor counts are now in the hundreds …
Exploring the feasibility of lossy compression for PDE simulations
… William D Gropp is a director and chief scientist of the National Center for Supercomputing
Applications and holds the Thomas M. Siebel Chair in the Department of Computer Science …