The Chroma package provides a toolbox of executables that carry out lattice Quantum Chromodynamics (LQCD) calculations. Chroma is built on top of QDP++ (QCD Data Parallel layer), which presents an abstract, data-parallel view of the lattice and supplies lattice-wide types and expressions, implemented with expression templates, so that LQCD equations can be encoded in a straightforward way.
Chroma can be built against the QUDA library for Lattice QCD on GPU-capable systems; QUDA includes an implementation of adaptive aggregation multigrid and provides highly optimized solvers for GPU LQCD. Chroma can be built over the regular QDP++ package or over a variant of QDP++ called QDP-JIT. QDP-JIT is especially useful for gauge-field generation: its expression templates generate code dynamically at run time using the JIT facilities of the LLVM compiler infrastructure.
Parallelization over multiple GPUs is handled by the QMP library, which provides a thin abstraction over MPI for most QCD-oriented communications.
Chroma spends a considerable portion of its runtime in MPI-based, GPU-to-GPU communication. Users can expect better performance from this container if they ensure that such communication travels over the strongest available GPU links. NVIDIA Topology-Aware GPU Selection (NVTAGS) intelligently and automatically assigns GPUs to MPI processes, reducing overall GPU-to-GPU communication time. We recommend using NVTAGS with Chroma when running on a multi-GPU system with asymmetric GPU communication channels, such as a DGX-1, where both NVLink and QPI can be used for GPU-to-GPU communication.
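A quick way to check whether your system has asymmetric GPU links (assuming the NVIDIA driver and the nvidia-smi utility are installed on the host) is to print the interconnect topology matrix:
nvidia-smi topo -m
GPU pairs connected via NVLink appear as NV# entries, while pairs that traverse PCIe or the CPU interconnect appear as PIX, PHB, or SYS.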
Before running the NGC Chroma container, please ensure that your system meets the following requirements.
The following examples demonstrate how to run the NGC Chroma container under the supported runtimes.
This example sets QUDA_RESOURCE_PATH. The QUDA library autotunes its kernels in the linear solvers and saves the resulting tuning parameters there, in a file called tunecache.tsv, so that they can be reused in future runs.
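For example, to keep the tuning cache in a persistent location across runs, you could point QUDA_RESOURCE_PATH at a dedicated directory (the path below is only illustrative; the example that follows simply uses the benchmark directory itself):
mkdir -p $HOME/quda_tunecache
export QUDA_RESOURCE_PATH=$HOME/quda_tunecache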
For this example, download szscl_bench.zip, which contains an anisotropic lattice of 24³ × 128 sites and an example input XML file called test.ini.xml. This test problem fits (using QDP-JIT) onto two GPUs, from the Pascal generation (P100) up to the latest Ampere generation (A100). Once downloaded, unzip szscl_bench.zip and change to the top-level szscl_bench directory:
unzip szscl_bench.zip
cd szscl_bench
Set the resource path for the tunecache file:
export QUDA_RESOURCE_PATH=$PWD
export GPU_COUNT=2
docker run -v $PWD:/workspace -it --rm --gpus all --privileged nvcr.io/hpc/chroma:YYYY.MM mpirun --allow-run-as-root -x ${QUDA_RESOURCE_PATH} -n ${GPU_COUNT} chroma -i ./test.ini.xml -geom 1 1 1 ${GPU_COUNT} -ptxdb ./qdpdb -gpudirect
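The same pattern extends to larger GPU counts; for example, on a node with four GPUs (the 128-site time direction divides evenly by four) the run could look like:
export GPU_COUNT=4
docker run -v $PWD:/workspace -it --rm --gpus all --privileged nvcr.io/hpc/chroma:YYYY.MM mpirun --allow-run-as-root -x ${QUDA_RESOURCE_PATH} -n ${GPU_COUNT} chroma -i ./test.ini.xml -geom 1 1 1 ${GPU_COUNT} -ptxdb ./qdpdb -gpudirect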
Depending on your system and Chroma workload, default GPU selection may be inefficient and a better GPU assignment may be possible. NVTAGS can perform this evaluation for you and potentially suggest an efficient GPU assignment that suits your system and workload. You can read more about NVTAGS in this blog post.
NVTAGS follows a two-step process to identify and apply efficient GPU assignments.
General outline of NVTAGS tune/run/run-bind commands:
NVTAGS tune mode:
nvtags tune "MPI app run cmd" [options]
NVTAGS run mode:
nvtags run --run-cmd "MPI app run cmd" [options]
NVTAGS run-bind mode:
nvtags run-bind --run-cmd "app run cmd" --num-procs ${GPU_COUNT} [options]
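As a concrete sketch following the outline above (assuming nvtags is available in the environment from which mpirun is launched; adjust options to your system), the benchmark could first be tuned and then run as:
nvtags tune "mpirun --allow-run-as-root -x ${QUDA_RESOURCE_PATH} -n ${GPU_COUNT} chroma -i ./test.ini.xml -geom 1 1 1 ${GPU_COUNT} -ptxdb ./qdpdb -gpudirect"
nvtags run --run-cmd "mpirun --allow-run-as-root -x ${QUDA_RESOURCE_PATH} -n ${GPU_COUNT} chroma -i ./test.ini.xml -geom 1 1 1 ${GPU_COUNT} -ptxdb ./qdpdb -gpudirect"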
Set the resource path for the tunecache file:
export QUDA_RESOURCE_PATH=$PWD
export GPU_COUNT=2
singularity run --nv chroma.sif mpirun --allow-run-as-root -x ${QUDA_RESOURCE_PATH} -n ${GPU_COUNT} chroma -i ./test.ini.xml -geom 1 1 1 ${GPU_COUNT} -ptxdb ./qdpdb -gpudirect
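The command above assumes a local image file named chroma.sif. If you do not yet have one, it can be built from the NGC registry (assuming Singularity can reach nvcr.io and any required NGC credentials are configured) with:
singularity pull chroma.sif docker://nvcr.io/hpc/chroma:YYYY.MM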
There is currently a bug in Singularity 3.1.x and 3.2.x that causes LD_LIBRARY_PATH to be set incorrectly within the container environment. As a workaround, LD_LIBRARY_PATH must be unset before invoking Singularity:
$ LD_LIBRARY_PATH="" singularity exec ...
“Lattice Quantum Chromodynamics and Chroma”, B. Joo, R. G. Edwards, F. T. Winter, in the book “Exascale Scientific Applications: Scalability and Performance Portability”, Tjerk P. Straatsma, Katerina B. Antypas, Timothy J. Williams (editors), Chapman & Hall/CRC Computational Science Series, CRC Press, Chapter 16
“Parallelizing the QUDA Library for Multi-GPU Calculations in Lattice Quantum Chromodynamics”, R. Babich, M. A. Clark, B. Joo, Proceedings of SC’10, The International Conference on High Performance Computing, Networking, Storage and Analysis, 2010, New Orleans, USA
“Scaling Lattice QCD beyond 100 GPUs”, R. Babich, M. A. Clark, B. Joó, G. Shi, R. C. Brower, S. Gottlieb, Proceedings of SC’11, The International Conference on High Performance Computing, Networking, Storage and Analysis, 2011, Seattle, USA
“Accelerating Lattice QCD Multigrid on GPUs Using Fine-Grained Parallelization”, M. A. Clark, B. Joo, A. Strelchenko, M. Cheng, A. Gambhir, R. Brower, Proceedings of SC’16, The International Conference on High Performance Computing, Networking, Storage and Analysis, 2016, Salt Lake City, USA
“Lattice QCD on GPU clusters, using the QUDA library and the Chroma software system”, B. Joo, M. A. Clark, The International Journal of High Performance Computing Applications, Volume 26, Issue 4, pages 386-398
“A Framework for Lattice QCD Calculations on GPUs”, F. T. Winter, M. A. Clark, R. G. Edwards, B. Joo, 28th IEEE International Parallel and Distributed Processing Symposium (IPDPS 2014), Phoenix, USA, May 19-23, 2014
“The Chroma Software System for Lattice QCD”, R. G. Edwards, B. Joo, Proceedings of the XXIInd International Symposium on Lattice Field Theory (LATTICE 2004), Nuclear Physics B (Proc. Suppl.), Volume 140, March 2005, Pages 832-834
“Solving Lattice QCD systems of equations using mixed precision solvers on GPUs”, M. A. Clark, R. Babich, K. Barros, R. C. Brower, C. Rebbi, Comput. Phys. Commun. 181 (2010), 1517-1528
NVIDIA Topology-Aware GPU Selection (NVTAGS)
The primary authors of QDP-JIT and Chroma are F. T. Winter, R. G. Edwards, and B. Joo of Jefferson Lab.
Chroma is registered on DOE CODE with DOI: 10.11578/dc.20180208.2. The primary author of QUDA is Kate Clark and QUDA is developed by the QUDA Community.
Funding Acknowledgement: Development of Chroma and the USQCD software stack that underpins it is funded by the U.S. Department of Energy, Office of Science, Offices of Nuclear Physics, High Energy Physics, and Advanced Scientific Computing Research under the SciDAC, SciDAC-2, SciDAC-3, and SciDAC-4 programs and the Exascale Computing Project. This material is based upon work supported by the U.S. Department of Energy, Office of Science, Office of Nuclear Physics under contract DE-AC05-06OR23177, under which JSA LLC manages and operates the Thomas Jefferson National Accelerator Facility (also known as Jefferson Lab).
Chroma and its submodules are distributed under a 3-part BSD-like LICENSE. The software contains the work of several collaborating institutions and individuals and the institutions and individuals retain their respective copyrights. This build of Chroma also utilizes the following open source software: QUDA, Eigen (through QUDA), the LLVM Compiler Infrastructure (in QDP-JIT), and LibXML2. Some of our testing code utilizes the GoogleTest framework. The respective licensing and Copyright files can be found here. The names of the authors and of Thomas Jefferson National Accelerator Facility (also known as Jefferson Lab) may not be used to endorse or promote products derived from this software without specific prior written permission.