High-performance computing (HPC) is the art and science of using clusters of cutting-edge computer systems to perform complex simulations, computations, and data analysis that are out of reach for standard commercial computing systems.
HPC systems are characterized by their high-speed processors, high-performance networks, and large memory capacity, giving them the ability to perform massive amounts of parallel processing. A supercomputer is a highly advanced HPC system that provides immense computational power and speed, making it a key component of high-performance computing.
In recent years, HPC has evolved from a tool focused on simulation-based scientific investigation into one that plays a dual role, running both simulation and machine learning (ML) workloads. This broadening of scope has gained momentum because combining physics-based simulation with ML has compressed the time to scientific insight in fields such as climate modeling, drug discovery, protein folding, and computational fluid dynamics (CFD).
The basic system architecture of a supercomputer.
One key enabler of this convergence of HPC and ML is the development of graphics processing unit (GPU) technology. GPUs are specialized processors designed to operate on large amounts of data in parallel, making them well suited to many HPC workloads, and they are currently the standard for ML/AI computation. The combination of high-performance GPUs and software optimizations has enabled HPC systems to run complex simulations and computations far faster than traditional computing systems.
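To illustrate the data-parallel model GPUs exploit, here is a minimal CPU-side sketch in Python of "SAXPY," a classic elementwise vector operation. Every element's computation is independent of the others, which is exactly why a GPU can assign one hardware thread per element and run thousands of them at once; the loop below is only a stand-in for that parallelism.

```python
# Data-parallel "SAXPY" (y = a*x + y), a canonical GPU kernel pattern.
# Each element's update is independent, so a GPU can dedicate one
# thread per element and compute all of them concurrently.

def saxpy(a, x, y):
    """Return a*x[i] + y[i] for every element i."""
    # On a GPU this loop disappears: thread i computes element i.
    return [a * xi + yi for xi, yi in zip(x, y)]

x = [1.0, 2.0, 3.0, 4.0]
y = [10.0, 20.0, 30.0, 40.0]
print(saxpy(2.0, x, y))  # [12.0, 24.0, 36.0, 48.0]
```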
High-performance computing is important for several reasons:
HPC has revolutionized the way research and engineering are conducted and has had a profound impact on many aspects of our lives, from improving the efficiency of industrial processes, to aiding disaster response and mitigation, to furthering our understanding of the world around us.
High-performance computing works by combining the computational power of multiple computers to perform large-scale tasks that would be infeasible on a single machine. Here is how HPC works:
By harnessing the collective power of many computers, HPC enables large-scale simulations, data analysis, and other compute-intensive tasks to be completed in a fraction of the time it would take on a single machine.
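The pattern described above is often called scatter-compute-gather: partition the work, compute the pieces in parallel, and combine the partial results. The sketch below illustrates it with Python's standard library, using local processes as a stand-in for cluster nodes; real HPC applications typically distribute the chunks across nodes with MPI.

```python
# Scatter-compute-gather: split a big task into chunks, process the
# chunks in parallel, then combine partial results. Real HPC codes do
# this across many nodes (e.g., with MPI); here, local worker
# processes on one machine stand in for the nodes.
from concurrent.futures import ProcessPoolExecutor

def partial_sum_of_squares(bounds):
    lo, hi = bounds
    return sum(i * i for i in range(lo, hi))

def parallel_sum_of_squares(n, workers=4):
    # Scatter: partition the range [0, n) into roughly equal chunks.
    step = (n + workers - 1) // workers
    chunks = [(i, min(i + step, n)) for i in range(0, n, step)]
    # Compute: each worker handles one chunk independently.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(partial_sum_of_squares, chunks)
    # Gather: reduce the partial results to the final answer.
    return sum(partials)

if __name__ == "__main__":
    print(parallel_sum_of_squares(1_000_000))
```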
A high-performance computing cluster is a collection of tightly interconnected computers that work in parallel as a single system to perform large-scale computational tasks. HPC clusters are designed to provide high performance and scalability, enabling scientists, engineers, and researchers to solve complex problems that would be infeasible with a single computer.
An HPC cluster typically consists of many individual computing nodes, each equipped with one or more processors, accelerators, memory, and storage. These nodes are connected by a high-performance network, allowing them to share information and collaborate on tasks. In addition, the cluster typically includes specialized software and tools for managing resources, such as scheduling jobs, distributing data, and monitoring performance. Application speedups are accomplished by partitioning data and distributing tasks to perform the work in parallel.
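The job-scheduling role mentioned above can be sketched with a toy first-fit scheduler, loosely in the spirit of cluster resource managers such as Slurm. The node names, policy, and data structures here are illustrative only, not the actual algorithm of any real scheduler.

```python
# A toy first-fit job scheduler: each job asks for some cores, and the
# scheduler places it on the first node with enough free capacity.
# Real resource managers (e.g., Slurm) use far richer policies; this
# sketch only shows the basic idea of matching jobs to resources.

def schedule(jobs, nodes):
    """Assign each job to the first node with enough free cores.

    jobs:  list of (job_name, cores_needed)
    nodes: dict of node_name -> free cores (mutated as jobs land)
    Returns {job_name: node_name, or None if nothing fits right now}.
    """
    placement = {}
    for name, cores in jobs:
        placement[name] = None
        for node, free in nodes.items():
            if free >= cores:
                nodes[node] = free - cores   # reserve the cores
                placement[name] = node
                break
    return placement

nodes = {"node01": 8, "node02": 8}
jobs = [("sim-a", 6), ("sim-b", 4), ("sim-c", 8)]
print(schedule(jobs, nodes))
# sim-a lands on node01, sim-b on node02; sim-c must wait in the queue.
```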
Training data speedup using traditional HPC.
Source: Adapted from graph data presented in "Convergence of Artificial Intelligence and High-Performance Computing on NSF-Supported Cyberinfrastructure," Journal of Big Data (springeropen.com).
Climate models are used to simulate the behavior of the Earth's climate, including the atmosphere, oceans, and land surfaces. These simulations can be computationally intensive and require large amounts of data and parallel computing, making them ideal for GPU-accelerated HPC systems. By using GPUs and other parallel processing techniques, climate scientists can run more detailed and accurate simulations, which in turn lead to a better understanding of the Earth's climate and the impacts of human activities. As this use case continues to progress, the predictive capabilities will grow and can be used to design effective mitigation and adaptation strategies.
The discovery and development of new drugs is a complex process that involves the simulation of millions of chemical compounds to identify those that have the potential to treat diseases. Traditional methods of drug discovery have been limited by insufficient computational power, but HPC and GPU technology allow scientists to run more detailed simulations and deploy more effective AI algorithms, resulting in the discovery of new drugs at a faster pace.
Protein folding refers to the process by which proteins fold into three-dimensional structures, which are critical to their function. Understanding protein folding is critical to the development of treatments for diseases such as Alzheimer's and cancer. HPC and GPU technology are enabling scientists to run protein-folding simulations more efficiently, leading to a better understanding of the process and accelerating the development of new treatments.
Computational fluid dynamics (CFD) simulations are used to model the behavior of fluids in real-world systems, such as the flow of air around an aircraft. HPC and GPU technology let engineers run more detailed and accurate CFD simulations, which help improve the designs for systems such as wind turbines, jet engines, and transportation vehicles of all types.
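At the heart of many CFD codes is a stencil update: each grid point is repeatedly recomputed from its neighbors. As a highly simplified illustration, here is an explicit finite-difference solver for the 1D heat equation (a toy model, not a production CFD method); the same neighbor-only communication pattern is what makes such codes straightforward to partition across HPC nodes.

```python
# Explicit finite-difference solver for the 1D heat equation
# u_t = alpha * u_xx -- the stencil-update pattern at the heart of
# many CFD and physics codes. Each point depends only on its
# neighbors, which is what makes the grid easy to split across nodes.

def diffuse(u, alpha, dx, dt, steps):
    """Advance temperatures u by `steps` explicit Euler time steps."""
    r = alpha * dt / dx**2          # must satisfy r <= 0.5 for stability
    for _ in range(steps):
        # Interior points use the classic 3-point stencil;
        # the two boundary temperatures are held fixed.
        u = [u[0]] + [
            u[i] + r * (u[i - 1] - 2 * u[i] + u[i + 1])
            for i in range(1, len(u) - 1)
        ] + [u[-1]]
    return u

# A hot spike in the middle of a cold rod gradually spreads out.
u0 = [0.0] * 5 + [100.0] + [0.0] * 5
print(diffuse(u0, alpha=1.0, dx=1.0, dt=0.25, steps=50))
```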
HPC and ML/AI are having a significant impact on climate modeling, which is used to simulate the behavior of the Earth's climate.
Some of the most widely used high-performance computing applications in science and engineering include:
There are many computer codes used for molecular dynamics (MD) simulations, but some of the most frequently used are:
There are several computer codes used for CFD simulations, but some of the most widely used are:
There are many computer codes used for climate modeling, but some of the most widely used are:
There are several computer codes used for computational chemistry, but some of the most widely used are:
There are many computer codes used for machine learning, but some of the most widely used are:
These codes provide a wide range of ML algorithms, including supervised and unsupervised learning, deep learning, and reinforcement learning. They’re widely used for tasks such as image and speech recognition, natural language processing, and predictive analytics, and they’re essential tools for solving complex problems in areas such as computer vision, robotics, and finance.
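Supervised learning, the most common of these paradigms, can be shown in miniature without any framework at all. The sketch below fits a line by gradient descent on mean squared error; the same loop of predict, measure loss, and follow the gradient underlies training in the deep learning libraries mentioned above (the learning rate and epoch count here are arbitrary illustrative choices).

```python
# Supervised learning in miniature: fit y = w*x + b by gradient
# descent on mean squared error. The predict / measure-loss /
# follow-the-gradient loop is the same one deep learning frameworks
# run, just scaled up to millions of parameters on GPUs.

def fit(xs, ys, lr=0.01, epochs=2000):
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradients of MSE = (1/n) * sum((w*x + b - y)^2)
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Noise-free data generated from y = 3x + 1; the fit should recover it.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [3 * x + 1 for x in xs]
w, b = fit(xs, ys)
print(round(w, 2), round(b, 2))  # close to 3.0 and 1.0
```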
Here are some ways you can get started in high-performance computing: