Ookami: Deployment and Initial Experiences

Authors:

Andrew Burford,

Barbara Chapman,

Catherine Feldman,

Robert Harrison,

Benjamin Michalowicz,

Nikolay Simakov,

Dossay Oryspayev

PEARC '21: Practice and Experience in Advanced Research Computing 2021: Evolution Across All Dimensions

Article No.: 9, Pages 1 - 8

https://doi.org/10.1145/3437359.3465578

Published: 17 July 2021

Abstract

Ookami [3] is a computer technology testbed supported by the United States National Science Foundation. It provides researchers with access to the A64FX processor developed by Fujitsu [17] in collaboration with RIKEN [35, 37] for the Japanese path to exascale computing, as deployed in Fugaku [36], the fastest computer in the world [34]. By focusing on crucial architectural details, the ARM-based, multi-core, 512-bit SIMD-vector processor with ultrahigh-bandwidth memory promises to retain familiar and successful programming models while achieving very high performance for a wide range of applications. We review relevant technology and system details. The main body of the paper focuses on initial experiences with the hardware and software ecosystem for micro-benchmarks, mini-apps, and full applications, and starts to answer questions about where such technologies fit into the NSF ecosystem.

References

[1]

Y Ajima. 2019. The Tofu Interconnect D for the Supercomputer Fugaku. https://www.fujitsu.com/global/Images/the-tofu-interconnect-d-for-supercomputer-fugaku.pdf

[2]

M. Bauer, S. Treichler, E. Slaughter, and A. Aiken. 2012. Legion: Expressing locality and independence with logical regions. In SC ’12: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis. 1–11. https://doi.org/10.1109/SC.2012.71

[3]

IACS Stony Brook. 2020. Ookami. https://www.stonybrook.edu/commcms/ookami/

[4]

Barbara Chapman, Tony Curtis, Swaroop Pophale, Stephen Poole, Jeff Kuehn, Chuck Koelbel, and Lauren Smith. 2010. Introducing OpenSHMEM: SHMEM for the PGAS community. In Proceedings of the Fourth Conference on Partitioned Global Address Space Programming Model. 1–3.

[5]

J Dongarra, P Luszczek, and Y Tsai. 2021. HPL-AI benchmark. https://icl.bitbucket.io/hpl-ai/

[6]

J. Dongarra, S. Otto, M. Snir, and D. Walker. 1995. An Introduction to the MPI Standard.

[7]

A.C. Calder et al. 2002. On Validating an Astrophysical Simulation Code. The Astrophysical Journal Supplement Series 143 (2002), 201–229.

[8]

A. Dubey et al. 2013. The Software development process of FLASH, a multiphysics simulation code. In 2013 5th International Workshop on Software Engineering for Computational Science and Engineering (SE-CSE). 1–8. https://doi.org/10.1109/SECSE.2013.6615093

[9]

J. Palmer et al. 2015. Open XDMoD: A tool for the comprehensive management of high-performance computing resources. Computing in Science and Engineering 17, 4 (2015), 52–62. https://doi.org/10.1109/MCSE.2015.68

[10]

M. Kromer et al. 2015. Deflagrations in hybrid CONe white dwarfs: a route to explain the faint Type Iax supernova 2008ha. Monthly Notices of the Royal Astronomical Society 450, 3 (May 2015), 3045–3053. https://doi.org/10.1093/mnras/stv886

[11]

P. Luszczek et al. 2006. The HPC Challenge (HPCC) Benchmark Suite. In Proceedings of the 2006 ACM/IEEE Conference on Supercomputing (Tampa, Florida) (SC ’06). Association for Computing Machinery, New York, NY, USA, 213–es. https://doi.org/10.1145/1188455.1188677

[12]

R.J. Foley et al. 2013. TYPE Iax SUPERNOVAE: A NEW CLASS OF STELLAR EXPLOSION. The Astrophysical Journal 767, 1 (Mar 2013), 57. https://doi.org/10.1088/0004-637x/767/1/57

[13]

Charles R. Ferenbaugh. [n.d.]. PENNANT: An Unstructured Mesh Mini-App for Advanced Architecture Research. https://www.osti.gov/biblio/1079561-pennant-unstructured-mesh-mini-app-advanced-architecture-research

[14]

MPI Standardization Forum. 1994-2021. MPI Standardization Forum Website. https://www.mpi-forum.org/

[15]

B. Fryxell, K. Olson, P. Ricker, F. X. Timmes, M. Zingale, D. Q. Lamb, P. MacNeice, R. Rosner, J. W. Truran, and H. Tufo. 2000. FLASH: An Adaptive Mesh Hydrodynamics Code for Modeling Astrophysical Thermonuclear Flashes. The Astrophysical Journal Supplement Series 131 (2000), 273–334.

[16]

Fujitsu. 2019. Fujitsu Green-500 award. https://bit.ly/382Ls9Y

[17]

Fujitsu. 2021. A64FX. https://www.fujitsu.com/global/products/computing/servers/supercomputer/a64fx/

[18]

Inria. 2021. KNEM Website. https://knem.gitlabpages.inria.fr/

[19]

C. Kutzner, S. Páll, M. Fechner, A. Esztermann, B.L. de Groot, and H. Grubmüller. 2019. More bang for your buck: Improved use of GPU nodes for GROMACS 2018. Journal of Computational Chemistry 40, 27 (2019), 2418–2431. https://doi.org/10.1002/jcc.26011

[20]

Sandia National Laboratories. 2018. Astra, ARM-based supercomputer. https://www.scientific-computing.com/news/sandia-labs-supercomputer-fastest-arm-based-top500-system

[21]

Oak Ridge National Laboratory. 2020-2021. ORNL Wombat Cluster. https://www.olcf.ornl.gov/olcf-resources/compute-systems/wombat/

[22]

Lindahl, Abraham, Hess, and van der Spoel. 2021. GROMACS 2021 Source code. https://doi.org/10.5281/zenodo.4457626

[23]

P. MacNeice, C. Olson, K. M. Mobarry, R. de Fainchtein, and C. Packer. 1999. PARAMESH: A Parallel Adaptive Mesh Refinement Community Toolkit. NASA Tech. Rep. CR-1999-209483 (1999).

[24]

S. McIntosh-Smith, J. Price, T. Deakin, and A. Poenaru. 2019. A performance analysis of the first generation of HPC-optimized Arm processors. Concurrency and Computation: Practice and Experience 31, 16 (2019), e5110.

[25]

J. Meng, A. Atle, H. Calandra, and M. Araya-Polo. 2020. Minimod: A Finite Difference solver for Seismic Modeling. arXiv:2007.06048 [cs.DC] https://arxiv.org/abs/2007.06048

[26]

MVAPICH2. 2001-2021. MVAPICH2 Website. https://mvapich.cse.ohio-state.edu/

[27]

Open-MPI. 2004-2021. Open-MPI Website. https://www.open-mpi.org/

[28]

OpenSHMEM. 2011-2021. OpenSHMEM Website. http://www.openshmem.org/

[29]

OpenUCX. [n.d.]. OpenUCX Website. https://www.openucx.org/

[30]

A. Petitet, R. C. Whaley, J. Dongarra, and A. Cleary. [n.d.]. HPL - A Portable Implementation of the High-Performance Linpack Benchmark for Distributed-Memory Computers. http://netlib.org/benchmark/hpl.

[31]

E. Raut, J. Anderson, M. Araya-Polo, and J. Meng. 2021. Porting and Evaluation of a Distributed Task-driven Stencil-based Application. In Proceedings of the Twelfth International Workshop on Programming Models and Applications for Multicores and Manycores (Virtual Event, Republic of Korea) (PMAM ’21). Association for Computing Machinery, New York, NY, USA, 10 pages. https://doi.org/10.1145/3448290.3448559

[32]

E. Raut, J. Meng, M. Araya-Polo, and B. Chapman. 2020. Evaluating Performance of OpenMP Tasks in a Seismic Stencil Application. In OpenMP: Portable Multi-Level Parallelism on Modern Systems, K. Milfeld, B.R. de Supinski, L. Koesterke, and J. Klinkenberg (Eds.). Springer International Publishing, Cham, 67–81.

[33]

RIKEN. 2020. Fugaku Graph-500. https://www.r-ccs.riken.jp/en/award/20201117_graph500

[34]

RIKEN. 2020. Fugaku performance award. https://www.riken.jp/en/news_pubs/news/2020/20200623_1/

[35]

RIKEN. 2021. Center for Computational Science. https://www.r-ccs.riken.jp/en/

[36]

RIKEN. 2021. Fugaku supercomputer. https://www.r-ccs.riken.jp/en/fugaku

[37]

RIKEN. 2021. RIKEN. https://www.riken.jp/en/

[38]

M. Sato. 2019. Overview of the Post-K processor. http://www.jicfus.jp/jp/wp-content/uploads/2018/11/msato-190109.pdf

[39]

Silicon Graphics Inc, Aconex, and Red Hat. 2000. Performance Co-Pilot (PCP). https://pcp.io.

[40]

Nikolay A. Simakov, Joseph P. White, Robert L. DeLeon, Steven M. Gallo, Matthew D. Jones, Jeffrey T. Palmer, Benjamin Plessinger, and Thomas R. Furlani. 2018. A Workload Analysis of NSF’s Innovative HPC Resources Using XDMoD. arXiv:1801.04306 [cs.DC]

[41]

SPEC. [n.d.]. SWIM benchmark page. https://www.spec.org/cpu2000/CFP2000/171.swim/docs/171.swim.html

[42]

HPCG team. 2021. HPCG benchmark. https://www.hpcg-benchmark.org/

[43]

Top500.org. 2020. Fugaku Top-500 award. https://bit.ly/2RWivXo

[44]

Ohio State University. 2001-2021. Ohio State University Micro-Benchmarks. https://mvapich.cse.ohio-state.edu/benchmarks/

[45]

XPMEM. [n.d.]. XPMEM Website. https://github.com/hjelmn/xpmem

Published In

PEARC '21: Practice and Experience in Advanced Research Computing 2021: Evolution Across All Dimensions

July 2021

310 pages

ISBN: 9781450382922

DOI: 10.1145/3437359

Editors:

Joseph Paris (Northwestern University),

Jackie Milhans (Northwestern University),

Betsy Hillery (Purdue University),

Sharon Broude Geva (University of Michigan),

Patrick Schmitz (Semper Cogito),

Robert Sinkovits (San Diego Supercomputer Center)

Copyright © 2021 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

PEARC '21: Practice and Experience in Advanced Research Computing

July 18 - 22, 2021

Boston, MA, USA

Acceptance Rates

Overall Acceptance Rate 133 of 202 submissions, 66%

Cited By

Ristow Hadlich R, Verma G, Curtis T, Siegmann E, and Assanis D. 2024. A64FX Enables Engine Decarbonization Using Deep Learning. Practice and Experience in Advanced Research Computing 2024: Human Powered Computing, 1-5. Online publication date: 17-Jul-2024.
https://dl.acm.org/doi/10.1145/3626203.3670619

Calder A, Siegmann E, Feldman C, Chheda S, Smolarski D, Swesty F, Curtis A, Dey J, Carlson D, Michalowicz B, and Harrison R. 2024. Ookami: An A64FX Computing Resource. Journal of Physics: Conference Series 2742, 1, 012019. Online publication date: 1-Apr-2024.
https://doi.org/10.1088/1742-6596/2742/1/012019

Böther M, Benson L, Klimovic A, and Rabl T. 2023. Analyzing Vectorized Hash Tables across CPU Architectures. Proceedings of the VLDB Endowment 16, 11, 2755-2768. Online publication date: 1-Jul-2023.
https://dl.acm.org/doi/10.14778/3611479.3611485

Feldman C, Chheda S, Calder A, Siegmann E, Dey J, Curtis T, and Harrison R. 2023. A Further Study of Linux Kernel Hugepages on A64FX with FLASH, an Astrophysical Simulation Code. Practice and Experience in Advanced Research Computing 2023: Computing for the Common Good, 186-195. Online publication date: 23-Jul-2023.
https://dl.acm.org/doi/10.1145/3569951.3597583

Song C, Merwade V, Wang S, Witt M, Kumar V, Irwin E, Zhao L, and Walton A. 2023. Cyberinfrastructure for sustainability sciences. Environmental Research Letters 18, 7, 075002. Online publication date: 7-Jul-2023.
https://doi.org/10.1088/1748-9326/acd9dd

Mishra A, Malik A, Lin M, and Chapman B. 2023. OpenMP Advisor: A Compiler Tool for Heterogeneous Architectures. OpenMP: Advanced Task-Based, Device and Compiler Programming, 34-48. Online publication date: 13-Sep-2023.
https://dl.acm.org/doi/10.1007/978-3-031-40744-4_3

Кузьминский М. 2022. Modern server ARM processors for supercomputers: A64FX and others. Initial data of benchmarks. Program Systems: Theory and Applications (Программные системы: теория и приложения) 13, 1, 63-129. Online publication date: 2022.
https://doi.org/10.25209/2079-3316-2022-13-1-63-129

Kuzminsky M. 2022. Modern server ARM processors for supercomputers: A64FX and others. Initial data of benchmarks. Program Systems: Theory and Applications (Программные системы: теория и приложения) 13, 1, 131-194. Online publication date: 2022.
https://doi.org/10.25209/2079-3316-2022-13-1-131-194

Feldman C, Michalowicz B, Siegmann E, Curtis T, Calder A, and Harrison R. 2022. Experiences with Porting the FLASH Code to Ookami, an HPE Apollo 80 A64FX Platform. International Conference on High Performance Computing in Asia-Pacific Region Workshops, 72-77. Online publication date: 11-Jan-2022.
https://dl.acm.org/doi/10.1145/3503470.3503478

Ahmad Z, Javanmard M, Croisdale G, Gregory A, Ganapathi P, Pouchet L, and Chowdhury R. 2022. FOURST: A code generator for FFT-based fast stencil computations. 2022 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), 99-108. Online publication date: May-2022.
https://doi.org/10.1109/ISPASS55109.2022.00010
