
Ookami: Deployment and Initial Experiences

Published: 17 July 2021

Abstract

Ookami [3] is a computer technology testbed supported by the United States National Science Foundation. It provides researchers with access to the A64FX processor developed by Fujitsu [17] in collaboration with RIKEN [35, 37] for the Japanese path to exascale computing, as deployed in Fugaku [36], the fastest computer in the world at the time of writing [34]. By focusing on crucial architectural details, the ARM-based, multi-core, 512-bit SIMD-vector processor with ultrahigh-bandwidth memory promises to retain familiar and successful programming models while achieving very high performance for a wide range of applications. We review relevant technology and system details. The main body of the paper focuses on initial experiences with the hardware and software ecosystem for micro-benchmarks, mini-apps, and full applications, and begins to answer questions about where such technologies fit into the NSF ecosystem.

References

[1]
Y. Ajima. 2019. The Tofu Interconnect D for the Supercomputer Fugaku. https://www.fujitsu.com/global/Images/the-tofu-interconnect-d-for-supercomputer-fugaku.pdf
[2]
M. Bauer, S. Treichler, E. Slaughter, and A. Aiken. 2012. Legion: Expressing locality and independence with logical regions. In SC ’12: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis. 1–11. https://doi.org/10.1109/SC.2012.71
[3]
IACS Stony Brook. 2020. Ookami. https://www.stonybrook.edu/commcms/ookami/
[4]
Barbara Chapman, Tony Curtis, Swaroop Pophale, Stephen Poole, Jeff Kuehn, Chuck Koelbel, and Lauren Smith. 2010. Introducing OpenSHMEM: SHMEM for the PGAS community. In Proceedings of the Fourth Conference on Partitioned Global Address Space Programming Model. 1–3.
[5]
J. Dongarra, P. Luszczek, and Y. Tsai. 2021. HPL-AI benchmark. https://icl.bitbucket.io/hpl-ai/
[6]
J. Dongarra, S. Otto, M. Snir, and D. Walker. 1995. An Introduction to the MPI Standard.
[7]
A.C. Calder et al. 2002. On Validating an Astrophysical Simulation Code. The Astrophysical Journal Supplement Series 143 (2002), 201–229.
[8]
A. Dubey et al. 2013. The Software development process of FLASH, a multiphysics simulation code. In 2013 5th International Workshop on Software Engineering for Computational Science and Engineering (SE-CSE). 1–8. https://doi.org/10.1109/SECSE.2013.6615093
[9]
J. Palmer et al. 2015. Open XDMoD: A tool for the comprehensive management of high-performance computing resources. Computing in Science and Engineering 17, 4 (2015), 52–62. https://doi.org/10.1109/MCSE.2015.68
[10]
M. Kromer et al. 2015. Deflagrations in hybrid CONe white dwarfs: a route to explain the faint Type Iax supernova 2008ha. Monthly Notices of the Royal Astronomical Society 450, 3 (05 2015), 3045–3053. https://doi.org/10.1093/mnras/stv886
[11]
P. Luszczek et al. 2006. The HPC Challenge (HPCC) Benchmark Suite. In Proceedings of the 2006 ACM/IEEE Conference on Supercomputing (Tampa, Florida) (SC ’06). Association for Computing Machinery, New York, NY, USA, 213–es. https://doi.org/10.1145/1188455.1188677
[12]
R.J. Foley et al. 2013. TYPE Iax SUPERNOVAE: A NEW CLASS OF STELLAR EXPLOSION. The Astrophysical Journal 767, 1 (mar 2013), 57. https://doi.org/10.1088/0004-637x/767/1/57
[13]
Charles R. Ferenbaugh. [n.d.]. PENNANT: An Unstructured Mesh Mini-App for Advanced Architecture Research. https://www.osti.gov/biblio/1079561-pennant-unstructured-mesh-mini-app-advanced-architecture-research
[14]
MPI Standardization Forum. 1994-2021. MPI Standardization Forum Website. https://www.mpi-forum.org/
[15]
B. Fryxell, K. Olson, P. Ricker, F. X. Timmes, M. Zingale, D. Q. Lamb, P. MacNeice, R. Rosner, J. W. Truran, and H. Tufo. 2000. FLASH: An Adaptive Mesh Hydrodynamics Code for Modeling Astrophysical Thermonuclear Flashes. The Astrophysical Journal Supplement Series 131 (2000), 273–334.
[16]
Fujitsu. 2019. Fujitsu Green-500 award. https://bit.ly/382Ls9Y
[17]
Fujitsu. 2021. A64FX. https://www.fujitsu.com/global/products/computing/servers/supercomputer/a64fx/
[18]
Inria. 2021. KNEM Website. https://knem.gitlabpages.inria.fr/
[19]
C. Kutzner, S. Páll, M. Fechner, A. Esztermann, B.L. de Groot, and H. Grubmüller. 2019. More bang for your buck: Improved use of GPU nodes for GROMACS 2018. Journal of Computational Chemistry 40, 27 (2019), 2418–2431. https://doi.org/10.1002/jcc.26011
[20]
Sandia National Laboratories. 2018. Astra, ARM-based supercomputer. https://www.scientific-computing.com/news/sandia-labs-supercomputer-fastest-arm-based-top500-system
[21]
Oak Ridge National Laboratory. 2020-2021. ORNL Wombat Cluster. https://www.olcf.ornl.gov/olcf-resources/compute-systems/wombat/
[22]
Lindahl, Abraham, Hess, and van der Spoel. 2021. GROMACS 2021 Source code. https://doi.org/10.5281/zenodo.4457626
[23]
P. MacNeice, C. Olson, K. M. Mobarry, R. de Fainchtein, and C. Packer. 1999. PARAMESH: A Parallel Adaptive Mesh Refinement Community Toolkit. NASA Tech. Rep. CR-1999-209483 (1999).
[24]
S. McIntosh-Smith, J. Price, T. Deakin, and A. Poenaru. 2019. A performance analysis of the first generation of HPC-optimized Arm processors. Concurrency and Computation: Practice and Experience 31, 16 (2019), e5110.
[25]
J. Meng, A. Atle, H. Calandra, and M. Araya-Polo. 2020. Minimod: A Finite Difference solver for Seismic Modeling. arXiv (2020). arXiv:2007.06048 [cs.DC] https://arxiv.org/abs/2007.06048
[26]
MVAPICH2. 2001-2021. MVAPICH2 Website. https://mvapich.cse.ohio-state.edu/
[27]
Open-MPI. 2004-2021. Open-MPI Website. https://www.open-mpi.org/
[28]
OpenSHMEM. 2011-2021. OpenSHMEM Website. http://www.openshmem.org/
[29]
OpenUCX. [n.d.]. OpenUCX Website. https://www.openucx.org/
[30]
A. Petitet, R. C. Whaley, J. Dongarra, and A. Cleary. [n.d.]. HPL - A Portable Implementation of the High-Performance Linpack Benchmark for Distributed-Memory Computers. http://netlib.org/benchmark/hpl.
[31]
E. Raut, J. Anderson, M. Araya-Polo, and J. Meng. 2021. Porting and Evaluation of a Distributed Task-driven Stencil-based Application. In Proceedings of the Twelfth International Workshop on Programming Models and Applications for Multicores and Manycores (Virtual Event, Republic of Korea) (PMAM ’21). Association for Computing Machinery, New York, NY, USA, 10 pages. https://doi.org/10.1145/3448290.3448559
[32]
E. Raut, J. Meng, M. Araya-Polo, and B. Chapman. 2020. Evaluating Performance of OpenMP Tasks in a Seismic Stencil Application. In OpenMP: Portable Multi-Level Parallelism on Modern Systems, K. Milfeld, B.R. de Supinski, L. Koesterke, and J. Klinkenberg (Eds.). Springer International Publishing, Cham, 67–81.
[33]
RIKEN. 2020. Fugaku Graph-500. https://www.r-ccs.riken.jp/en/award/20201117_graph500
[34]
RIKEN. 2020. Fugaku performance award. https://www.riken.jp/en/news_pubs/news/2020/20200623_1/
[35]
RIKEN. 2021. Center for Computational Science. https://www.r-ccs.riken.jp/en/
[36]
RIKEN. 2021. Fugaku supercomputer. https://www.r-ccs.riken.jp/en/fugaku
[37]
RIKEN. 2021. RIKEN. https://www.riken.jp/en/
[38]
M. Sato. 2019. Overview of the Post-K processor. http://www.jicfus.jp/jp/wp-content/uploads/2018/11/msato-190109.pdf
[39]
Silicon Graphics Inc, Aconex, and Red Hat. 2000. Performance Co-Pilot (PCP). https://pcp.io.
[40]
Nikolay A. Simakov, Joseph P. White, Robert L. DeLeon, Steven M. Gallo, Matthew D. Jones, Jeffrey T. Palmer, Benjamin Plessinger, and Thomas R. Furlani. 2018. A Workload Analysis of NSF’s Innovative HPC Resources Using XDMoD. arXiv:1801.04306 [cs.DC]
[41]
SPEC. [n.d.]. SWIM benchmark page. https://www.spec.org/cpu2000/CFP2000/171.swim/docs/171.swim.html
[42]
HPCG team. 2021. HPCG benchmark. https://www.hpcg-benchmark.org/
[43]
Top500.org. 2020. Fugaku Top-500 award. https://bit.ly/2RWivXo
[44]
Ohio State University. 2001-2021. Ohio State University Micro-Benchmarks. https://mvapich.cse.ohio-state.edu/benchmarks/
[45]
XPMEM. [n.d.]. XPMEM Website. https://github.com/hjelmn/xpmem

Published In

PEARC '21: Practice and Experience in Advanced Research Computing 2021: Evolution Across All Dimensions
July 2021
310 pages
ISBN: 9781450382922
DOI: 10.1145/3437359

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. computer systems
  2. exascale
  3. high-performance computing

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Acceptance Rates

Overall Acceptance Rate 133 of 202 submissions, 66%

Cited By

  • (2024) A64FX Enables Engine Decarbonization Using Deep Learning. Practice and Experience in Advanced Research Computing 2024: Human Powered Computing, 1–5. https://doi.org/10.1145/3626203.3670619. Online publication date: 17-Jul-2024
  • (2024) Ookami: An A64FX Computing Resource. Journal of Physics: Conference Series 2742:1, 012019. https://doi.org/10.1088/1742-6596/2742/1/012019. Online publication date: 1-Apr-2024
  • (2023) Analyzing Vectorized Hash Tables across CPU Architectures. Proceedings of the VLDB Endowment 16:11, 2755–2768. https://doi.org/10.14778/3611479.3611485. Online publication date: 1-Jul-2023
  • (2023) A Further Study of Linux Kernel Hugepages on A64FX with FLASH, an Astrophysical Simulation Code. Practice and Experience in Advanced Research Computing 2023: Computing for the Common Good, 186–195. https://doi.org/10.1145/3569951.3597583. Online publication date: 23-Jul-2023
  • (2023) Cyberinfrastructure for sustainability sciences. Environmental Research Letters 18:7, 075002. https://doi.org/10.1088/1748-9326/acd9dd. Online publication date: 7-Jul-2023
  • (2023) OpenMP Advisor: A Compiler Tool for Heterogeneous Architectures. OpenMP: Advanced Task-Based, Device and Compiler Programming, 34–48. https://doi.org/10.1007/978-3-031-40744-4_3. Online publication date: 13-Sep-2023
  • (2022) Modern server ARM processors for supercomputers: A64FX and others. Initial data of benchmarks. Program Systems: Theory and Applications (Программные системы: теория и приложения) 13:1, 63–129. https://doi.org/10.25209/2079-3316-2022-13-1-63-129. Online publication date: 2022
  • (2022) Modern server ARM processors for supercomputers: A64FX and others. Initial data of benchmarks. Program Systems: Theory and Applications (Программные системы: теория и приложения) 13:1, 131–194. https://doi.org/10.25209/2079-3316-2022-13-1-131-194. Online publication date: 2022
  • (2022) Experiences with Porting the FLASH Code to Ookami, an HPE Apollo 80 A64FX Platform. International Conference on High Performance Computing in Asia-Pacific Region Workshops, 72–77. https://doi.org/10.1145/3503470.3503478. Online publication date: 11-Jan-2022
  • (2022) FOURST: A code generator for FFT-based fast stencil computations. 2022 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), 99–108. https://doi.org/10.1109/ISPASS55109.2022.00010. Online publication date: May-2022
