Voyager is an innovative computational resource designed by the San Diego Supercomputer Center, in collaboration with technology partners, to accelerate the development and performance of artificial intelligence and machine learning applications in science and engineering. Voyager is built on the first-generation deep learning processors from Intel’s Habana Labs: Gaudi for training and Goya for inference. It is funded by the National Science Foundation’s Advanced Computing Systems & Services Program as a Category II system and will be operated for 5 years, starting with a 3-year exploratory test-bed phase followed by a 2-year allocated production phase for the national research community. Its AI-focused hardware features several innovative components, including fully programmable tensor processing cores, high-bandwidth memory, and integrated, on-chip RDMA over Converged Ethernet (RoCE) network interfaces. In addition, Habana’s SynapseAI software suite integrates with popular machine learning frameworks such as PyTorch and TensorFlow. Here, we describe the design motivation for Voyager, its system architecture, software and user environment, initial benchmarking results, and the early science use cases and applications currently being ported to and deployed on the system.

Cited By

Gough E. (2024). Evaluation of Kubernetes Schedulers for a Community Cloud Computing Model. Practice and Experience in Advanced Research Computing 2024: Human Powered Computing, 1–7. Online publication date: 17-Jul-2024. https://doi.org/10.1145/3626203.3670520
Pata J., Wulff E., Mokhtar F., Southwick D., Zhang M., Girone M., Duarte J. (2024). Improved particle-flow event reconstruction with scalable neural networks for current and future particle detectors. Communications Physics 7:1. Online publication date: 10-Apr-2024. https://doi.org/10.1038/s42005-024-01599-5

Index Terms

Voyager – An Innovative Computational Resource for Artificial Intelligence & Machine Learning Applications in Science and Engineering
1. Computer systems organization
  1. Architectures
    1. Distributed architectures

Published In

PEARC '23: Practice and Experience in Advanced Research Computing 2023: Computing for the Common Good

July 2023

519 pages

ISBN:9781450399852

DOI:10.1145/3569951

Editors:
Robert Sinkovits, San Diego Supercomputer Center
Alana Romanella, University of Colorado Boulder
Shelley Knuth, University of Colorado Boulder
Ken Hackworth, Pittsburgh Supercomputing Center
Jeff Pummill, University of Arkansas

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 10 September 2023

Qualifiers

Short-paper
Research
Refereed limited

Conference

PEARC '23: Practice and Experience in Advanced Research Computing

July 23 - 27, 2023

Portland, OR, USA

Acceptance Rates

Overall Acceptance Rate 133 of 202 submissions, 66%

Bibliometrics

Article Metrics

Total Citations: 2
Total Downloads: 151
Downloads (Last 12 months): 118
Downloads (Last 6 weeks): 12

Reflects downloads up to 22 Oct 2024
