Voyager – An Innovative Computational Resource for Artificial Intelligence & Machine Learning Applications in Science and Engineering

Published: 10 September 2023

Abstract

Voyager is an innovative computational resource designed by the San Diego Supercomputer Center in collaboration with technology partners to accelerate the development and performance of artificial intelligence and machine learning applications in science and engineering. Built on first-generation deep learning processors from Intel’s Habana Labs (Gaudi for training and Goya for inference), Voyager is funded by the National Science Foundation’s Advanced Computing Systems & Services Program as a Category II system. It will be operated for 5 years: an initial 3-year exploratory test-bed phase, followed by a 2-year allocated production phase for the national research community. Its AI-focused hardware features several innovative components, including fully programmable tensor processing cores, high-bandwidth memory, and integrated, on-chip RDMA over Converged Ethernet network interfaces. In addition, Habana’s SynapseAI software suite integrates with popular machine learning frameworks such as PyTorch and TensorFlow. Here, we describe the design motivation for Voyager, its system architecture, its software and user environment, initial benchmarking results, and the early science use cases and applications currently being ported to and deployed on the system.

Cited By

  • (2024) Evaluation of Kubernetes Schedulers for a Community Cloud Computing Model. Practice and Experience in Advanced Research Computing 2024: Human Powered Computing, pp. 1-7. DOI: 10.1145/3626203.3670520. Online publication date: 17-Jul-2024
  • (2024) Improved particle-flow event reconstruction with scalable neural networks for current and future particle detectors. Communications Physics 7:1. DOI: 10.1038/s42005-024-01599-5. Online publication date: 10-Apr-2024

    Published In

    PEARC '23: Practice and Experience in Advanced Research Computing 2023: Computing for the Common Good
    July 2023
    519 pages
    ISBN:9781450399852
    DOI:10.1145/3569951

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. AI-focused hardware
    2. benchmarking
    3. deep learning
    4. scientific applications
    5. system deployment

    Qualifiers

    • Short-paper
    • Research
    • Refereed limited

    Conference

    PEARC '23

    Acceptance Rates

    Overall Acceptance Rate 133 of 202 submissions, 66%

    Article Metrics

    • Downloads (last 12 months): 118
    • Downloads (last 6 weeks): 12
    Reflects downloads up to 22 Oct 2024
