
FSHMEM: Supporting Partitioned Global Address Space on FPGAs for Large-Scale Hardware Acceleration Infrastructure

Authors: Yashael Faith Arthanto, David Ojika, Joo-Young Kim

Abstract: By providing highly efficient one-sided communication with a globally shared memory space, Partitioned Global Address Space (PGAS) has become one of the most promising parallel computing models in high-performance computing (HPC). Meanwhile, FPGAs are gaining attention as an alternative compute platform for HPC systems, offering custom computing and design flexibility. However, unlike the traditional message passing interface, PGAS has not been explored on FPGAs. This paper proposes FSHMEM, a software/hardware framework that enables the PGAS programming model on FPGAs. We implement the core functions of the GASNet specification on FPGA for native PGAS integration in hardware, while its programming interface is designed to be highly compatible with legacy software. Our experiments show that FSHMEM achieves a peak bandwidth of 3813 MB/s, which is more than 95% of the theoretical maximum, outperforming prior work by 9.5$\times$. It records 0.35 $\mu$s and 0.59 $\mu$s latency for remote write and read operations, respectively. Finally, we conduct a case study on two Intel D5005 FPGA nodes integrating Intel's deep learning accelerator. The two-node system programmed by FSHMEM achieves 1.94$\times$ and 1.98$\times$ speedup for matrix multiplication and convolution operations, respectively, showing its scalability potential for HPC infrastructure.

Submitted 11 July, 2022; originally announced July 2022.

Comments: This paper will be published in the 2022 32nd International Conference on Field Programmable Logic and Applications (FPL)

arXiv:2109.04067

Towards Sustainable Energy-Efficient Data Centers in Africa

Authors: David Ojika, Jayson Strayer, Gaurav Kaul

Abstract: Developing nations are particularly susceptible to the adverse effects of global warming. By 2040, 14 percent of global emissions will come from data centers. This paper presents early findings on the use of AI and digital twins to model and optimize data center operations.

Submitted 9 September, 2021; originally announced September 2021.

Comments: Presented at OCP Future Technologies Symposium, San Jose, California | Nov 8, 2021

arXiv:2003.08732

Addressing the Memory Bottleneck in AI Model Training

Authors: David Ojika, Bhavesh Patel, G. Anthony Reina, Trent Boyer, Chad Martin, Prashant Shah

Abstract: Using medical imaging as a case study, we demonstrate how Intel-optimized TensorFlow on an x86-based server equipped with 2nd Generation Intel Xeon Scalable Processors and large system memory allows for the training of memory-intensive AI/deep-learning models in a scale-up server configuration. We believe our work represents the first training of a deep neural network with a large memory footprint (~1 TB) on a single-node server. We recommend this configuration to scientists and researchers who wish to develop large, state-of-the-art AI models but are currently limited by memory.

Submitted 11 March, 2020; originally announced March 2020.

Comments: Presented at Workshop on MLOps Systems at MLSys 2020 Conference, Austin TX

Showing 1–3 of 3 results for author: Ojika, D