Bringing AI-RAN to a Telco Near You

Inferencing for generative AI and AI agents will drive the need for AI compute infrastructure to be distributed from edge to central clouds. IDC predicts that “Business AI (consumer excluded) will contribute $19.9 trillion to the global economy and account for 3.5% of GDP by 2030.”

5G networks must also evolve to serve this incoming AI traffic. At the same time, telcos have an opportunity to become the local AI compute infrastructure for hosting enterprise AI workloads, independent of network connectivity, while meeting data privacy and sovereignty requirements. This is where accelerated computing infrastructure shines: it can accelerate both radio signal processing and AI workloads. Most importantly, the same compute infrastructure can be used to process AI and radio access network (RAN) services. This combination has been called AI-RAN by the telecoms industry.

NVIDIA is introducing Aerial RAN Computer-1, the world’s first AI-RAN deployment platform, which can serve AI and RAN workloads concurrently on a common accelerated infrastructure.

Following the launch of the AI-RAN Innovation Center by T-Mobile, Aerial RAN Computer-1 turns AI-RAN into reality with a deployable platform that telcos can adopt globally. It can be used in small, medium, or large configurations at cell sites, distributed sites, or centralized sites, effectively turning the network into a multi-purpose infrastructure that serves voice, video, data, and AI traffic.

This is a transformative solution that reimagines wireless networks for AI, with AI. It is also a huge opportunity for telcos to fuel the AI flywheel, leveraging their distributed network infrastructure, low latency, guaranteed quality of service, massive scale, and ability to preserve data privacy, security, and localization – all key requirements for AI inferencing and agentic AI applications.

AI-RAN, AI Aerial, and Aerial RAN Computer-1

AI-RAN is the technology framework for building multipurpose networks that are also AI-native. As telcos embrace AI-RAN and move from traditional single-purpose, ASIC-based RAN computing to multi-purpose accelerated computing networks that serve RAN and AI together, they can participate in the new AI economy and leverage AI to improve the efficiency of their networks.

NVIDIA AI Aerial includes three computer systems to design, simulate, train, and deploy AI-RAN-based 5G and 6G wireless networks. Aerial RAN Computer-1 is the base foundation of NVIDIA AI Aerial and provides a commercial-grade deployment platform for AI-RAN.

Aerial RAN Computer-1 (Figure 1) offers a common scalable hardware foundation for running RAN and AI workloads, including software-defined 5G and private 5G RAN from NVIDIA or other RAN software providers, containerized network functions, AI microservices from NVIDIA or partners, and internal and third-party generative AI applications. Aerial RAN Computer-1 is modular by design, enabling it to scale from D-RAN to C-RAN architectures and from rural to dense urban use cases.

NVIDIA CUDA-X Libraries are central to accelerated computing, providing speed, accuracy, and reliability in addition to improved efficiency. That means more work is done in the same power envelope. Most importantly, domain-specific libraries, including telecom-specific adaptations, are key to making Aerial RAN Computer-1 suited for telecom deployments. 

NVIDIA DOCA offers a suite of tools and libraries that can significantly boost performance for telco workloads, including RDMA, PTP/timing synchronization, and Ethernet-based fronthaul (eCPRI), as well as for the AI workloads that are crucial to modern network infrastructure.

Collectively, the full stack enables scalable hardware, common software, and an open architecture to deliver a high-performance AI-RAN together with ecosystem partners.

Figure 1. NVIDIA Aerial RAN Computer-1, as a part of the NVIDIA AI Aerial platform

Benefits of Aerial RAN Computer-1

With Aerial RAN Computer-1, wireless networks can turn into a massively distributed grid of AI and RAN data centers, unleashing new monetization avenues for telcos while paving the way for 6G with a software upgrade.

Benefits of Aerial RAN Computer-1 for telecom service providers include the following:

  • Monetize with AI and generative AI applications, AI inferencing at the edge, or with GPU-as-a-Service.
  • Increase utilization of infrastructure by 2-3x compared to single-purpose base stations that are typically only 30% utilized today. Use the same infrastructure to host internal generative AI workloads and other containerized network functions such as UPF and RIC.
  • Improve radio network performance through site-specific AI learning, with up to 2x gains possible in spectral efficiency. This translates into direct cost savings per MHz of acquired spectrum, as the sketch after this list illustrates.
  • Deliver high-performance RAN and AI experiences for next-generation applications that weave AI into every interaction. Aerial RAN Computer-1 can deliver up to 170 Gb/s of throughput in RAN-only mode and 25K tokens/sec in AI-only mode, or a combination of both, with superior performance compared to traditional networks.
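
To make the spectral-efficiency math concrete, here is a back-of-the-envelope sketch in Python. The baseline efficiency and carrier bandwidth are illustrative assumptions, not measured figures:

```python
# Illustrative arithmetic: what a 2x spectral-efficiency gain means per MHz.
bandwidth_mhz = 100            # one 100-MHz carrier (assumed)
baseline_bps_per_hz = 5.0      # assumed baseline spectral efficiency
ai_gain = 2.0                  # up to 2x from site-specific AI learning

baseline_gbps = bandwidth_mhz * 1e6 * baseline_bps_per_hz / 1e9
ai_gbps = baseline_gbps * ai_gain
print(f"Baseline: {baseline_gbps:.1f} Gb/s, with AI: {ai_gbps:.1f} Gb/s")

# Equivalently, the same traffic now fits in half the spectrum, which is
# where the direct cost saving per MHz of acquired spectrum comes from.
```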

Building blocks of Aerial RAN Computer-1

The key hardware components and platform capabilities of Aerial RAN Computer-1 include the following:

  • NVIDIA GB200 NVL2
  • NVIDIA Blackwell GPU
  • NVIDIA Grace CPU
  • NVLink-C2C
  • Fifth-generation NVIDIA NVLink
  • Key-value caching
  • MGX reference architecture
  • Real-time mainstream LLM inference

NVIDIA GB200 NVL2 

The NVIDIA GB200 NVL2 platform (Figure 2) used in Aerial RAN Computer-1 revolutionizes data center and edge computing, offering unmatched performance for mainstream large language models (LLMs), vRAN, vector database searches, and data processing. 

Powered by two NVIDIA Blackwell GPUs and two NVIDIA Grace CPUs, the scale-out single-node architecture seamlessly integrates accelerated computing into existing infrastructure. 

This versatility enables a wide range of system designs and networking options, making the GB200 NVL2 platform an ideal choice for data centers, edge, and cell site locations seeking to harness the power of AI as well as wireless 5G connectivity. 

For instance, half of a GB200 server could be allocated to RAN tasks and the other half to AI processing through Multi-Instance GPU (MIG) technology at a single cell site. For aggregated sites, one full GB200 server could be dedicated to RAN and another used exclusively for AI. In a centralized deployment, a cluster of GB200 servers could be shared between RAN and AI workloads.
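
The following minimal sketch expresses those three tenancy patterns as data. The class, site names, and fractions are hypothetical illustrations for reasoning about capacity, not an NVIDIA API:

```python
from dataclasses import dataclass

@dataclass
class Gb200Allocation:
    """Hypothetical description of how one site splits a GB200 server."""
    site: str
    ran_fraction: float   # share of GPU resources pinned to RAN
    ai_fraction: float    # share available for AI workloads

deployments = [
    # Single cell site: one server split via MIG-style partitioning
    Gb200Allocation("cell-site", ran_fraction=0.5, ai_fraction=0.5),
    # Aggregated site: dedicated servers per workload
    Gb200Allocation("aggregated-ran", ran_fraction=1.0, ai_fraction=0.0),
    Gb200Allocation("aggregated-ai", ran_fraction=0.0, ai_fraction=1.0),
    # Centralized deployment: a pooled cluster shared between RAN and AI
    Gb200Allocation("central-pool", ran_fraction=0.6, ai_fraction=0.4),
]

for d in deployments:
    assert d.ran_fraction + d.ai_fraction <= 1.0
    print(f"{d.site}: RAN {d.ran_fraction:.0%}, AI {d.ai_fraction:.0%}")
```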

NVIDIA Blackwell GPU

NVIDIA Blackwell is a revolutionary architecture that delivers improved performance, efficiency, and scale. NVIDIA Blackwell GPUs pack 208B transistors and are manufactured using a custom-built TSMC 4NP process. All NVIDIA Blackwell products feature two reticle-limited dies connected by a 10-TB/s chip-to-chip interconnect, unified into a single GPU.

NVIDIA Grace CPU

The NVIDIA Grace CPU is a breakthrough processor designed for modern data centers running AI, vRAN, cloud, and high-performance computing (HPC) applications. It provides outstanding performance and memory bandwidth with 2x the energy efficiency of today’s leading server processors.

NVLink-C2C

The GB200 NVL2 platform uses NVLink-C2C for a groundbreaking 900-GB/s interconnect between each NVIDIA Grace CPU and NVIDIA Blackwell GPU. Combined with fifth-generation NVLink, this delivers a massive 1.3-TB coherent memory model, fueling accelerated AI and vRAN performance.
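
That 1.3-TB figure lines up with the memory inventory in Table 1 later in this post; a quick sanity check, assuming two Grace CPUs at up to 480 GB each and two Blackwell GPUs at up to 192 GB each:

```python
# Sanity check of the coherent memory pool, using the Table 1 figures.
cpu_lpddr5_gb = 960   # 2x NVIDIA Grace at up to 480 GB each (assumed split)
gpu_hbm_gb = 384      # 2x NVIDIA Blackwell at up to 192 GB each (assumed split)
total_tb = (cpu_lpddr5_gb + gpu_hbm_gb) / 1000
print(f"Coherent memory pool: ~{total_tb:.1f} TB")  # ~1.3 TB
```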

To fully harness the power of exascale computing and trillion-parameter AI models, every GPU in a server cluster must communicate seamlessly and swiftly. 

Fifth-generation NVLink is the high-performance GPU-to-GPU interconnect that delivers this accelerated performance in the GB200 NVL2 platform.

Key-value caching

Key-value (KV) caching improves LLM response speeds by storing conversation context and history. 

GB200 NVL2 optimizes KV caching through fully coherent NVIDIA Grace CPU and NVIDIA Blackwell GPU memory connected by NVLink-C2C, which is 7x faster than PCIe Gen5. This enables LLMs to generate tokens faster than x86-based GPU implementations.
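
To illustrate what a KV cache actually does, here is a framework-free Python toy of a single attention layer’s cache during autoregressive decoding. It is a sketch of the general technique, not the GB200 implementation:

```python
import numpy as np

d_model = 64
rng = np.random.default_rng(0)
w_k = rng.standard_normal((d_model, d_model))  # key projection weights
w_v = rng.standard_normal((d_model, d_model))  # value projection weights

k_cache, v_cache = [], []   # grows by one entry per generated token

def decode_step(token_embedding):
    """Append this step's K/V instead of reprojecting the whole history."""
    k_cache.append(token_embedding @ w_k)
    v_cache.append(token_embedding @ w_v)
    # Attention reads the cached history: O(1) new projections per step
    # rather than O(sequence_length). Coherent CPU-GPU memory lets a large
    # cache spill into Grace memory without a PCIe round trip.
    return np.stack(k_cache), np.stack(v_cache)

for _ in range(5):
    keys, values = decode_step(rng.standard_normal(d_model))
print(f"Cached {len(k_cache)} steps of keys/values, shape {keys.shape}")
```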

MGX reference architecture

MGX GB200 NVL2 is a 2:2 configuration, with the two CPUs connected by C-Links and the two GPUs by NVLink.

The host processor module (HPM) contains the following components:

  • NVIDIA Grace CPUs (2)
  • Connectors for GPU pucks and I/O cards
  • GPU modules populated in 2U AC Server (2)

Each pluggable GPU module contains the GPU, the board-to-board (B2B) connection, and NVLink connectors.

Figure 2. NVIDIA GB200 NVL2 platform layout
  • GPU compute: 40 PFLOPS FP4 | 20 PFLOPS FP8/FP6 (10x GH200)
  • GPU memory: up to 384 GB
  • CPU: 144-core Armv9, 960 GB LPDDR5, 1.4x performance and 30% lower power than 2x SPR
  • CPU-to-GPU: NVLink-C2C, 900 GB/s bidirectional and cache-coherent per GPU
  • GPU-to-GPU: fifth-generation NVLink, 1,800 GB/s bidirectional
  • Scale-out: Spectrum-X Ethernet or InfiniBand, with ConnectX or BlueField
  • OS: single OS with a unified address space covering 2 CPUs and 2 GPUs
  • System power: ~3,500 W for the full system, configurable
  • Schedule: samples Q4 2024; mass production Q1 2025

Table 1. GB200 NVL2 platform features

Real-time mainstream LLM inference

The GB200 NVL2 platform introduces massive coherent memory up to 1.3 TB shared between two NVIDIA Grace CPUs and two NVIDIA Blackwell GPUs. This shared memory is coupled with fifth-generation NVIDIA NVLink and high-speed, chip-to-chip (C2C) connections to deliver 5x faster real-time LLM inference performance for mainstream language models, such as Llama3-70B. 

With an input sequence length of 256, an output sequence length of 8,000, and FP4 precision, the GB200 NVL2 platform can produce up to 25K tokens/sec, or 2.16B tokens/day.
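
The daily figure follows directly from the rate:

```python
# Arithmetic behind the 2.16B tokens/day figure quoted above.
tokens_per_sec = 25_000
tokens_per_day = tokens_per_sec * 24 * 60 * 60   # 86,400 seconds per day
print(f"{tokens_per_day:,} tokens/day (~{tokens_per_day / 1e9:.2f}B)")
```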

Figure 3 shows how GB200 NVL2 performs when supporting AI and RAN workloads.

Figure 3. Compute utilization for RAN and AI in GB200 NVL2

Here’s what platform tenancy looks like for RAN and AI on the GB200 NVL2 platform:

  • Workload at 100% utilization
    • RAN: ~36x 100 MHz 64T64R
    • *Tokens: 25K tokens/sec
    • AI: ~$10/hr. | ~$90K/year
  • Workload at 50:50 split utilization
    • RAN: ~18x 100 MHz 64T64R
    • *Tokens: 12.5K tokens/sec
    • AI: ~$5/hr. | ~$45K/year

*Token AI workload: Llama-3-70B FP4 | Sequence lengths input 256 / output 8K
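
These figures scale linearly with the share each tenant receives. A minimal model of that split (the dollar figures are the example monetization rates from the list above, not a price list):

```python
# Linear tenancy model for the RAN/AI split shown above.
FULL = {"ran_carriers": 36, "tokens_per_sec": 25_000, "ai_usd_per_hr": 10.0}

def split(ran_share: float) -> dict:
    """Scale each mode's full-utilization capacity by the share it gets."""
    ai_share = 1.0 - ran_share
    return {
        "ran_carriers": FULL["ran_carriers"] * ran_share,
        "tokens_per_sec": FULL["tokens_per_sec"] * ai_share,
        "ai_usd_per_year": FULL["ai_usd_per_hr"] * ai_share * 24 * 365,
    }

print(split(0.5))
# {'ran_carriers': 18.0, 'tokens_per_sec': 12500.0, 'ai_usd_per_year': 43800.0}
# That is ~18 carriers, 12.5K tokens/sec, and ~$45K/year, matching the list.
```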

Supporting hardware for Aerial RAN Computer-1

NVIDIA BlueField-3 and NVIDIA networking Spectrum-X are the supporting hardware for Aerial RAN Computer-1.

NVIDIA BlueField-3

NVIDIA BlueField-3 DPUs enable real-time data transmission with precision 5G timing required for fronthaul eCPRI traffic. 

NVIDIA offers a full IEEE 1588v2 Precision Time Protocol (PTP) software solution. NVIDIA PTP software solutions are designed to meet the most demanding PTP profiles. NVIDIA BlueField-3 incorporates an integrated PTP hardware clock (PHC) that enables the device to achieve sub-20 nanosecond accuracy while offering timing-related functions, including time-triggered scheduling and time-based, software-defined networking (SDN) accelerations. 
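
For context on what that hardware clock is synchronizing: IEEE 1588 derives the clock offset and path delay from four packet timestamps. Here is a sketch of the protocol arithmetic only, with made-up values, not the BlueField-3 implementation:

```python
# IEEE 1588 offset/delay math from the four PTP timestamps:
#   t1: Sync sent by master      t2: Sync received by slave
#   t3: Delay_Req sent by slave  t4: Delay_Req received by master
# Units: nanoseconds; values are illustrative.
t1, t2 = 1_000_000, 1_000_650
t3, t4 = 1_010_000, 1_010_550

mean_path_delay = ((t2 - t1) + (t4 - t3)) / 2     # 600 ns
offset_from_master = ((t2 - t1) - (t4 - t3)) / 2  # 50 ns
print(f"path delay: {mean_path_delay:.0f} ns, offset: {offset_from_master:.0f} ns")

# A PTP hardware clock (PHC) captures t1..t4 in hardware at the wire,
# removing software jitter -- how sub-20 ns accuracy becomes achievable.
```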

This technology also enables software applications to transmit fronthaul RAN-compatible data at high bandwidth.

NVIDIA networking Spectrum-X 

The edge and data center networks play a crucial role in driving AI and wireless advancements and performance, serving as the backbone for distributed AI model inference, generative AI, and world-class vRAN performance. 

NVIDIA BlueField-3 DPUs enable efficient scalability across hundreds and thousands of NVIDIA Blackwell GPUs for optimal application performance. 

The NVIDIA Spectrum-X Ethernet platform is designed specifically to improve the performance and efficiency of Ethernet-based AI clouds and includes all the required functionality for 5G timing synchronization. It delivers 1.6x better AI networking performance compared to traditional Ethernet, along with consistent, predictable performance in multi-tenant environments.

When Aerial RAN Computer-1 is deployed in a rack configuration, the Spectrum-X Ethernet switch serves as a dual-purpose fabric. It handles both fronthaul and AI (east-west) traffic on the compute fabric, while also carrying backhaul or midhaul and AI (north-south) traffic on the converged fabric. The remote radio units terminate at the switch in compliance with the eCPRI protocol.

Software stacks on Aerial RAN Computer-1

The key software stacks on Aerial RAN Computer-1 include the following:

  • NVIDIA Aerial CUDA-Accelerated RAN
  • NVIDIA AI Enterprise and NVIDIA NIM
  • NVIDIA Cloud Functions

NVIDIA Aerial CUDA-Accelerated RAN

NVIDIA Aerial CUDA-Accelerated RAN is the primary NVIDIA-built RAN software for 5G and private 5G running on Aerial RAN Computer-1. 

It includes NVIDIA GPU-accelerated interoperable PHY and MAC layer libraries that can be easily modified and seamlessly extended with AI components. These hardened RAN software libraries can also be used by other software providers, telcos, cloud service providers (CSPs), and enterprises for building custom commercial-grade, software-defined 5G and future 6G radio access networks (RANs).

Aerial CUDA-Accelerated RAN is integrated with NVIDIA Aerial AI Radio Frameworks, a package of AI enhancements that enable training and inference in the RAN using the framework tools pyAerial, NVIDIA Aerial Data Lake, and NVIDIA Sionna.

It is also complemented by NVIDIA Aerial Omniverse Digital Twin, a system-level network digital twin development platform that enables physically accurate simulations of wireless systems.

NVIDIA AI Enterprise and NVIDIA NIM

NVIDIA AI Enterprise is the software platform for enterprise generative AI. NVIDIA NIM is a collection of microservices that simplify the deployment of foundation models for generative AI applications. 

Collectively, they provide easy-to-use microservices and blueprints that accelerate data science pipelines and streamline the development and deployment of production-grade co-pilots and other generative AI applications for enterprises. 

Enterprises and telcos can either subscribe to the managed NVIDIA Elastic NIM service or deploy and manage NIM themselves. Aerial RAN Computer-1 can host NVIDIA AI Enterprise and NIM-based AI and generative AI workloads. 
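
As an example of what that hosting looks like from an application’s point of view: NIM microservices expose an OpenAI-compatible API, so a locally deployed endpoint can be called with the standard client. The host name, port, and model identifier below are illustrative placeholders, not a guaranteed deployment layout:

```python
from openai import OpenAI

# Hypothetical NIM endpoint running on Aerial RAN Computer-1.
client = OpenAI(
    base_url="http://aerial-ran-computer-1.local:8000/v1",  # placeholder host
    api_key="not-used-for-local-deployments",
)

response = client.chat.completions.create(
    model="meta/llama3-70b-instruct",  # example NIM model identifier
    messages=[{"role": "user", "content": "Summarize today's cell-site KPIs."}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```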

NVIDIA Cloud Functions

NVIDIA Cloud Functions offers a serverless platform for GPU-accelerated AI workloads, ensuring security, scalability, and reliability. It supports various communication protocols:

  • HTTP polling
  • Streaming
  • gRPC

Cloud Functions is primarily suited to shorter-running, preemptable workloads such as inferencing and fine-tuning. This fits the Aerial RAN Computer-1 platform well, as RAN resource utilization varies over the time of day.

Ephemeral, preemptable AI workloads can fill those underused hours, maintaining high utilization of the Aerial RAN Computer-1 platform.
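
One way to picture this is an hourly loop that hands preemptable AI jobs whatever share the RAN is not using. The load curve and headroom below are invented for illustration:

```python
# Toy model of time-of-day tenancy: AI fills the RAN's idle capacity.
ran_load_by_hour = [0.2] * 6 + [0.7] * 12 + [0.4] * 6   # 24 hourly shares

def ai_capacity(ran_load: float, headroom: float = 0.05) -> float:
    """Share of the platform an ephemeral AI job may claim this hour,
    keeping a little headroom so RAN bursts are never starved."""
    return max(0.0, 1.0 - ran_load - headroom)

reclaimed = sum(ai_capacity(load) for load in ran_load_by_hour)
print(f"AI GPU-hours reclaimed per server per day: {reclaimed:.1f}")
```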

Deployment options and performance 

Aerial RAN Computer-1 has multiple deployment options that include all points in the radio access network: 

  • Radio base station cell site
  • Point of presence locations
  • Mobile switching offices
  • Baseband hotels 

For private 5G, it can be located on the enterprise premises. 

Aerial RAN Computer-1 can support various configurations and locations, including private, public, or hybrid cloud environments while using the same software regardless of location or interface standard. This ability offers unprecedented flexibility compared to traditional single-purpose RAN computers. 

The solution also supports a wide range of network technologies:

  • Open Radio Access Network (Open-RAN) architectures
  • AI-RAN
  • 3GPP standards
  • Other industry-leading specifications

Aerial RAN Computer-1, based on GB200, delivers continued performance improvements in RAN processing, AI processing, and energy efficiency compared to the earlier NVIDIA H100 and NVIDIA H200 GPUs (Figure 4).

The GB200 NVL2 platform provides a single MGX server for existing infrastructure, which is easy to deploy and scale out. You get mainstream LLM inference and data processing with high-end RAN compute.

Figure 4. GB200 NVL2 performance compared to previous generations: data processing 18x, vector database search 9x, Llama-3 inference 5x, RAN processing 4x, RAN performance/cost 1.7x, and RAN performance/watt 2.4x

Conclusion

AI-RAN will revolutionize the telecom industry, enabling telcos to unlock new revenue streams and deliver enhanced experiences through generative AI, robotics, and autonomous technologies. The NVIDIA AI Aerial platform implements AI-RAN, aligning it with NVIDIA’s broader vision to make wireless networks AI-native. 

With Aerial RAN Computer-1, telcos can deploy AI-RAN on a common infrastructure today. You can maximize the utilization by running RAN and AI workloads concurrently and improve RAN performance with AI algorithms. 

Most importantly, with this common computer, you can tap into a completely new opportunity to become the AI fabric of choice for enterprises that need local computing and data sovereignty for their AI workloads. You can even start with an AI-first approach and add RAN later through a software upgrade, maximizing ROI from day one.

T-Mobile and SoftBank have already announced their plans to commercialize AI-RAN together with leading RAN software providers, using hardware and software components of NVIDIA AI Aerial. 

At Mobile World Congress Americas, Vapor IO and the City of Las Vegas announced the world’s first private 5G AI-RAN deployment using NVIDIA AI Aerial.

We are at a turning point in transforming wireless networks for AI, with AI. Join us at the NVIDIA AI Summit in Washington, D.C. and at the NVIDIA 6G Developer Day to learn more about NVIDIA Aerial AI and NVIDIA Aerial RAN Computer-1.
