Nvidia Rubin: Double the Network Bandwidth

By Priyanka Patel, Tech Editor

Nvidia’s Vera Rubin Architecture Promises 10x Performance Gains with Radical Co-Design

Nvidia unveiled its groundbreaking Vera Rubin architecture this week at the Consumer Electronics Show in Las Vegas, signaling a major shift in the landscape of artificial intelligence computing. The new platform, slated for release later this year, aims to dramatically reduce the cost and complexity of AI workloads, promising up to a ten-fold reduction in inference costs and a four-fold decrease in the number of GPUs needed for certain training tasks compared with Nvidia’s existing Blackwell architecture.
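As a rough illustration of what those factors imply, here is a minimal back-of-the-envelope sketch in Python. The baseline cluster size and per-token cost are hypothetical placeholders, not figures from Nvidia; only the ten-fold and four-fold factors come from the announcement.

```python
# Back-of-the-envelope sketch of the claimed Rubin-vs-Blackwell factors.
# Baseline numbers are hypothetical placeholders for illustration; only the
# 10x inference-cost and 4x GPU-count factors come from the announcement.

INFERENCE_COST_FACTOR = 10  # up to 10x lower inference cost (claimed)
TRAINING_GPU_FACTOR = 4     # up to 4x fewer GPUs for certain training tasks (claimed)

blackwell_gpus_for_job = 1024        # hypothetical training job size
blackwell_cost_per_m_tokens = 2.00   # hypothetical $ per million tokens served

rubin_gpus_for_job = blackwell_gpus_for_job / TRAINING_GPU_FACTOR
rubin_cost_per_m_tokens = blackwell_cost_per_m_tokens / INFERENCE_COST_FACTOR

print(f"Training GPUs:  {blackwell_gpus_for_job} -> {rubin_gpus_for_job:.0f}")
print(f"Inference cost: ${blackwell_cost_per_m_tokens:.2f} -> "
      f"${rubin_cost_per_m_tokens:.2f} per M tokens")
```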

Beyond the GPU: A Holistic Approach to AI Acceleration

While the new Rubin GPU boasts remarkable specifications – 50 petaFLOPS (50 quadrillion floating-point operations per second) of 4-bit computation for transformer-based inference, versus 10 petaFLOPS on Blackwell – Nvidia emphasizes that the performance gains aren’t solely attributable to the GPU itself. The architecture comprises a total of six interconnected chips: the Vera CPU, the Rubin GPU, and four dedicated networking chips.

“The same unit connected in a different way will deliver a wholly different level of performance,” explained a senior Nvidia official. “That’s why we call it extreme co-design.” This holistic approach highlights a move away from simply maximizing individual component performance towards optimizing the interplay between all elements of the system.

The Rise of “In-Network Compute” and Distributed AI

AI workloads are increasingly running on massive clusters of GPUs, a trend that necessitates a new approach to data management and processing. “Two years back, inferencing was mainly run on a single GPU, a single box, a single server,” a company representative stated. “Right now, inferencing is becoming distributed, and it’s not just in a rack. It’s going to be across racks, and across data centers.”

Nvidia’s solution is “in-network compute,” in which computations are performed while data is in transit rather than after it arrives at a GPU. “Save time. This is what we do.” By moving work into the network itself, Nvidia aims to minimize latency and maximize throughput.

This concept of in-network computing isn’t entirely new, having been in development since around 2016, but the Rubin architecture significantly expands the range of computations that can be offloaded to the network.
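To make the idea concrete, here is a toy Python model of switch-level aggregation, an illustration rather than Nvidia’s actual implementation: instead of one root GPU receiving and summing a message from every peer, each switch along the path sums the packets flowing through it, so the root sees a single pre-reduced value per subtree.

```python
# Toy model of in-network reduction (illustrative only; not Nvidia's design).
# Host-based reduction: the root receives one message per peer and sums them.
# In-network reduction: switches sum messages in transit, so the root
# receives a single pre-reduced value.

def host_based_reduce(gradients):
    """Root receives len(gradients) messages and does all the summing itself."""
    messages_at_root = len(gradients)
    return sum(gradients), messages_at_root

def in_network_reduce(gradients, fanout=4):
    """Switches recursively sum groups of `fanout` messages in transit."""
    level = gradients
    messages_at_root = len(level)
    while len(level) > 1:
        level = [sum(level[i:i + fanout]) for i in range(0, len(level), fanout)]
        messages_at_root = len(level)  # what finally reaches the root
    return level[0], messages_at_root

grads = [1.0] * 64  # 64 GPUs each contribute one gradient value

total, msgs = host_based_reduce(grads)
print(f"host-based: sum={total}, messages arriving at root={msgs}")

total, msgs = in_network_reduce(grads)
print(f"in-network: sum={total}, messages arriving at root={msgs}")
```

The point of the toy model is the message count at the root: with aggregation happening in the switches, the reduction work and traffic are spread across the fabric instead of piling up at a single endpoint.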

Scaling Out and Addressing the Jitter Problem

Beyond the rack, the Rubin architecture incorporates a “scale-out network” to connect multiple racks within a data center. This network relies on three key components: the ConnectX-9 networking interface card, the BlueField-4 data processing unit (paired with two Vera CPUs and a ConnectX-9), and the Spectrum-6 Ethernet switch, which utilizes co-packaged optics for faster data transmission.
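For quick reference, those building blocks can be jotted down as a simple inventory; the Python sketch below merely restates the component list above, with roles paraphrased from the text and nothing further assumed.

```python
from dataclasses import dataclass

@dataclass
class Component:
    name: str
    role: str

# Scale-out components as named in the article; roles paraphrase the text.
SCALE_OUT_NETWORK = [
    Component("ConnectX-9", "networking interface card"),
    Component("BlueField-4",
              "data processing unit (paired with two Vera CPUs and a ConnectX-9)"),
    Component("Spectrum-6", "Ethernet switch with co-packaged optics"),
]

for c in SCALE_OUT_NETWORK:
    print(f"{c.name:12s} {c.role}")
```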

A critical challenge in scaling out to multiple racks is jitter – the variation in arrival times of data packets. “Scale-out infrastructure needs to make sure that those GPUs can communicate well in order to run a distributed computing workload, and that means I need a network that has no jitter in it,” a senior official stated. Jitter can create significant performance bottlenecks, because in a synchronized workload every rack must wait for the slowest one to complete its calculations. “Jitter means losing money,” the official added.
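A toy simulation makes that cost easy to see: when every rack must wait for the slowest one at each synchronous step, throughput is set by the worst-case straggler rather than the average. All numbers below are illustrative, not measurements.

```python
import random

# Toy simulation of why jitter hurts synchronous distributed workloads:
# each step finishes only when the SLOWEST rack finishes, so variance in
# completion times translates directly into lost throughput.
# All numbers are illustrative, not measurements.

random.seed(0)
N_RACKS = 32
N_STEPS = 1000
MEAN_STEP_MS = 10.0

def mean_step_time(jitter_ms):
    total = 0.0
    for _ in range(N_STEPS):
        # Each rack's step time = mean + some non-negative jitter.
        times = [MEAN_STEP_MS + random.uniform(0, jitter_ms)
                 for _ in range(N_RACKS)]
        total += max(times)  # everyone waits for the slowest rack
    return total / N_STEPS

for jitter in (0.0, 1.0, 5.0):
    t = mean_step_time(jitter)
    print(f"jitter up to {jitter:4.1f} ms -> avg step {t:5.2f} ms "
          f"({t / MEAN_STEP_MS:.2f}x baseline)")
```

Even modest per-rack jitter compounds: with 32 racks, the maximum of the per-rack delays sits near the top of the jitter range on almost every step, so average step time creeps well above the per-rack mean.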

The Next Frontier: Connecting Multiple Data Centers

While Nvidia’s current chips don’t specifically address connections between data centers – termed “scale-across” – the company views this as the logical next step. “It doesn’t stop here, because we are seeing the demands to increase the number of GPUs in a data center,” a company representative explained. “100,000 GPUs is not enough anymore for some workloads, and now we need to connect multiple data centers together.”

The Vera Rubin architecture represents a fundamental shift in how Nvidia approaches AI computing, moving beyond improving individual components in isolation to optimizing the entire system for performance and efficiency. This co-design philosophy promises to unlock new possibilities for AI innovation and to accelerate the development of increasingly complex and demanding applications.
