Nvidia reveals details of Grace Hopper’s engineering in Hot Chips

by time news

Nvidia engineers will give four technical presentations at next week’s virtual Hot Chips conference, covering Grace CPUs, Hopper graphics processing units (GPUs), Orin systems-on-chip (SoCs), and NVLink network switches.

All of this represents the company’s plans to build an advanced data center infrastructure with a full suite of chips, hardware and software.

Ahead of the talks, Dave Salvator, director of product marketing for AI inference, benchmarking, and cloud at Nvidia, shared new details about Nvidia’s platform for artificial intelligence (AI), edge computing, and high-performance computing in an interview with VentureBeat.

If there is a clear trend running through the talks, it is how thoroughly accelerated computing has been embraced over the past few years in the design of modern data centers and systems at the network edge, Salvator says. CPUs are no longer expected to do all the heavy lifting themselves.

Hot Chips Event

On Hot Chips itself, Salvator said: “Over the past few years, the show has tended to be somewhat CPU-focused, with the occasional accelerator. But I think the interesting trend line is that we’re seeing more and more accelerators, especially if you look at the advance program already published on the Hot Chips website. It’s certainly from us, but it’s also from others. And I think it’s simply a recognition that these accelerators are absolute game-changers for the data center.”

He added: “It’s a mixture of things, isn’t it? It’s not just that GPUs are better at something. It has truly been a massive, decade-long collaborative effort to get us where we are today.”

Speaking at the virtual Hot Chips event (an annual gathering of processor and system architects usually held on a Silicon Valley college campus), Nvidia’s engineers will reveal performance numbers and other technical details for Nvidia’s first server CPU, the Hopper GPU, the latest version of the NVSwitch interconnect chip, and the Nvidia Jetson Orin system-on-module (SoM).

The presentations offer new insight into how the Nvidia platform reaches new levels of performance, efficiency, scale, and security.

Specifically, the talks lay out a design philosophy of innovating across the full stack of chips, systems, and software, with GPUs, CPUs, and DPUs acting as peer processors, Salvator said. Together they create platforms that already run AI, data analytics, and high-performance computing jobs at cloud service providers, supercomputing centers, enterprise data centers, and in standalone systems.

Inside the Nvidia Server CPU

Nvidia’s NVLink Network Switch.

Data centers require flexible combinations of CPUs, GPUs, and other accelerators that share large pools of memory to deliver the energy-efficient performance that today’s workloads demand.

The Nvidia Grace CPU is the first data center CPU developed by Nvidia, built from the ground up to create the world’s first superchip.

Jonathon Evans, a distinguished engineer and 15-year Nvidia veteran, will describe Nvidia NVLink-C2C. It connects CPUs and GPUs at 900 gigabytes per second, with data transfers consuming just 1.3 picojoules per bit, making it five times more energy efficient than the existing PCIe Gen 5 standard.
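As a rough sanity check on those numbers (my own arithmetic, not an Nvidia spec sheet), the quoted rate and energy-per-bit imply the link itself burns only a handful of watts at full tilt; the sketch below treats 900 GB/s as a raw aggregate bit rate and ignores protocol overhead.

```python
# Back-of-the-envelope power for the NVLink-C2C link from the quoted figures.
# Assumptions: 900 GB/s is an aggregate raw rate; protocol overhead is ignored.
bytes_per_second = 900e9          # 900 GB/s
joules_per_bit = 1.3e-12          # 1.3 picojoules per bit
bits_per_second = bytes_per_second * 8
link_power_watts = bits_per_second * joules_per_bit
print(f"~{link_power_watts:.1f} W at full bandwidth")  # roughly 9.4 W
```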

NVLink-C2C connects two CPU chips to create the Nvidia Grace CPU with 144 Arm Neoverse cores, a processor built to tackle the world’s largest computing problems. Because Nvidia did not want to create custom instructions that could make programming more complex, it uses standard Arm cores.

For maximum efficiency, the Grace CPU uses LPDDR5X memory, which provides 1 terabyte per second of memory bandwidth while keeping the power consumption of the entire complex below 500 watts.
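Another quick ratio from the quoted figures (an illustration rather than a measured spec): 1 TB/s of bandwidth inside a complex drawing under 500 W works out to at least 2 GB/s of memory bandwidth per watt for the whole package, CPU cores included.

```python
# Memory bandwidth per watt implied by the quoted Grace figures (whole complex).
bandwidth_gb_per_s = 1000.0   # 1 TB/s of LPDDR5X bandwidth
max_power_watts = 500.0       # stated ceiling for the entire complex
print(f">= {bandwidth_gb_per_s / max_power_watts:.1f} GB/s per watt")
```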

Nvidia Grace is designed to deliver the performance and energy efficiency needed by modern data center workloads powering digital twins, cloud gaming, graphics, AI, and high-performance computing (HPC). Each Grace CPU features 72 Arm v9.0 cores that implement the Arm Scalable Vector Extension version 2 (SVE2) instruction set. The cores also include the Arm virtualization extensions, with nested virtualization capability and S-EL2 support.

Nvidia Grace is also compliant with the following Arm specifications: RAS v1.1, Generic Interrupt Controller (GIC) v4.1, Memory Partitioning and Monitoring (MPAM), and System Memory Management Unit (SMMU) v3.1.

The Grace CPU can be paired with an Nvidia Hopper GPU to create the Nvidia Grace Hopper Superchip for large-scale AI training, inference, and HPC, or with another Grace CPU to build the Grace CPU Superchip, a high-performance processor aimed at HPC and cloud computing workloads.

NVLink-C2C also links Grace CPU and Hopper GPU chips as memory-sharing peers in the Nvidia Grace Hopper Superchip, combining two separate chips in one module. The arrangement enables maximum acceleration for performance-hungry jobs such as AI training.

Anyone can use NVLink-C2C to create custom chips (or sub-components of a chip) that connect seamlessly to Nvidia GPUs, CPUs, data processing units (DPUs), and SoCs, opening up this new class of integrated products. The interconnect supports the AMBA CHI and CXL protocols used by Arm and x86 processors, respectively.

For expansion at the system level, the new Nvidia NVSwitch connects multiple servers into a single AI supercomputer. It uses NVLink, which operates at 900 gigabytes per second, more than seven times the bandwidth of PCIe Gen 5.
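The “more than seven times” comparison lines up if the baseline is a single x16 PCIe Gen 5 link; the PCIe figures below (roughly 64 GB/s per direction, about 128 GB/s both ways before protocol overhead) are my assumption, since the article does not spell out the baseline.

```python
# Rough comparison of the quoted NVLink rate against an assumed x16 PCIe Gen 5 link.
nvlink_gb_per_s = 900.0
pcie_gen5_x16_gb_per_s = 2 * 64.0   # both directions, before overhead (assumption)
print(f"~{nvlink_gb_per_s / pcie_gen5_x16_gb_per_s:.1f}x PCIe Gen 5")  # about 7x
```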

NVSwitch lets users link 32 Nvidia DGX H100 systems (supercomputers in a box) into an AI supercomputer delivering peak AI performance.

“This will allow multiple server nodes to talk to each other over NVLink with up to 256 GPUs,” Salvator said.

Alexander Ishii and Ryan Wells, veteran Nvidia engineers, will describe how the switch lets users build systems with up to 256 GPUs to tackle demanding workloads such as training AI models with more than 1 trillion parameters. The switch includes engines that speed data transfers using the Nvidia Scalable Hierarchical Aggregation and Reduction Protocol (SHARP), an in-network computing capability that debuted on Nvidia Quantum InfiniBand networks. It can double data throughput for communication-intensive AI applications.
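A toy model (my own illustration, not Nvidia’s implementation) of why aggregating in the switch helps: in a conventional ring all-reduce, each GPU pushes roughly two copies of its buffer through the fabric, while SHARP-style in-network aggregation lets each GPU send its contribution once and receive the reduced result once, cutting injected traffic roughly in half.

```python
# First-order traffic model for an all-reduce across N GPUs (toy illustration).

def ring_allreduce_bytes_sent_per_gpu(msg_bytes: float, n_gpus: int) -> float:
    # A ring all-reduce sends about 2 * (N - 1) / N copies of the buffer per GPU.
    return 2 * msg_bytes * (n_gpus - 1) / n_gpus

def in_network_allreduce_bytes_sent_per_gpu(msg_bytes: float) -> float:
    # With aggregation in the switch, each GPU sends its contribution exactly once.
    return msg_bytes

msg = 1e9      # hypothetical 1 GB gradient buffer
gpus = 256
ring = ring_allreduce_bytes_sent_per_gpu(msg, gpus)
sharp = in_network_allreduce_bytes_sent_per_gpu(msg)
print(f"ring: {ring / 1e9:.2f} GB/GPU, in-network: {sharp / 1e9:.2f} GB/GPU "
      f"(~{ring / sharp:.1f}x less traffic)")
```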

“The goal here is to significantly improve performance across sockets,” Salvator says. In other words, to remove bottlenecks.

Jack Choquette, a senior distinguished engineer who has been with the company for 14 years, will give an in-depth tour of the Nvidia H100 Tensor Core GPU, also known as Hopper. Beyond being carried to new heights by the new interconnects, it is packed with features that boost the accelerator’s performance, efficiency, and security.

Hopper’s new Transformer Engine and upgraded Tensor Cores deliver a 30x speedup over the previous generation on AI inference with the world’s largest neural network models. Hopper also adopts the world’s first HBM3 memory system, providing 3 terabytes per second of memory bandwidth, Nvidia’s biggest generational increase to date.

Among other new features, Hopper adds virtualization support for multi-tenant and multi-user configurations. New DPX instructions accelerate recurring loops in select mapping, DNA, and protein-analysis applications. Hopper also includes support for confidential computing to improve security.
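Nvidia has pointed to dynamic-programming algorithms such as Smith-Waterman sequence alignment as targets for DPX. The sketch below is a plain-Python version of that recurrence, shown only to illustrate the fused max-plus-add inner loop such instructions accelerate; it is not GPU code.

```python
# Smith-Waterman local alignment score: a classic dynamic-programming recurrence
# where every cell takes the max of a few "previous cell + constant" terms.
def smith_waterman_score(a: str, b: str, match=2, mismatch=-1, gap=-2) -> int:
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best

print(smith_waterman_score("GATTACA", "GCATGCU"))
```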

Choquette, one of the lead chip designers for the Nintendo 64 console earlier in his career, will describe the parallel computing techniques underlying some of Hopper’s advances.

Michael Ditty, an architecture lead with 17 years at the company, will provide new performance specs for the Nvidia Jetson AGX Orin, an engine for edge AI, robotics, and advanced autonomous machines.

It integrates 12 Arm Cortex-A78 cores and an Nvidia Ampere architecture GPU to deliver up to 275 trillion operations per second for AI inference jobs, up to 8 times the performance of the previous generation at 2.3 times better energy efficiency.
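One way to read those two ratios together (my own back-of-the-envelope reasoning, not an Nvidia figure): since energy efficiency is performance per watt, an 8x performance gain at 2.3x the efficiency would imply roughly 3.5x the power draw, assuming both figures were quoted at the same operating point, which the article does not state.

```python
# Relationship between the two quoted ratios (illustrative only; the "up to"
# qualifiers mean they may not apply at the same operating point).
perf_ratio = 8.0            # "up to 8 times the performance"
efficiency_ratio = 2.3      # "2.3 times better energy efficiency" (perf per watt)
implied_power_ratio = perf_ratio / efficiency_ratio
print(f"implied power ratio: ~{implied_power_ratio:.1f}x")
```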

The latest production module packs up to 32GB of memory and is part of a compatible family that scales down to the pocket-sized, 5-watt Jetson Nano developer kit.

Software Stack

Nvidia Grace CPU.

All of the new chips support the Nvidia software stack, which accelerates more than 700 applications and is used by 2.5 million developers. Based on the CUDA programming model, it includes dozens of Nvidia software development kits (SDKs) for vertical markets such as automotive (Drive) and healthcare (Clara), as well as technologies such as recommendation systems (Merlin) and conversational AI (Riva).
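For readers unfamiliar with the CUDA programming model the paragraph refers to, here is a minimal sketch using the third-party Numba package as the Python binding (my choice for illustration; the article names no specific SDK). It requires a CUDA-capable GPU plus the numba and numpy packages.

```python
# Minimal CUDA-style kernel in Python via Numba: each GPU thread handles one element.
import numpy as np
from numba import cuda

@cuda.jit
def saxpy(a, x, y, out):
    i = cuda.grid(1)          # absolute index of this thread across the grid
    if i < out.size:          # guard threads that fall past the end of the array
        out[i] = a * x[i] + y[i]

n = 1 << 20
x = np.random.rand(n).astype(np.float32)
y = np.random.rand(n).astype(np.float32)
out = np.zeros_like(x)

threads_per_block = 256
blocks = (n + threads_per_block - 1) // threads_per_block
saxpy[blocks, threads_per_block](np.float32(2.0), x, y, out)  # Numba copies the arrays
print(out[:4])
```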

The Nvidia Grace CPU Superchip is designed to offer software developers a standards-based platform. Arm provides a set of specifications as part of its SystemReady initiative, which aims to bring standardization to the Arm ecosystem.

The Grace CPU targets the Arm system standards to provide compatibility with off-the-shelf operating systems and software applications, and it will take advantage of the Nvidia Arm software stack from day one.

The Nvidia AI platform is available from every major cloud and system maker. Nvidia is working with leading HPC, supercomputing, and cloud customers on the Grace CPU Superchip. The Grace CPU Superchip and the Grace Hopper Superchip are expected to be available in the first half of 2023.

“Because of the way data centers are architected, these fabrics are designed to remove bottlenecks and ensure GPUs and CPUs work together as peer processors,” Salvator said.
