Announcing NVIDIA DGX GH200: The First 100 Terabyte GPU Memory System


2023-05-30 19:58:13

At COMPUTEX 2023, NVIDIA announced NVIDIA DGX GH200, which marks another breakthrough in GPU-accelerated computing to power the most demanding giant AI workloads. In addition to describing key aspects of the NVIDIA DGX GH200 architecture, this post discusses how NVIDIA Base Command enables rapid deployment, accelerates the onboarding of users, and simplifies system management.

The unified memory programming model of GPUs has been the cornerstone of various breakthroughs in complex accelerated computing applications over the past seven years. In 2016, NVIDIA introduced NVLink technology and the Unified Memory programming model with CUDA 6, designed to increase the memory available to GPU-accelerated workloads.

Since then, the core of every DGX system has been a GPU complex on a baseboard interconnected with NVLink, in which each GPU can access the other's memory at NVLink speed. Many such DGX systems with GPU complexes are interconnected with high-speed networking to form larger supercomputers such as the NVIDIA Selene supercomputer. Yet an emerging class of giant, trillion-parameter AI models will either require several months to train or cannot be solved even on today's best supercomputers.

To empower the scientists in need of an advanced platform that can solve these extraordinary challenges, NVIDIA paired the NVIDIA Grace Hopper Superchip with the NVLink Switch System, uniting up to 256 GPUs in an NVIDIA DGX GH200 system. In the DGX GH200 system, 144 terabytes of memory are accessible to the GPU shared memory programming model at high speed over NVLink.

Compared to a single NVIDIA DGX A100 320 GB system, NVIDIA DGX GH200 provides nearly 500x more memory to the GPU shared memory programming model over NVLink, forming a giant data-center-sized GPU. NVIDIA DGX GH200 is the first supercomputer to break the 100-terabyte barrier for memory accessible to GPUs over NVLink.
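The headline figures can be cross-checked with quick arithmetic: 256 superchips, each with 480 GB of LPDDR5 and 96 GB of HBM3, come to 144 TB in aggregate, roughly 460x the 320 GB of a single DGX A100. A minimal sketch of that check, assuming binary units (1 TB = 1024 GB):

```python
# Back-of-the-envelope check of the memory figures quoted in this post.
superchips = 256
lpddr5_gb = 480   # CPU memory per Grace Hopper Superchip
hbm3_gb = 96      # GPU (HBM3) memory per Grace Hopper Superchip

total_gb = superchips * (lpddr5_gb + hbm3_gb)
total_tb = total_gb / 1024          # binary units: 1 TB = 1024 GB
ratio = total_gb / 320              # vs. one DGX A100 320 GB system

print(f"{total_tb:.0f} TB total, {ratio:.0f}x a DGX A100 320 GB")
# → 144 TB total, 461x a DGX A100 320 GB
```

The result lines up with both the 144-terabyte figure and the "nearly 500x" comparison above.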

Figure 1. GPU memory gains as a result of NVLink technology progression

NVIDIA DGX GH200 system architecture

The NVIDIA Grace Hopper Superchip and the NVLink Switch System are the building blocks of the NVIDIA DGX GH200 architecture. The NVIDIA Grace Hopper Superchip combines the Grace and Hopper architectures using NVIDIA NVLink-C2C to deliver a CPU + GPU coherent memory model. The NVLink Switch System, powered by the fourth generation of NVLink technology, extends the NVLink connection across superchips to create a seamless, high-bandwidth, multi-GPU system.

Each NVIDIA Grace Hopper Superchip in NVIDIA DGX GH200 has 480 GB of LPDDR5 CPU memory, at an eighth of the power per GB compared with DDR5, and 96 GB of fast HBM3. The NVIDIA Grace CPU and Hopper GPU are interconnected with NVLink-C2C, providing 7x more bandwidth than PCIe Gen5 at one-fifth the power.

The NVLink Switch System forms a two-level, non-blocking, fat-tree NVLink fabric to fully connect the 256 Grace Hopper Superchips in a DGX GH200 system. Every GPU in DGX GH200 can access the memory of other GPUs and the extended GPU memory of all NVIDIA Grace CPUs at 900 GBps.
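The 7x figure quoted earlier can be sanity-checked against the 900 GBps NVLink-C2C rate. Assuming a PCIe Gen5 x16 link carries roughly 128 GB/s of bidirectional traffic (about 64 GB/s per direction; this baseline is an assumption, not a figure from this post), the ratio works out to about 7x:

```python
# Rough sanity check of the "7x more bandwidth than PCIe Gen5" claim.
nvlink_c2c_gbps = 900        # quoted NVLink-C2C bandwidth
pcie_gen5_x16_gbps = 128     # assumed PCIe Gen5 x16 bidirectional aggregate

print(f"{nvlink_c2c_gbps / pcie_gen5_x16_gbps:.1f}x")  # → 7.0x
```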

Compute baseboards hosting Grace Hopper Superchips are connected to the NVLink Switch System using a custom cable harness for the first layer of the NVLink fabric. LinkX cables extend the connectivity in the second layer of the NVLink fabric.

Figure 2. Topology of a fully connected NVIDIA NVLink Switch System across NVIDIA DGX GH200, consisting of 256 GPUs and 36 NVLink switches

In the DGX GH200 system, GPU threads can address peer HBM3 and LPDDR5X memory from other Grace Hopper Superchips in the NVLink network using an NVLink page table. NVIDIA Magnum IO acceleration libraries optimize GPU communications for efficiency, improving application scaling across all 256 GPUs.

Every Grace Hopper Superchip in DGX GH200 is paired with one NVIDIA ConnectX-7 network adapter and one NVIDIA BlueField-3 NIC. DGX GH200 has 128 TBps of bisection bandwidth and 230.4 TFLOPS of NVIDIA SHARP in-network computing, which accelerates the collective operations commonly used in AI and doubles the effective bandwidth of the NVLink Network System by reducing the communication overhead of collective operations.

For scaling beyond 256 GPUs, ConnectX-7 adapters can interconnect multiple DGX GH200 systems to scale into an even larger solution. The power of BlueField-3 DPUs transforms any enterprise computing environment into a secure and accelerated virtual private cloud, enabling organizations to run application workloads in secure, multi-tenant environments.

Target use cases and performance benefits

The generational leap in GPU memory significantly improves the performance of AI and HPC applications bottlenecked by GPU memory size. Many mainstream AI and HPC workloads can reside entirely in the aggregate GPU memory of a single NVIDIA DGX H100. For such workloads, the DGX H100 is the most performance-efficient training solution.

Other workloads, such as a deep learning recommendation model (DLRM) with terabytes of embedding tables, a terabyte-scale graph neural network training model, or large data analytics workloads, see speedups of 4x to 7x with DGX GH200. This shows that DGX GH200 is the better solution for the more advanced AI and HPC models requiring massive memory for GPU shared memory programming.


The mechanics of the speedup are described in detail in the NVIDIA Grace Hopper Superchip Architecture whitepaper.

Figure 3. Performance comparison between an NVIDIA DGX H100 cluster with NVIDIA InfiniBand and an NVIDIA DGX GH200 with NVLink Switch System on giant memory AI workloads, including emerging NLP, larger recommender systems, graph neural networks, graph analytics, and data analytics

Purpose-built for the most demanding workloads

Every component in DGX GH200 is chosen to minimize bottlenecks while maximizing network performance for key workloads and fully utilizing all scale-up hardware capabilities. The result is linear scalability and high utilization of the massive, shared memory space.

To get the most out of this advanced system, NVIDIA also architected an extremely high-speed storage fabric to run at peak capacity and to handle a variety of data types (text, tabular data, audio, and video) in parallel and with consistent performance.

Full-stack NVIDIA solution

DGX GH200 comes with NVIDIA Base Command, which includes an OS optimized for AI workloads, a cluster manager, and libraries that accelerate compute, storage, and network infrastructure, all optimized for the DGX GH200 system architecture.

DGX GH200 also includes NVIDIA AI Enterprise, providing a suite of software and frameworks optimized to streamline AI development and deployment. This full-stack solution enables customers to focus on innovation and worry less about managing their IT infrastructure.

Figure 4. The NVIDIA DGX GH200 AI supercomputer full stack, including the NVIDIA AI Enterprise software suite for developers and the NVIDIA Base Command platform with AI workflow management, enterprise-grade cluster management, libraries that accelerate compute, storage, and network infrastructure, and system software optimized for running AI workloads

Supercharge giant AI and HPC workloads

NVIDIA is working to make DGX GH200 available at the end of this year. NVIDIA is eager to provide this incredible first-of-its-kind supercomputer and to empower you to innovate and pursue your passions in solving today's biggest AI and HPC challenges. Learn more.
