Enhanced Data Center Performance and Energy Efficiency: NVIDIA Innovations At Hot Chips 2024
6 Sept 2024, 7:57 am GMT+1
Hot Chips 2024 was a hybrid conference, with in-person attendance at Memorial Auditorium, Stanford University, that took place from August 25 to 27, 2024. In addition to the NVIDIA Blackwell platform, new research on liquid cooling, and AI agents that support chip design, the GPU and chip technology leader showcased the NVIDIA Quasar Quantisation System and NVLink, amongst other innovations.
NVIDIA unveiled some of its most advanced technological innovations at Hot Chips 2024, one of the premier global conferences for processor and system architecture. This event has become a critical forum for showcasing advancements in the trillion-dollar data center computing market, drawing top engineers and researchers from around the world.
NVIDIA's senior engineers took center stage to present the NVIDIA Blackwell platform, a groundbreaking technology designed to drive the next generation of AI across industries and geographies. Among the key innovations showcased:
- NVIDIA Blackwell Architecture: This cutting-edge platform integrates multiple chips, systems, and the NVIDIA CUDA software, propelling AI capabilities for a wide array of use cases.
- NVIDIA GB200 NVL72: This liquid-cooled, rack-scale solution connects 72 Blackwell GPUs and 36 Grace CPUs, offering unprecedented levels of performance for AI system design.
- NVLink Interconnect Technology: Allowing for seamless, all-to-all GPU communication, this innovation delivers record-breaking throughput and low-latency inference, especially critical for generative AI applications.
- NVIDIA Quasar Quantisation System: A significant leap forward in AI computing, this system pushes the limits of physics to accelerate AI performance in data centers.
NVIDIA presented some of the pioneering innovations across data center computing and AI, delivering solutions that set new standards in performance, efficiency, and system optimisation.
NVIDIA researchers highlighted the role of AI in processor design, showing how AI-powered models are now being employed to help design AI processors themselves — a meta-loop of AI development.
The NVIDIA Blackwell presentation, on August 26, included new architectural insights and live demonstrations of generative AI models running on Blackwell silicon. The event was preceded, on August 25, by tutorials on hybrid liquid-cooling solutions and the role of large language model (LLM)-powered AI agents in advancing chip design.
NVIDIA Blackwell presentation at Hot Chips 2024
NVIDIA Blackwell represents the ultimate full-stack solution for next-generation computing challenges, integrating multiple advanced NVIDIA technologies, including the Blackwell GPU, Grace CPU, BlueField data processing unit, ConnectX network interface card, NVLink Switch, Spectrum Ethernet switch, and Quantum InfiniBand switch.
These components work together to redefine AI performance with cutting-edge accelerated computing while improving energy efficiency.
Ajay Tirumala and Raymond Wong, NVIDIA’s Directors of Architecture, provided the first in-depth look at the Blackwell platform, illustrating how this ecosystem of technologies operates cohesively to set a new standard in AI and data center performance.
A standout feature of the platform is the NVIDIA GB200 NVL72 multi-node solution, designed specifically for large language model (LLM) inference. This system enables low-latency, high-throughput token generation, accelerating inference by up to 30x for LLM workloads. This leap in speed allows for real-time operation of trillion-parameter models.
Tirumala and Wong also presented the NVIDIA Quasar Quantisation System, which integrates algorithmic advancements, NVIDIA software libraries, and tools along with Blackwell’s second-generation Transformer Engine. This system enables high accuracy even on low-precision models, showcasing significant advancements in LLMs and visual generative AI applications.
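NVIDIA has not published the internals of the Quasar Quantisation System, but the core idea behind low-precision inference — mapping full-precision values onto a coarse integer grid while keeping the reconstruction error small — can be sketched with a simple symmetric quantise/dequantise round-trip. The function name and bit widths below are illustrative assumptions, not part of any NVIDIA API:

```python
import numpy as np

def quantise_dequantise(x: np.ndarray, bits: int = 8) -> np.ndarray:
    """Symmetric per-tensor quantisation: map to an integer grid and back."""
    qmax = 2 ** (bits - 1) - 1               # e.g. 127 for signed 8-bit
    scale = np.abs(x).max() / qmax           # one scale factor per tensor
    q = np.clip(np.round(x / scale), -qmax, qmax)  # integer representation
    return q * scale                         # dequantised approximation

rng = np.random.default_rng(0)
weights = rng.standard_normal(10_000).astype(np.float32)

for bits in (8, 4):
    approx = quantise_dequantise(weights, bits)
    rel_err = np.abs(weights - approx).mean() / np.abs(weights).mean()
    print(f"{bits}-bit mean relative error: {rel_err:.4f}")
```

Lower bit widths shrink memory and bandwidth requirements but enlarge the rounding error, which is why production systems pair quantisation with algorithmic and software techniques — as the Quasar presentation described — to preserve model accuracy.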
NVIDIA's Hybrid Cooling Innovations for AI Data Centers at Hot Chips 2024
NVIDIA is transforming how AI data centers manage heat, moving beyond traditional air-cooled systems toward more efficient and sustainable hybrid cooling methods, which combine both air and liquid cooling techniques.
Liquid cooling offers a significant advantage over air cooling by effectively transferring heat away from high-performance computing systems, even during demanding workloads. This technology not only enhances cooling efficiency but also allows for more compact setups that use less space and consume less power. As a result, data centers can increase their server capacity, boosting compute power without increasing their energy footprint.
Ali Heydari, Director of Data Center Cooling and Infrastructure at NVIDIA, presented several hybrid-cooled designs. These included retrofitting existing air-cooled data centers with liquid-cooling units for a faster, more affordable transition.
Other, more advanced designs involve direct-to-chip liquid cooling through specialised piping or complete server submersion in immersion cooling tanks. While the latter options require higher upfront investment, they offer long-term savings in energy and operational costs.
Heydari also showcased NVIDIA's role in the COOLERCHIPS project, part of the U.S. Department of Energy’s initiative to develop advanced cooling solutions. Using NVIDIA Omniverse, the team is creating digital twins to simulate and optimise energy consumption and cooling efficiency, pushing the boundaries of data center design innovation.
AI Agents for Chip Design: NVIDIA Highlights Processor Design Innovation at Hot Chips 2024
NVIDIA is leveraging AI models to enhance design quality, productivity, and efficiency. These AI models assist engineers by automating time-consuming tasks, predicting design outcomes, and optimising processes. This includes the use of large language models (LLMs), which can generate code, debug designs, and provide real-time assistance during the design process.
Mark Ren, NVIDIA's Director of Design Automation Research, presented an overview of how these models are being applied to semiconductor design. He also delved deeper into the use of agent-based AI systems in microprocessor development.
NVIDIA is advancing AI agents, powered by LLMs, that can autonomously complete design tasks. These agents interact with designers, leveraging vast datasets of human and AI experiences to make informed decisions and improve designs. For example, AI agents are being used for timing report analysis, cell cluster optimisation, and code generation — tasks that previously required manual intervention.
Ren also showcased real-world applications, including work on cell cluster optimisation, which recently won Best Paper at the inaugural IEEE International Workshop on LLM-Aided Design.
Pallavi Singal
Editor
Pallavi Singal is the Vice President of Content at ztudium, where she leads innovative content strategies and oversees the development of high-impact editorial initiatives. With a strong background in digital media and a passion for storytelling, Pallavi plays a pivotal role in scaling the content operations for ztudium's platforms, including Businessabc, Citiesabc, IntelligentHQ, Wisdomia.ai, MStores, and many others. Her expertise spans content creation, SEO, and digital marketing, driving engagement and growth across multiple channels. Pallavi's work is characterised by a keen insight into emerging trends in business, in technologies like AI, blockchain, and the metaverse, and in society, making her a trusted voice in the industry.