How Computer Chips Are Being Upgraded to Serve AI Workloads in Data Centers

by Drew Robb | Jul 3, 2024 | Blog

A lot is happening in chip development. Nvidia, AMD, Intel, and others are working on designs that deliver greater AI capability, and the common theme is packing more processing power into ever smaller spaces.

There is a metric known as thermal design power (TDP), which measures how many watts of heat a processor is designed to dissipate. According to projections from Omdia, the TDP explosion has begun: from 200-300 watts for CPUs and GPUs a year or two ago, it is heading to 1,000 watts and beyond within a few years.

“TDP has been spiking since 2020, so we need to rethink the cooling roadmap by incorporating liquid,” said Mohammad Tradat, Ph.D., Manager of Data Center Mechanical Engineering at Nvidia.

His company's latest systems can consume 138 kW in a single rack. The Nvidia H100 Tensor Core GPU is at the heart of the company's DGX SuperPOD platform, which is designed to maximize AI throughput.

Chip power needs are growing almost exponentially. Courtesy of Omdia.
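To see why the cooling roadmap is under pressure, it helps to convert per-chip TDP into rack-level heat. The sketch below is a rough back-of-the-envelope calculation, not a vendor specification: the configuration (eight accelerators per server, four servers per rack, a fixed overhead for CPUs, memory, and fans) is assumed purely for illustration.

```python
# Rough back-of-the-envelope: per-chip TDP -> rack-level heat load.
# All configuration numbers below are illustrative assumptions, not vendor specs.

ACCELERATORS_PER_SERVER = 8       # assumed GPUs/accelerators per server
SERVERS_PER_RACK = 4              # assumed servers per rack
OVERHEAD_W_PER_SERVER = 2_000     # assumed CPUs, memory, NICs, fans, losses

def rack_power_kw(tdp_watts_per_chip: float) -> float:
    """Estimated rack power (kW) for a given per-accelerator TDP."""
    per_server_w = ACCELERATORS_PER_SERVER * tdp_watts_per_chip + OVERHEAD_W_PER_SERVER
    return per_server_w * SERVERS_PER_RACK / 1_000

for tdp in (300, 700, 1_000):
    print(f"{tdp:>5} W per chip -> ~{rack_power_kw(tdp):.0f} kW per rack")

# 300 W chips come out near 18 kW per rack, within reach of traditional air
# cooling; 1,000 W chips push this modest configuration to 40 kW per rack,
# and denser platforms such as DGX-class racks exceed 100 kW.
```

Even with conservative assumptions, the jump from 300-watt to 1,000-watt chips more than doubles the rack-level heat that has to be removed.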

Intel, too, is innovating to meet the needs of AI. The company’s new Intel Gaudi 3 chip is a potential rival to Nvidia’s H100 processor. Dev Kulkarni, Ph.D., Senior Principal Engineer and Thermal Architect at Intel, claims it is 40% more power efficient and 1.7x faster at training the large language models (LLMs) that sit at the heart of generative AI applications such as ChatGPT.

“We can pack 1000 watts or more in a small area which makes cooling challenging,” said Kulkarni.

Vlad Galabov, Head of Omdia’s data center practice, said the die size of CPUs has grown by 100 times since the 1970s, and that processors are now 7.6x larger and require 4.6x more power than the state of the art in 2000. He expects the trend to accelerate: the number of cores in a processor will reach 288 by the end of 2024, ten times more than in 2017. Software optimization, too, will lead to customized processors tailored to AI applications, enabling another wave of server consolidation.

Despite its already immense impact on data center power consumption, generative AI is only at the early adopter stage. Alongside research into bigger, better, faster, and more core-dense chips, work is also underway to streamline AI so that models can run efficiently on CPUs rather than GPUs. ThirdAI, for example, pre-trained its Bolt LLM using only 10 servers, each with two Sapphire Rapids CPUs, far more efficiently than GPT-2, which needed 128 GPUs. The company claims Bolt is 160 times more efficient than traditional LLMs.

Cooling and Efficiency Push

If data centers are to stay efficient and keep highly dense AI systems cool, there is a need for:

  • Innovation in liquid cooling and data center efficiency
  • Emphasis on the basics of data center cooling and airflow management, such as containment for hot/cold aisle separation

To facilitate innovation, the Advanced Research Projects Agency-Energy (ARPA-E) has launched the COOLERCHIPS program. It is funding research to develop transformational, highly efficient, and reliable cooling technologies that reduce total cooling energy expenditure to less than 5% of a typical data center’s IT load, at any time and in any U.S. location, for a high-density compute system, according to Peter de Bock, Program Director for ARPA-E.
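To put that 5% target in perspective, the short sketch below converts it into absolute numbers for a hypothetical facility; the 1 MW IT load is an assumed figure used only for illustration.

```python
# Illustrative only: what the COOLERCHIPS "cooling < 5% of IT load" target
# means for a hypothetical facility. The 1 MW IT load is an assumed figure.

IT_LOAD_KW = 1_000               # assumed IT load: 1 MW
COOLING_TARGET_FRACTION = 0.05   # program goal: cooling energy < 5% of IT load

max_cooling_kw = IT_LOAD_KW * COOLING_TARGET_FRACTION
cooling_only_pue = (IT_LOAD_KW + max_cooling_kw) / IT_LOAD_KW

print(f"Cooling budget: {max_cooling_kw:.0f} kW")                      # 50 kW
print(f"PUE contribution from cooling alone: {cooling_only_pue:.2f}")  # 1.05
```

In other words, a 1 MW IT load would leave only about 50 kW for cooling, roughly a 1.05 contribution to PUE from cooling alone, a far tighter budget than conventional air-cooled facilities typically operate within.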

To date, progress has been made on a variety of approaches that can cool data center densities of 80 kW/m³ or more. These include 3D flow manipulation in cold plates to transfer heat more efficiently, better cold plate materials, advanced immersion cooling with more efficient fluids, and finding the optimum operating temperature for data centers.

“Reimagined data center architectures may enable us to not have humans in the same room as computers so we can run hotter data centers and lower energy demands,” said De Bock.

Liquid cooling will certainly play a big part, and plenty of advances can be expected in this vibrant sector in the coming months and years. But liquid will only take us so far. It must be supported by a heavy emphasis on efficiency that spans every aspect of the data center if Power Usage Effectiveness (PUE) is to stay low. In particular, the basics of air cooling must be applied meticulously: bringing cool air precisely to hot spots and managing airflow so hot and cold air do not mix. Exhaust air from the hot aisles cannot be allowed to escape back into cold aisles. That means liberal use of blanking panels, containment systems, grommets that seal cable openings, and more. All the basics need to be applied.
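For reference, PUE is simply total facility energy divided by the energy delivered to IT equipment, so 1.0 is the theoretical ideal. A minimal sketch of the calculation follows; the facility figures in the example are hypothetical.

```python
# Power Usage Effectiveness (PUE) = total facility energy / IT equipment energy.
# The example figures below are hypothetical.

def pue(total_facility_kw: float, it_load_kw: float) -> float:
    """Return PUE; 1.0 means every watt delivered goes to IT equipment."""
    return total_facility_kw / it_load_kw

# Example: 1,000 kW of IT load plus 400 kW for cooling, power distribution
# losses, and lighting yields a PUE of 1.4.
print(pue(total_facility_kw=1_400, it_load_kw=1_000))  # 1.4
```

Every kilowatt wasted on mixing hot and cold air shows up directly in that ratio, which is why the airflow basics above matter as much as the liquid cooling hardware.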

Simple efficiency solutions become more important than ever. After all, AI has raised the stakes. Data center processing hardware and liquid cooling systems are going to require such heavy investment that costs must be contained across the board. The basics of air cooling and the correct channeling and containment of hot and cold air must be implemented side by side with liquid cooling. Otherwise, efficiency levels will remain low and hot spots will multiply.

“Data center providers need to facilitate density ranges beyond the normal 10-20 kW/rack to 70 kW/rack and 200-300 kW/rack,” said Courtney Munroe, an analyst at International Data Corp. “This will necessitate innovative cooling and heat dissipation.”
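Those density ranges also change facility planning arithmetic. As a rough illustration (the 1 MW power budget below is an assumption, not a figure from the article), a fixed power envelope supports far fewer racks as density climbs, which is why cooling and heat dissipation, rather than floor space, become the binding constraint.

```python
# Illustrative only: how many racks a fixed power budget supports at
# different rack densities. The 1 MW budget is an assumed figure.

POWER_BUDGET_KW = 1_000  # assumed 1 MW available for IT equipment

for density_kw_per_rack in (15, 70, 250):
    racks = POWER_BUDGET_KW // density_kw_per_rack
    print(f"{density_kw_per_rack:>3} kW/rack -> {racks} racks per MW")

# 15 kW/rack fits 66 racks in the budget, 70 kW/rack fits 14,
# and 250 kW/rack fits only 4.
```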

 


Drew Robb

Writing and Editing Consultant and Contractor

Drew Robb has been a full-time professional writer and editor for more than twenty years. He currently works freelance for a number of IT publications, including eSecurity Planet and CIO Insight. He is also the editor-in-chief of an international engineering magazine.
