Custom AI Data Centers are on the Horizon
AI is challenging traditional data center designs, which lack the necessary power, cooling, and capacity to handle the density and intensity of AI workloads. As a result, new, custom-designed data centers are being built specifically to meet the demands of AI.
Meta (Facebook) is considering a design that houses traditional servers and data center equipment in one half of the facility and AI servers in a separate area. Instead of packing the facility with AI servers, which might be difficult to cool, Meta wants to put a row of cooling distribution units (CDUs) on each side of a row of AI racks. Building facilities this way may lower costs and make power easier to acquire. The design also provides the level of redundancy AI requires: if you are running AI servers worth hundreds of thousands of dollars, the last thing you want is a cooling failure. Meta’s design includes double the needed CDU capacity, with each CDU running at 50% load unless one fails. Meta’s new $800 million facility being built in Jeffersonville, Indiana will probably be one of the first examples of this approach.
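The redundancy arithmetic behind this 2N design can be sketched in a few lines. The rack heat load and per-CDU capacity figures below are illustrative assumptions, not Meta's actual numbers:

```python
# Illustrative 2N CDU redundancy check (figures are assumptions, not Meta's specs)
rack_heat_load_kw = 80.0   # heat a row of AI racks must reject
cdu_capacity_kw = 80.0     # rated capacity of each CDU
cdus_installed = 2         # one on each side of the row (2N)

# Normal operation: the load is shared, so each CDU runs at half its rating
normal_utilization = rack_heat_load_kw / (cdus_installed * cdu_capacity_kw)

# Failure case: the surviving CDU must carry the full load alone
failover_utilization = rack_heat_load_kw / cdu_capacity_kw

print(f"Normal: {normal_utilization:.0%} per CDU")  # 50% per CDU
print(f"Failover: {failover_utilization:.0%}")      # 100%, still within rating
```

The point of sizing each unit to the full load is that a single CDU failure degrades nothing: the survivor simply ramps from 50% to 100% of its rating.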
Liquid Cooling is Needed
Liquid cooling is essential in the AI data center, and there are several ways to provide it. Immersion cooling, in which servers are submerged in a specialized dielectric fluid, is probably the most effective, but it is difficult, time consuming, and costly. It generally requires a complete rework of large areas of the data center, if not the entire facility, to bring in the many water lines and cooling loops the specialized fluids require. It is an efficient but expensive way to keep servers cool.
A simpler approach is to feed a liquid cooling loop only to the servers that need it, a technique known as Direct-to-Chip (DtC) cooling. A small stream of liquid is fed directly into a cold plate mounted against the AI chip.
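A rough sense of the coolant flow a DtC cold plate needs can be had from the basic heat-transfer relation Q = ṁ·c_p·ΔT. The chip power and allowed temperature rise below are illustrative assumptions:

```python
# Rough coolant-flow estimate for a Direct-to-Chip cold plate.
# Chip power and allowed temperature rise are illustrative assumptions.
chip_power_w = 1000.0   # heat the cold plate must carry away
delta_t_k = 10.0        # allowed coolant temperature rise across the plate
cp_water = 4186.0       # specific heat of water, J/(kg*K)
density_kg_per_l = 1.0  # water, approximately

mass_flow_kg_s = chip_power_w / (cp_water * delta_t_k)  # from Q = m_dot * cp * dT
flow_l_min = mass_flow_kg_s / density_kg_per_l * 60.0

print(f"~{flow_l_min:.1f} L/min per cold plate")  # about 1.4 L/min
```

A kilowatt-class chip needs only a trickle of water, which is why DtC loops are far less disruptive to retrofit than full immersion tanks.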
Upgrading the Power Infrastructure
Along with upgraded cooling, the entire power infrastructure may need a complete rework. This may include converting from the traditional 12V bus architecture used in many data centers to a 48V bus. Higher voltage means lower current for the same power, which means less resistive loss, better thermal management, and higher efficiency.
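The voltage-versus-current trade-off above follows directly from P = V·I and the I²R resistive loss formula. The server power and bus resistance below are illustrative assumptions:

```python
# Why 48V beats 12V: same power, one quarter the current, and resistive
# (I^2 * R) loss falls with the square of current.
power_w = 3000.0        # power delivered to one AI server tray (assumed)
bus_resistance = 0.002  # ohms of distribution-path resistance (assumed)

i_12v = power_w / 12.0                  # I = P / V
i_48v = power_w / 48.0
loss_12v = i_12v ** 2 * bus_resistance  # I^2 * R resistive loss
loss_48v = i_48v ** 2 * bus_resistance

print(f"12 V bus: {i_12v:.1f} A, {loss_12v:.1f} W lost")
print(f"48 V bus: {i_48v:.1f} A, {loss_48v:.2f} W lost")
print(f"Loss reduction: {loss_12v / loss_48v:.0f}x")  # 16x
```

Quadrupling the voltage cuts current to a quarter and resistive loss to a sixteenth, which is where the thermal and efficiency gains come from.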
Power distribution units (PDUs), too, may need an upgrade. The higher-density racks used in AI need more robust PDUs. Older PDUs aren’t designed for modern densities and can be a source of failures due to the amount of heat generated. It’s a big mistake to invest millions in AI yet fail to invest a tiny fraction of that in PDU upgrades.
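The mismatch between legacy PDUs and AI rack densities is easy to quantify. The PDU rating, derating factor, and rack draw below are illustrative assumptions:

```python
import math

# Quick check of whether a legacy PDU can feed a modern AI rack.
# PDU ratings and rack draw below are illustrative assumptions.
pdu_voltage = 208.0  # line-to-line volts, three-phase
pdu_amps = 30.0      # breaker rating per PDU
derating = 0.8       # 80% continuous-load derating (NEC-style practice)

# Three-phase power: P = sqrt(3) * V * I, then apply the derating
pdu_capacity_kw = math.sqrt(3) * pdu_voltage * pdu_amps * derating / 1000.0

ai_rack_kw = 40.0    # dense AI rack draw (assumed)
pdus_needed = math.ceil(ai_rack_kw / pdu_capacity_kw)

print(f"Each PDU supplies ~{pdu_capacity_kw:.1f} kW usable")
print(f"A {ai_rack_kw:.0f} kW AI rack needs {pdus_needed} such PDUs")
```

A PDU sized for an 8–9 kW rack simply cannot carry a 40 kW AI rack, which is why higher-rated PDUs, not just more of the old ones, are part of the retrofit.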
Choosing the Right Location
Location is everything when it comes to the AI data center. Beyond customized rack, power, and cooling configurations, where a facility is sited can be a critical element: some regions simply do not have enough energy available to power an AI data center.
That’s one reason why Texas has become such a popular place for new data centers. It has abundant power, good internet connectivity, large metro areas hungry for data center services, and an attractive business climate.
Lancium’s Abilene, Texas campus, for example, was designed and is being built by Crusoe Energy. An initial 200 MW facility will eventually be expanded to as much as 1.2 GW, and should be online by the first quarter of 2025. Purpose-built high-density data halls will be powered primarily by renewable energy and cooled by a combination of DtC liquid cooling, rear-door heat exchangers, and air cooling. The facility is squarely aimed at AI companies, for applications like medical research, video production, and aircraft/vehicle design. At completion, each data center building will be able to operate up to 100,000 GPUs on a single integrated network fabric, according to Ali Fenn, President of Lancium.
“The main challenge in building the AI data center is access to power at scale on aggressive timelines, and this is forcing holistic changes to the traditional model, principally in opening up new locations,” said Fenn. “Speed and scale are the name of the game, and all of the hyperscalers are working hard to find and develop campuses at this scale on very aggressive timelines.”
A Slow Build
Don’t expect to see hundreds of AI data centers over the next couple of years. It will take time because AI use cases are still under development and supply chain constraints must be overcome, including a limited supply of NVIDIA GPUs. While the company is ramping up production fast, the bulk of GPUs being made are being guzzled by Amazon, Microsoft Azure, Google, and Meta. Others will either have to wait or pay more. Similarly, liquid cooling infrastructure and power availability will tend to curtail high-volume AI data center construction for some time.
“In time, other competitors will be able to supply competitive AI hardware products and supply will be less constrained, but that will be at least a couple of years out,” said Alan Howard, data center analyst for Omdia. “For now, there will be some dedicated AI data centers, but do not expect a tsunami.”
Drew Robb
Writing and Editing Consultant and Contractor