Is Liquid Cooling the Way of the Future?

by Ian Seaton | Oct 1, 2014 | Blog

For perspective, and to clarify what we are not discussing, it is useful to step back into data center history a bit and debunk some of the early claims made regarding liquid cooling. First, those early arguments rightly cited an older history of liquid-cooled mainframes that predated the evolution of file server computing platforms. However, the liquid cooling being promoted 5-10 years ago was not technically liquid cooling at all, and has more recently, and more accurately, been referred to as close-coupled cooling.

As a matter of fact, those early liquid cooling systems still relied on convective heat transfer from air moving across heat sinks inside the servers, with that heat carried to cooling coils attached to the rear doors of cabinets, located directly adjacent to cabinets, or mounted directly above them. The heat was then removed from these cooling coils in basically the same manner as it was removed from CRAC/CRAH coils, though with a far larger network of plumbing hardware. Interestingly, the proponents of these systems touted the superior cooling capacity of water over air, with those claims of superiority ranging anywhere from 60X to 3100X. Besides addressing the liquid cooling that really wasn’t liquid cooling, some of this cooling performance rhetoric could also use a little debunking. One difference between water and air is heat capacity. Water has a heat capacity of 4184 joules per kilogram per kelvin. Air has a heat capacity of roughly 700 joules per kilogram per kelvin. Clearly, water is superior in heat capacity, but only by a factor of about 5.9X. Another way to slice this pie is to consider cooling capacity as a function of thermal conductivity. Water has a thermal conductivity of 0.6062 watts per meter per kelvin, while air has a thermal conductivity of 0.0262 watts per meter per kelvin. Clearly, water is a better thermal conductor, but only by a factor of about 23X.
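As a quick sanity check on those multipliers, here is a minimal Python sketch that simply recomputes the ratios from the property values quoted above (approximate values near room temperature):

```python
# Recompute the water-vs-air multipliers from the property values cited above.
WATER_HEAT_CAPACITY = 4184    # J/(kg*K)
AIR_HEAT_CAPACITY = 700       # J/(kg*K), approximate

WATER_CONDUCTIVITY = 0.6062   # W/(m*K)
AIR_CONDUCTIVITY = 0.0262     # W/(m*K)

heat_capacity_ratio = WATER_HEAT_CAPACITY / AIR_HEAT_CAPACITY
conductivity_ratio = WATER_CONDUCTIVITY / AIR_CONDUCTIVITY

print(f"Heat capacity ratio (water/air): {heat_capacity_ratio:.1f}X")        # ~6X, the ~5.9X figure above
print(f"Thermal conductivity ratio (water/air): {conductivity_ratio:.1f}X")  # ~23X
```

Nothing close to the 60X-3100X marketing claims falls out of either ratio.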

One might argue that applying “only” to 5.9X or 23X is trying too hard to make a point. After all, if your salary were multiplied by 5.9X or 23X, you probably wouldn’t complain that you only got a 490% raise or only got a 2200% raise. Nevertheless, we are not talking about 3100X greater heat removal capacity, so how would 5.9X or 23X translate into practical cooling performance in the data center? If we assume a small 1000 ft² computer room with ten server cabinets and a 10’ high ceiling, we would be looking at a volumetric space of 10,000 cubic feet, or 4720 cubic feet after removing the space occupied by the ten server cabinets and the cold aisles. By contrast, a thirty-foot-long 4” pipe (significant overkill) running from each close-coupled cooling coil to some distribution point, representing a similar distance to the chiller as the distance from CRAH units to the chiller, would have a total volume of 6.5 cubic feet. So even though water has a heat capacity 5.9X greater than air and a thermal conductivity 23X greater than air, the volume of air available for heat removal in this example exceeds the volume of water by over 700X, so this mislabeled liquid cooling was typically not as effective as air cooling. Considering that most of this close-coupled cooling plumbing has been 2” or 1” in diameter rather than the 4” of the example, this performance delta in reality has likely been even greater.
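To make the arithmetic behind that comparison explicit, here is a minimal Python sketch that reproduces the figures above; the 4720 ft³ of usable air and the 6.5 ft³ of pipe volume are taken directly from the example rather than derived independently:

```python
# Reproduce the back-of-the-envelope air-vs-water volume comparison above.
# All figures are the ones quoted in the example, not independently derived.

room_area_ft2 = 1000
ceiling_height_ft = 10
gross_air_volume_ft3 = room_area_ft2 * ceiling_height_ft   # 10,000 ft^3
net_air_volume_ft3 = 4720     # gross volume minus cabinets and cold aisles (per the text)

pipe_water_volume_ft3 = 6.5   # total volume of the 4" close-coupled plumbing (per the text)

volume_ratio = net_air_volume_ft3 / pipe_water_volume_ft3
print(f"Gross room volume: {gross_air_volume_ft3} ft^3")
print(f"Air volume available for heat removal: {net_air_volume_ft3} ft^3")
print(f"Water volume in the close-coupled plumbing: {pipe_water_volume_ft3} ft^3")
print(f"Air-to-water volume ratio: {volume_ratio:.0f}X")    # ~726X, i.e. 'over 700X'
```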

However, efficiency is an entirely different metric than effectiveness, and the proponents of these systems claimed efficiency advantages over air-cooled systems. When comparing the efficiency of a close-coupled (liquid) cooling system to a poorly executed legacy hot aisle – cold aisle (mostly, kinda sorta) data center, the close-coupled design would always come out more efficient. However, once most of the bypass and re-circulation had been eliminated by containment and/or other airflow management strategies, the efficiency comparison came down to the fan systems of close-coupled units versus the fan systems of perimeter cooling units, and it became quite a different story. Energy Efficiency Ratios (EER = BTUs of heat removed per hour per watt of power consumed) for row-based cooling fan systems typically ran in the low to mid 40s, while perimeter cooling units had EERs ranging from the mid 30s to the upper 40s. On a level playing field, the efficiency differences were more dependent on vendor and model differences than on technology differences.
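For readers less familiar with the metric, here is a minimal sketch of how an EER in that range comes about; the 12 kW of heat removed and 1 kW of fan power are illustrative assumptions, not figures for any particular product:

```python
# Illustrative EER calculation (assumed numbers, not vendor data).
# EER = heat removed (BTU/hr) divided by electrical power consumed (W).

BTU_PER_HOUR_PER_KW = 3412    # 1 kW of heat is roughly 3,412 BTU/hr

heat_removed_kw = 12.0        # assumed cooling load handled by the unit's fans
fan_power_w = 1000.0          # assumed electrical draw of the fan system

eer = (heat_removed_kw * BTU_PER_HOUR_PER_KW) / fan_power_w
print(f"EER: {eer:.0f}")      # ~41, i.e. in the 'low 40s' range cited above
```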

All that history aside, today, liquid cooling actually means liquid cooling, and that is where the promise of liquid cooling comes much closer to delivering. There are three basic configurations of liquid cooling: immersion, direct contact and partial direct contact.

Probably the most widely deployed today is what I’ll call partial direct contact liquid cooling. In these systems, heat sinks on microprocessors, and sometimes other heat-intensive components, are replaced with liquid cold plates. Liquid moving through these cold plates carries the heat to a nearby liquid-to-liquid heat exchanger, from which the heat is rejected to some heat rejection system – historically a chiller. These are partial direct contact liquid cooling systems because there are still plenty of heat sources in the server chassis that are cooled by air, thereby requiring some variation of a “normal” air-cooling mechanical infrastructure. Because of the parallel infrastructures, there may not be a large efficiency gain in some of these deployments. However, if these systems are retrofitted into an existing space, they can provide two significant benefits: (1) removing 25-30% of the load on the air cooling infrastructure can result in significant fan energy savings, and (2) extra computing capacity can be added to a space without having to add the associated air cooling infrastructure, providing a path to extending the life of an existing space. The manufacturers of these systems have taken pains to simplify the modification of servers to accommodate the liquid cooling cold plates and associated plumbing, but widespread adoption remains inhibited by conservative IT reluctance to “open the box” and risk warranty issues with server vendors. Another benefit of these partial direct contact cooling solutions is that the more effective cooling provides a path for overclocking processors and thereby getting custom-processor performance at standard processor prices.
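To see why shaving 25-30% off the air-side load translates into outsized fan savings, here is a minimal sketch based on the fan affinity laws; the assumptions that airflow can be turned down in proportion to the remaining heat load (e.g., VFD-driven fans) and that fan power scales with the cube of airflow are mine, not claims from any vendor:

```python
# Rough fan-energy savings estimate using the fan affinity laws.
# Assumes airflow is reduced in proportion to the heat load removed
# and that fan power varies with the cube of airflow (affinity laws).

def fan_power_fraction(remaining_load_fraction: float) -> float:
    """Fraction of original fan power needed after the load reduction."""
    return remaining_load_fraction ** 3

for load_removed in (0.25, 0.30):
    remaining = 1.0 - load_removed
    savings = 1.0 - fan_power_fraction(remaining)
    print(f"Remove {load_removed:.0%} of air-side load -> ~{savings:.0%} fan energy savings")

# Remove 25% of air-side load -> ~58% fan energy savings
# Remove 30% of air-side load -> ~66% fan energy savings
```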

An evolutionary step beyond partial direct contact liquid cooling is complete direct contact liquid cooling. In this technology, small, highly conductive metal “houses” are placed over every heat source on a server motherboard. All of these metal shrouds are the same height and press against a large cold plate essentially the size of the motherboard. All the heat from all the components is transferred to the cold plate through these bridging conductors, eliminating the need for any fan-powered air heat removal. This approach does away with the parallel infrastructure required by partial direct contact liquid cooling, thereby reducing both opex and capex. However, the heat removal path is much more intrusive to the servers and very unlikely to be executable at the user level; this approach is going to be much more closely tied to custom servers designed specifically for the cooling technology.

The farthest departure from what we understand as data center thermal management is total immersion cooling. Quite simply, this technology involves trading a server cabinet for a high-tech horse trough and immersing the servers in an electrically non-conductive but thermally conductive oil bath. There are reports of 200 kW racks (i.e., tubs), so a very obvious benefit of immersion liquid cooling is the ability to support extremely high densities. A frequently cited disadvantage of immersion cooling is that you lose your vertical scalability, which could result in lower per-square-foot power densities. While this may apply to standard commercial servers, if you can deploy super-high-density equipment and get over 100 kW per tub, you are probably not making a density sacrifice. More practical concerns have to do with the need for special hard drives and some special supporting infrastructure for vats and liquid storage. In addition, there may be some cultural issues with educated white collar IT specialists working around, and in, tanks of oil. As with other liquid cooling approaches, immersion provides a significant microprocessor overclocking performance benefit.
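To put that density trade-off in rough numbers, here is a minimal sketch comparing watts per square foot for a conventional air-cooled rack and an immersion tub; only the 100 kW per tub figure comes from the article, while the rack power and both footprints are illustrative assumptions:

```python
# Rough floor-density comparison (illustrative assumptions, not measured data).

# Conventional air-cooled rack: assumed 20 kW in a ~2 ft x 4 ft cabinet,
# plus an assumed allowance for its share of the hot and cold aisles.
rack_power_kw = 20.0
rack_footprint_ft2 = (2 * 4) + 12

# Immersion tub: 100 kW (per the article) in an assumed ~4 ft x 8 ft tub,
# plus an assumed allowance for service clearance around it.
tub_power_kw = 100.0
tub_footprint_ft2 = (4 * 8) + 20

rack_density = rack_power_kw * 1000 / rack_footprint_ft2
tub_density = tub_power_kw * 1000 / tub_footprint_ft2

print(f"Air-cooled rack: ~{rack_density:.0f} W/ft^2")   # ~1,000 W/ft^2
print(f"Immersion tub:  ~{tub_density:.0f} W/ft^2")     # ~1,900 W/ft^2
```

Under those assumptions, a 100 kW tub comes out ahead on floor density despite losing vertical scalability.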

Direct contact liquid cooling and partial direct contact liquid cooling have advantages for boosting the cooling capacity of existing spaces, while immersion cooling appears to make the most TCO investment sense for new construction. All three technologies support overclocking processors and getting super-computer performance from standard processors. In general, early adopters have tended to be universities and scientific laboratories where computational throughput is highly valued. We have also seen some early adopters in the bitcoin mining community, where transactional velocity is also highly valued. Both direct contact liquid cooling and partial direct contact liquid cooling work effectively with warm, even hot, water — 90°F or higher liquid can still effectively cool components and allow for overclocking. For the time being, adoption will likely remain limited to those applications where we currently see these solutions deployed. However, that could all change with one of the major server OEMs marketing computing hardware specifically designed for one or more of these liquid cooling approaches.

Ian Seaton

Data Center Consultant


3 Comments

  1. Gary

    As always, some will hang onto the old technology as long as they can. Make no mistake, clean agent (electrical contact cleaner) immersive cooling is coming, and when it does it will take no prisoners.

  2. Phil Hughes

    There are over 20 companies offering a plethora of liquid cooling solutions. Most of them fail on the simple questions: Easy to service? Reliable? High density? Cost effective?
    As the only vendor of a fully direct cooling system, I should point out that we use off-the-shelf motherboards, not custom (author, next time take better notes). Our first deployment has accumulated about 2M server hours with zero failures and has a power and cooling overhead of 11% of baseboard power draw – about the same as the internal power supply and fans of a conventional server in the middle of a field (without the horse trough!).

    And we satisfy all the above criteria.

  3. John Booth

    Horse troughs? Clearly not looking hard enough, Ian.
    May I refer you to the ICEOTOPE solution? ICEOTOPE is a complete liquid-cooled server solution, with vertical blades contained in a cabinet.
    The blades are immersed in a non-conductive, fire-retardant solution which passes the heat to the water cooling circuit and then out to a number of potential uses: 1. radiators for office heating, 2. other commercial applications such as industrial-grade heat for cleaning or textiles, or as a pre-heater in CHP systems.
    Now ICEOTOPE is the future!

