What Is the Difference Between ASHRAE’s Recommended and Allowable Data Center Environmental Limits? – Part 3
This is part 3 of a three part series on ASHRAE’s recommended vs allowable data center environmental limits. To read part 1, click here, and to read part 2, click here.
I have long suspected there remains an ongoing desire for clarity on the difference between the ASHRAE TC9.9 recommended environmental envelope and allowable envelope for data centers, along with a perplexed hope for some direction on what to actually do with that information. I continue to field questions on the recommended and allowable temperatures, and I continue to read articles and hear presentations that are cautiously descriptive and woefully short of actionable steps. My suspicions have been confirmed by the response to my two most recent pieces on this subject: readership and engagement have been more than double the average over the past several years. My recommendation remains straightforward and actionable: with effective airflow management and an appropriate economizer solution, operate within the allowable temperature envelope, letting Mother Nature modulate supply temperatures, with either no air conditioned cooling at all or a greatly reduced air conditioning capacity.
In part 2 of this series I explored various scenarios for data centers with Class A2 servers (10-35˚C allowable) in fourteen different geographic and climatic locations (Atlanta, Amsterdam, Chicago, Dallas, Denver, Frankfurt, Hong Kong, London, Omaha, Phoenix, Reston, San Jose, Sydney and Wenatchee). I showed that in every single location a data center could be operated without any air conditioning while maintaining all server inlet temperatures within the ASHRAE Class A2 temperature limits with at least one of the analyzed economizer architectures, and that most cities accommodated two or more economizer types. In addition, I used the ASHRAE “X factor” methodology to show that server reliability would actually be improved, compared to the failure rates expected with a constant 72˚F server inlet temperature, in every location except Hong Kong. Even there, I suggested a reasonable business case could probably be made for accepting the slight increase in server failure rates in exchange for the resultant capital and operating savings: eliminating air conditioning altogether with indirect evaporative cooling, or reducing air conditioning capacity by 75% and reducing that smaller capacity’s runtime by 68% to 98%, depending on economizer style, compared to the AC capacity and run times required to hold the ASHRAE recommended envelope of 64.4˚ – 80.6˚F with economizers in Hong Kong.
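For readers who want to replay the shape of that reliability comparison themselves, the sketch below shows the time-weighted X-factor arithmetic in miniature. To be clear, the temperature bins, hours and X-factor values in it are invented placeholders for illustration, not the Part 2 inputs or the published ASHRAE failure-rate table; swap in real climate bin data and the TC9.9 values to do the calculation properly.

```python
# Illustrative sketch of a time-weighted ASHRAE "X-factor" comparison.
# The bin hours and X-factor values below are made-up placeholders;
# substitute real climate bin data and the TC9.9 relative failure-rate table.

# Relative failure rate (X-factor) keyed by server inlet temperature in degF.
# Values here are for illustration only.
x_factor = {59: 0.88, 68: 1.00, 77: 1.13, 86: 1.31, 95: 1.55}

# Hours per year a hypothetical economized data center spends in each
# inlet-temperature bin (must total 8760).
hours_in_bin = {59: 2600, 68: 2900, 77: 2200, 86: 900, 95: 160}

assert sum(hours_in_bin.values()) == 8760

# Time-weighted X-factor for the economized facility.
weighted_x = sum(x_factor[t] * h for t, h in hours_in_bin.items()) / 8760

# Baseline: constant 72 degF inlet, approximated here by interpolating
# between the 68 degF and 77 degF table entries.
baseline_x = x_factor[68] + (72 - 68) / (77 - 68) * (x_factor[77] - x_factor[68])

print(f"Weighted X-factor (economized): {weighted_x:.3f}")
print(f"Baseline X-factor (constant 72 degF): {baseline_x:.3f}")
print(f"Relative failure rate vs baseline: {weighted_x / baseline_x:.2%}")
```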
So what possible obstacle could there be to adopting that recommendation? Perhaps heat affects computer performance. After all, it is common knowledge by now that gamers, for example, routinely overclock their processors and get away with it by cooling those microprocessors with some form of direct contact liquid cooling, and that practice is now creeping into the HPC realm. However, the performance boost made possible by direct contact liquid cooling does not have an opposing corollary of reduced performance at temperatures above some baseline server inlet temperature – up to a point. And the point is that this “point” is higher than anything we would likely see within the allowable temperature envelope. Today’s CPUs are designed to operate up to 95˚C or even 100˚C, and testing with Linpack to track floating point operations has shown that Intel CPUs, for example, can run at 100˚C for up to 50% of the time before experiencing a slight drop-off in operating frequency and the resultant FLOP transaction rate. [1] The trick, therefore, is to keep the data center supply air at a point that assures server inlet temperatures are low enough to keep the CPU operating below the temperature where performance is impacted.
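As a practical aside, the onset of throttling is easy to watch for from the server itself by logging package temperature against the CPU’s design limit alongside the running clock frequency. Here is a minimal sketch using the psutil library on Linux; the 95˚C threshold is simply the assumed design limit discussed above, and sensor names will vary from platform to platform.

```python
# Minimal sketch: watch CPU package temperature and clock frequency to spot
# the onset of thermal throttling. Assumes Linux with the psutil package
# installed; sensor names vary by platform, and the 95 degC threshold is
# an assumed design limit, not a value read from the hardware.
import time

import psutil

THROTTLE_THRESHOLD_C = 95.0  # assumed CPU throttle point


def hottest_core_temp():
    """Return the highest temperature reported by any sensor, in degC."""
    readings = psutil.sensors_temperatures()  # e.g. {"coretemp": [...]}
    temps = [entry.current for entries in readings.values() for entry in entries]
    return max(temps) if temps else None


def sample(interval_s=5):
    while True:
        temp = hottest_core_temp()
        freq = psutil.cpu_freq()  # current/min/max in MHz, or None
        flag = " <-- near throttle point" if temp and temp >= THROTTLE_THRESHOLD_C else ""
        print(f"temp={temp} degC  freq={freq.current if freq else 'n/a'} MHz{flag}")
        time.sleep(interval_s)


if __name__ == "__main__":
    sample()
```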
Results of experimental studies indicate that maintaining server inlet temperatures within the ASHRAE allowable environmental ranges will keep CPUs from reaching the thresholds where performance is affected. One particularly well-defined and controlled study was conducted at IBM and reported at an American Society of Mechanical Engineers technical conference. Its focus was specifically on server performance within the Class A3 envelope (41-104˚F), and more specifically at the upper end of that envelope. The researchers tested 1U, 2U and blade server packages with different power supplies, and selected workload test packages to simulate both high performance computing and typical virtualization/cloud workloads. They evaluated over 70 different CPUs and selected test samples from the best and worst for power leakage to determine the effect of that variable on results at these conditions. They baselined each piece of equipment and associated workload at a 77˚F server inlet temperature and then re-tested at 95˚F (the Class A2 upper limit) and at 104˚F (the Class A3 upper limit). The results, summarized in Table 1, where the 95˚F and 104˚F columns give the ratio of operations performed versus the 77˚F baseline, clearly indicate there is no degradation of performance at these higher temperatures. The only test that fell outside the +/- 1% tolerance of the test workloads and data acquisition was the worst-power-leakage blade system running Linpack in the intensified Turbo Boost mode, and even that showed only a 2% performance degradation, or 1% beyond the tolerance margin of error.
SERVER PERFORMANCE AT DIFFERENT INLET TEMPERATURES
| System | Variation | Application | Leakage | Exercise | 77˚F Inlet | 95˚F Inlet | 104˚F Inlet |
|--------|-----------|-------------|---------|----------|------------|------------|-------------|
| 1U-a | 130W | HPC | | FLOPs w/turbo | 1.00 | 1.00 | 1.00 |
| 1U-a | 115W | HPC | | FLOPs w/turbo | 1.00 | 1.00 | 1.00 |
| 2U | 115W | HPC | Best | FLOPs w/turbo | 1.00 | 1.00 | 1.00 |
| 2U | 115W | Cloud | Best | Java | 1.00 | 1.00 | 0.99 |
| 2U | 115W | HPC | Best | Integers | 1.00 | 1.00 | 1.00 |
| 2U | 115W | HPC | Best | FLOPS (SPEC) | 1.00 | 1.00 | 1.00 |
| 2U | 130W | HPC | | FLOPs w/turbo | 1.00 | 1.00 | 1.00 |
| Blades | 130W | HPC | Best | FLOPs w/turbo | 1.00 | 0.99 | 0.99 |
| Blades | 130W | Cloud | Best | Java | 1.00 | 0.99 | 0.99 |
| Blades | 130W | HPC | Best | Integers | 1.00 | 1.00 | 1.00 |
| Blades | 130W | HPC | Best | FLOPS (SPEC) | 1.00 | 1.00 | 1.00 |
| Blades | 130W | HPC | Worst | FLOPs w/turbo | 1.00 | 0.99 | 0.98 |
| Blades | 130W | HPC | Worst | FLOPs w/o turbo | 1.00 | 1.00 | 1.00 |
| Blades | 130W | Cloud | Worst | Java | 1.00 | 0.99 | 0.99 |
| Blades | 130W | HPC | Worst | Integers | 1.00 | 1.00 | 0.99 |
| Blades | 130W | HPC | Worst | FLOPS (SPEC) | 1.00 | 1.00 | 0.99 |
| 1U-b | 115W | HPC | Worst | FLOPs w/turbo | 1.00 | 1.00 | 1.00 |
| 1U-b | 115W | Cloud | Worst | Java | 1.00 | 1.00 | 1.01 |
| 1U-b | 115W | HPC | Worst | Integers | 1.00 | 1.00 | 1.00 |
| 1U-b | 115W | HPC | Worst | FLOPS (SPEC) | 1.00 | 1.00 | 1.00 |
| AVERAGE | | | | | 1.00 | 1.00 | 1.00 |
Table 1: Test Results on Server Operations Performance at Higher Temperatures [2]
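A quick note on how a table like this is read: each cell is simply the benchmark score at the elevated inlet temperature divided by the score for the same system and workload at the 77˚F baseline. The sketch below shows that normalization; the raw scores in it are invented placeholders, not data from the IBM study.

```python
# Sketch of the normalization behind Table 1: each result is the benchmark
# score at an elevated inlet temperature divided by the 77 degF baseline
# score for the same system and workload. The raw scores below are invented
# placeholders, not data from the IBM study.
raw_scores = {
    # (system, workload): {inlet temp degF: benchmark score}
    ("2U", "FLOPs w/turbo"): {77: 412.0, 95: 411.6, 104: 410.9},
    ("Blades", "Java"):      {77: 988.0, 95: 979.1, 104: 978.2},
}

for (system, workload), scores in raw_scores.items():
    baseline = scores[77]
    ratios = {t: round(s / baseline, 2) for t, s in scores.items()}
    print(f"{system:7s} {workload:15s} {ratios}")
```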
Researchers at the University of Toronto conducted experiments on a much smaller sample but tested a far wider range of variables and generated more meaningful data by driving server inlet temperatures up to 60˚C (140˚F) in 0.1˚C increments. At 140˚F they found disk drives typically suffered 5-10% declines in throughput, with one drive suffering a 30% decline. More importantly, they did not see any statistically noticeable performance degradation below a 104˚F inlet temperature, and one drive made it up to 131˚F before suffering any throughput loss versus the baseline benchmark. Across all their tests, they saw no throttling of CPU or memory performance on any benchmark up to 131˚F. For a complete list of tested variables, readers can check my piece, “Airflow Management Considerations for a New Data Center: Part 2: Server Performance versus Inlet Temperature,” May 17, 2017, on the Upsite Technologies blog. For specific performance results for particular drives, memory modules and CPUs, I invite readers to consult the original University of Toronto study. [3]
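For anyone repeating that kind of temperature sweep on their own hardware, the analysis boils down to finding the lowest inlet temperature at which throughput drops meaningfully below the cool-end baseline. Here is a minimal sketch of that reduction, using invented placeholder measurements rather than the University of Toronto data.

```python
# Sketch: given throughput measured across an inlet-temperature sweep, find
# the lowest temperature at which throughput falls more than a tolerance
# below the coolest-point baseline. Measurements are invented placeholders,
# not the University of Toronto data.
def degradation_onset(measurements, tolerance=0.01):
    """measurements: list of (inlet_temp_F, throughput) sorted by temperature."""
    baseline = measurements[0][1]
    for temp, throughput in measurements:
        if throughput < baseline * (1 - tolerance):
            return temp
    return None  # no degradation beyond tolerance observed in the sweep


disk_sweep = [(77, 152.0), (95, 151.8), (104, 151.5), (122, 149.9), (140, 140.7)]
onset = degradation_onset(disk_sweep)
print(f"Throughput degradation first exceeds 1% at {onset} degF" if onset
      else "No degradation beyond tolerance in this sweep")
```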
So, back to the original question: what is the difference between the ASHRAE recommended and allowable temperature envelopes for data centers? Besides the obvious and superficial difference in degrees, it is clear that the allowable envelope delivers dramatic opex and capex savings on the data center mechanical plant, generally improves server reliability, and does not negatively impact server performance. The recommended envelope, on the other hand, cannot be reasonably recommended, lacking as it does any socially redeeming values. Finally, hearkening back to Part 2 of this series, remember that the whole economic argument depends on effective airflow management that maintains a maximum 2˚F variation on the supply air side of the data center.
This is part 3 of a three part series on ASHRAE’s recommended vs allowable data center environmental limits. To read part 1, click here, and to read part 2, click here.
[1] Matt Bach, “Impact of Temperature on Intel CPU Performance,” October 2014, https://www.pugetsystems.com/labs/articles, p. 3.
[2] Data derived from Table 3 in Aparna Vallury and Jason Matteson, “Data Center Trends toward Higher Ambient Inlet Temperatures and the Impact on Server Performance,” Proceedings of the ASME 2013 International Technical Conference and Exhibition on Packaging and Integration of Electronic and Photonic Microsystems (InterPACK2013), July 16-18, 2013, p. 4.
[3] Nosayba El-Sayed, Ioan Stefanovici, George Amvrosiadis, Andy A. Hwang and Bianca Schroeder, “Temperature Management in Data Centers: Why Some (Might) Like It Hot,” Department of Computer Science, University of Toronto, 2012.
Ian Seaton
Data Center Consultant