Airflow Management Considerations for a New Data Center – Part 2: Server Performance versus Inlet Temperature

by Ian Seaton | May 17, 2017 | Blog

This is a continuation of Airflow Management Considerations for a New Data Center – Part 1: Server Power vs Inlet Temperature

How hot can you let your data center get before damage sets in? This is part 2 of a seven-part series on airflow management considerations for new data centers. In Part 1 I discussed server power versus inlet temperature. In this part, I will be talking about server performance versus inlet temperature.

Airflow management considerations are the questions that inform the degree to which we can take advantage of excellent airflow management practices to drive down the operating cost of our data center. In my previous piece, part one of a seven-part series drawing on ASHRAE’s server metrics for determining a data center operating envelope, I explored the question of server power versus server inlet temperature, presenting a methodology for assessing the trade-off between mechanical plant energy savings and increased server fan energy at higher temperatures. I suggested that for most applications, a data center could be allowed to encroach into much higher temperature ranges than many industry practitioners might have expected before server fan energy penalties reverse the savings trend. However, how much are we really saving if our temperature adventures affect how much work our computing equipment is doing for us? That brings us to today’s subject and part two of this series: server performance versus server inlet temperature.
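As a quick refresher on the Part 1 trade-off, the arithmetic can be sketched in a few lines of Python. All of the numbers below are illustrative placeholders rather than figures from the article, and the function is only a simplified stand-in for the methodology described in Part 1.

```python
# Minimal sketch of the Part 1 trade-off: does raising supply temperature
# still save energy once server fans speed up? Every figure below is an
# illustrative placeholder, not a measurement from the article.

def net_savings_kw(chiller_kw_saved_per_degf: float,
                   temp_increase_degf: float,
                   baseline_fan_kw: float,
                   fan_power_increase_pct: float) -> float:
    """Return net facility savings (kW) for a given supply temperature increase."""
    mechanical_savings = chiller_kw_saved_per_degf * temp_increase_degf
    fan_penalty = baseline_fan_kw * fan_power_increase_pct
    return mechanical_savings - fan_penalty

if __name__ == "__main__":
    # Hypothetical example: raise supply air 5 degrees F in a room whose
    # aggregate server fan power is assumed to be 25 kW.
    savings = net_savings_kw(
        chiller_kw_saved_per_degf=4.0,   # assumed chiller savings per degree F
        temp_increase_degf=5.0,
        baseline_fan_kw=25.0,
        fan_power_increase_pct=0.20,     # assumed fan power rise at the higher inlet temp
    )
    print(f"Net savings: {savings:.1f} kW")  # positive => the increase still pays off
```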
Servers today are much more thermally robust than recent legacy servers, particularly with the advent of Class A3 and Class A4 servers. In the recent past, as servers became equipped with variable speed fans and onboard thermal management, they gained the intelligence to respond to excessive temperatures by slowing down performance. Unfortunately, if energy savings features are disabled, as they frequently are, this self-preservation tactic will likely not function. Meanwhile, some server OEMs are essentially delivering only A3 servers (with safe operation up to a 104˚F inlet), and an A2 server, with allowable operation up to 95˚F, is for all practical purposes going to be easier to source at flea markets than through OEM sales channels. So if a new data center is being equipped with new IT equipment, this is a fairly straightforward consideration. However, if legacy equipment will be moved into a new space, it will be important to contact the vendors to learn where performance temperature thresholds might be for different pieces of equipment. The evidence, as presented in the following paragraphs, suggests that operating up to the temperature levels that fall below server fan energy penalties will not inhibit the performance of today’s data center ITE.

It might be surprising to learn that today’s CPUs are designed to operate up to 95˚C or even 100˚C, and testing with Linpack to track floating point operations has revealed that Intel CPUs, for example, can operate at 100˚C (that’s 212˚F for the casual speed reader) for up to 50% of the time before experiencing a slight drop-off in operating frequency and resultant FLOP transaction rate.1 Of course, that is not a license to run our data centers at 212˚F, so the trick is to keep the data center operating at some point that assures a server inlet temperature low enough to keep the CPU below the temperature where performance is impacted. Back when we were not sure what that threshold might be, the safe zone was inside the ASHRAE recommended envelope, which has slowly migrated to 64.4˚ to 80.6˚F, despite vendor documentation that allowed for much wider ranges. That trick does not have to be such a trick these days, as most servers come with sensors and outputs that report the CPU temperature. While that information is available, it is not necessarily useful for minute-to-minute management of the data center unless every piece of equipment in the room comes from the same vendor and is equipped with the same CPU temperature monitoring output format. Without that homogeneity, which describes most of our spaces, we need some guidance on how far we can take the outside temperature without adversely affecting the inside temperature.
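For readers who want to watch CPU temperatures directly, here is a minimal sketch, assuming a Linux host that exposes the standard hwmon sysfs interface; vendor BMC and IPMI output formats vary, which is exactly the heterogeneity problem described above. The 95˚C threshold is purely illustrative.

```python
# Minimal sketch: poll temperatures on a Linux host through the standard
# hwmon sysfs interface and flag anything approaching a chosen threshold.
# The threshold is illustrative; check your CPU's actual specification.
from pathlib import Path

THRESHOLD_C = 95.0  # illustrative CPU temperature ceiling

def read_hwmon_temps():
    """Yield (sensor_label, temperature_C) for every hwmon temperature input."""
    for temp_file in Path("/sys/class/hwmon").glob("hwmon*/temp*_input"):
        try:
            millidegrees = int(temp_file.read_text().strip())
        except (OSError, ValueError):
            continue  # sensor not readable; skip it
        label_file = temp_file.with_name(temp_file.name.replace("_input", "_label"))
        label = label_file.read_text().strip() if label_file.exists() else temp_file.name
        yield label, millidegrees / 1000.0

if __name__ == "__main__":
    for label, temp_c in read_hwmon_temps():
        status = "WARN" if temp_c >= THRESHOLD_C else "ok"
        print(f"{label}: {temp_c:.1f} C [{status}]")
```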

When ASHRAE TC9.9 added the new server classes and extended the allowable temperature envelope in 2011, the following year saw a relative flurry of scientific and engineering activity in search of understanding the implications of these environmental guidelines for the equipment deployed in data centers. One particularly well-defined and controlled study was conducted at IBM and reported at an American Society of Mechanical Engineers technical conference. Their focus was specifically on server performance within the Class A3 envelope (41-104˚F), and more specifically at the upper range of that envelope. They tested 1U, 2U and blade server packages with different power supplies, and they selected workload test packages to simulate both high-performance computing and typical virtualization cloud workloads. They evaluated over 70 different CPUs and selected test samples from the best and worst for power leakage to determine the effect of that variable on results at these conditions. They baselined each piece of equipment and associated workload test at a 77˚F server inlet temperature and then re-tested at 95˚F (the Class A2 upper limit) and at 104˚F (the Class A3 upper limit). The results, summarized in Table 1, wherein the 95˚F and 104˚F columns are the ratios of operations performed versus the 77˚F baseline, clearly indicate there is no degradation of performance at these higher temperatures. The only test that fell outside the +/- 1% tolerance of the test workloads and data acquisition was the worst power leakage blade system running Linpack in the intensified Turbo Boost mode, and that only showed a 2% performance degradation, or 1% beyond the tolerance margin of error.

Table 1: Server Performance at Different Inlet Temperatures2
(Performance relative to the 77˚F baseline at each server inlet temperature)

| System | Variation | Application | Leakage | Exercise | 77˚F | 95˚F | 104˚F |
|--------|-----------|-------------|---------|-----------------|------|------|-------|
| 1U-a | 130W | HPC | | FLOPs w/turbo | 1.00 | 1.00 | 1.00 |
| 1U-a | 115W | HPC | | FLOPs w/turbo | 1.00 | 1.00 | 1.00 |
| 2U | 115W | HPC | Best | FLOPs w/turbo | 1.00 | 1.00 | 1.00 |
| 2U | 115W | Cloud | Best | Java | 1.00 | 1.00 | 0.99 |
| 2U | 115W | HPC | Best | Integers | 1.00 | 1.00 | 1.00 |
| 2U | 115W | HPC | Best | FLOPS (SPEC) | 1.00 | 1.00 | 1.00 |
| 2U | 130W | HPC | | FLOPs w/turbo | 1.00 | 1.00 | 1.00 |
| Blades | 130W | HPC | Best | FLOPs w/turbo | 1.00 | 0.99 | 0.99 |
| Blades | 130W | Cloud | Best | Java | 1.00 | 0.99 | 0.99 |
| Blades | 130W | HPC | Best | Integers | 1.00 | 1.00 | 1.00 |
| Blades | 130W | HPC | Best | FLOPS (SPEC) | 1.00 | 1.00 | 1.00 |
| Blades | 130W | HPC | Worst | FLOPs w/turbo | 1.00 | 0.99 | 0.98 |
| Blades | 130W | HPC | Worst | FLOPs w/o turbo | 1.00 | 1.00 | 1.00 |
| Blades | 130W | Cloud | Worst | Java | 1.00 | 0.99 | 0.99 |
| Blades | 130W | HPC | Worst | Integers | 1.00 | 1.00 | 0.99 |
| Blades | 130W | HPC | Worst | FLOPS (SPEC) | 1.00 | 1.00 | 0.99 |
| 1U-b | 115W | HPC | Worst | FLOPs w/turbo | 1.00 | 1.00 | 1.00 |
| 1U-b | 115W | Cloud | Worst | Java | 1.00 | 1.00 | 1.01 |
| 1U-b | 115W | HPC | Worst | Integers | 1.00 | 1.00 | 1.00 |
| 1U-b | 115W | HPC | Worst | FLOPS (SPEC) | 1.00 | 1.00 | 1.00 |
| AVERAGE | | | | | 1.00 | 1.00 | 1.00 |
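For clarity, the ratios in the table above are simply the operations completed at the elevated inlet temperature divided by the operations completed in the 77˚F baseline run. A minimal sketch with invented throughput numbers:

```python
# How the table's ratios are formed: operations completed at an elevated
# inlet temperature divided by operations completed at the 77 F baseline.
# The throughput numbers here are invented purely to show the arithmetic.
baseline_77f = 1_000_000                            # ops in the 77 F baseline run
elevated_runs = {"95F": 998_500, "104F": 982_000}   # hypothetical elevated-temperature runs

for temp, ops in elevated_runs.items():
    ratio = ops / baseline_77f
    print(f"{temp}: {ratio:.2f}")   # 1.00 means no measurable degradation
```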

Within this same time frame, researchers at the University of Toronto ran tests on just one server model, but with seven different hard drives from four major vendors, and exercised the equipment with a far wider range of workloads and many more temperature settings. These tests took ambient temperatures much higher than the IBM tests, so performance declines became easier to identify outside the range of normal statistical error. Their benchmark workloads included measuring the time to access 4 GB of memory, giga-updates per second of random memory access in 8 KB chunks, speed of integer operations, speed of floating point operations, speed of responding to random read/write requests, speed of handling highly sequential 65 KB read/write requests, on-line transaction processing, I/O processing of on-line transactions, decision support database workloads, disk-bound database workloads, file system transactions, and HPC computational queries, all on recognized, industry-standard tools designed either to stress different parts of the system or to model a number of real-world applications.3 Testing was conducted inside a thermal chamber in which temperatures could be controlled in 0.1˚C increments from -10˚C up to 60˚C (14-140˚F, a somewhat wider range than we typically see in data centers today).

The University of Toronto researchers looked at both disk drive and CPU performance. For the disk drives, at an ambient temperature of 140˚F, they found throughput declines typically in the 5-10% range, with some as high as 30%. More importantly, statistically noticeable throughput declines occurred at different ambient conditions for different disk drives: a couple were observed at 104˚F and 113˚F, and one showed no reduction in throughput until 131˚F. If you are considering allowing your data center “cold aisles” to exceed 100˚F, then, since all the tested equipment was rated at either 122˚F or 140˚F, I invite you to check out the original source for information on vendors and models.4 Otherwise, if you do not anticipate allowing cold aisles or supply temperatures to exceed 100˚F, disk drive throughput will not be affected by the data center environmental envelope. As for CPU and memory performance, they did not see any throttling down of performance on any of the benchmarks up to 131˚F.5
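One way to think about identifying those per-drive thresholds is to sweep ambient temperature and flag the first point where throughput falls more than a noise margin below the coolest-point baseline. The sketch below uses hypothetical data points and a 1% margin purely for illustration; it is not the University of Toronto methodology.

```python
# Sketch: find the first ambient temperature in a sweep where throughput
# falls more than a noise margin below the coolest-point baseline.
# The data points and the 1% margin are hypothetical.
NOISE_MARGIN = 0.01  # treat anything within 1% of baseline as statistical noise

def first_decline_temp(sweep, margin=NOISE_MARGIN):
    """sweep: list of (ambient_degF, throughput) pairs sorted by temperature."""
    baseline = sweep[0][1]
    for temp_f, throughput in sweep[1:]:
        if throughput < baseline * (1.0 - margin):
            return temp_f
    return None  # no statistically meaningful decline observed

if __name__ == "__main__":
    # Hypothetical disk throughput (MB/s) versus ambient temperature
    sweep = [(77, 210.0), (95, 209.1), (104, 208.5), (113, 202.0), (131, 168.0)]
    print(first_decline_temp(sweep))  # -> 113, the first point beyond the margin
```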

Data from research projects conducted immediately after the release of the 2011 ASHRAE environmental guidelines update strongly suggests that computing performance will not be degraded by server inlet temperatures within the ranges previously identified as thresholds before server fan energy increases eat into mechanical plant savings. In fact, the performance temperature thresholds generally far exceed the economic temperature thresholds. This performance temperature headroom, therefore, suggests that some op-ex savings might reasonably be sacrificed for the cap-ex avoidance of not having to build a mechanical plant at all. And we are still not done: there are also ITE cost considerations, reliability considerations, and other environmental considerations, which I will address in subsequent posts.

Continues in Airflow Management Considerations for a New Data Center – Part 3: Server Cost vs Inlet Temperature

1. “Impact of Temperature on Intel CPU Performance,” Matt Bach, Puget Systems, https://www.pugetsystems.com/all_articles.php, October 2014, p. 3.
2. Data derived from Table 3: Results in “Data Center Trends toward Higher Ambient Inlet Temperatures and the Impact on Server Performance,” Aparna Vallury and Jason Matteson, Proceedings of the ASME 2013 International Technical Conference and Exhibition on Packaging and Integration of Electronic and Photonic Microsystems (InterPACK2013), July 16-18, 2013, p. 4.
3. “Temperature Management in Data Centers: Why Some (Might) Like It Hot,” Nosayba El-Sayed, Ioan Stefanovici, George Amvrosiadis, Andy A. Hwang, and Bianca Schroeder, Department of Computer Science, University of Toronto, 2012, pp. 8-9.
4. Ibid., p. 9.
5. Ibid., p. 10.

Ian Seaton
Data Center Consultant