Even Detroit knows that autos get different efficiencies based on how & where they're driven... so the metric called "mileage" is actually measured & documented twice -- once for City, once for Highway. Data centers need something akin to this as well.
Why? Because IT departments operate at greatly different levels: peak (maybe during the day) as well as off-peak (perhaps nights/weekends). Ideally, the data center should know how to adapt to these conditions: re-purposing "live" machines during peak hours; retiring and temporarily shutting down idle servers during off-peak; removing power conditioning equipment when not needed; turning off specific CRAC units and chillers when not required (i.e. cold days and/or off-peak hours). We need an efficiency metric that indicates how data centers operate dynamically.
Anyway, here's a quick survey course in what metrics I did find, and what I'd like to see:
The Green Grid on metrics:
- Data Center Infrastructure Efficiency,
DCiE = (IT equipment power)/(total facility power).
This is supposed to be a quick ratio showing how much power gets to servers, versus how much else is consumed by power distribution, cooling, lighting, etc. Driving this ratio up means you have less overhead wasting Watts. This wouldn't be too bad a metric if it were used and monitored 24x7, i.e. at peak and off-peak.
- Power Usage Effectiveness,
PUE = 1/DCiE (just the boring reciprocal)
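For concreteness, here's a minimal sketch of both ratios in Python. The meter readings are made-up numbers, not from any real facility:

```python
# A minimal sketch of DCiE/PUE monitoring; all power figures
# below are illustrative assumptions, not real measurements.
def dcie(it_power_kw, total_facility_power_kw):
    """DCiE = IT equipment power / total facility power."""
    return it_power_kw / total_facility_power_kw

def pue(it_power_kw, total_facility_power_kw):
    """PUE is just the reciprocal of DCiE."""
    return 1.0 / dcie(it_power_kw, total_facility_power_kw)

# Hypothetical peak vs. off-peak meter readings (kW):
print(f"Peak:     DCiE {dcie(800.0, 1250.0):.2f}, PUE {pue(800.0, 1250.0):.2f}")
print(f"Off-peak: DCiE {dcie(300.0, 600.0):.2f}, PUE {pue(300.0, 600.0):.2f}")
```

Note how, with these illustrative numbers, the off-peak ratio comes out worse: fixed overhead dominates lightly-loaded hours, which is exactly why a single daytime snapshot of DCiE can hide waste.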
- Data Center Productivity, (a metric to be adopted in the future)
DCP = (useful computing work)/(total facility power)
In theory, this is a great metric: it's like asking "how many MIPS per Watt" can you produce? (BTW, the human brain, the most powerful of all computers, consumes something like 25W.) Anyway, DCP is a contentious metric... because each computing vendor wants to define "useful computing work" in its own (preferential) way. Frankly, this is most useful for measuring efficiency at the server level.
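To make the contention concrete, here's a tiny sketch of DCP where "useful computing work" is arbitrarily taken to be completed transactions per second -- that choice is my assumption, not a standard definition, and it's precisely the part vendors would argue over:

```python
# Sketch of DCP = useful computing work / total facility power.
# "Useful work" has no agreed definition; transactions/sec is used
# here purely as a stand-in (an assumption, not a standard).
def dcp(useful_work_units, total_facility_power_kw):
    return useful_work_units / total_facility_power_kw

# e.g. a hypothetical 50,000 transactions/sec on a 1,250 kW facility:
print(f"{dcp(50_000, 1250.0):.1f} transactions/sec per kW")
```

Swap "transactions/sec" for MIPS, SPEC scores, or VMs hosted and you get incomparable numbers -- which is the whole problem with DCP today.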
In an excellent paper, "Four Metrics Define Data Center 'Greenness'," the Uptime Institute discusses these:
- Site Infrastructure Energy Efficiency Ratio:
SI-EER, which the Institute is currently working to re-cast in more intuitive and technically accurate terms. I suspect this is much like the Green Grid's DCiE, above.
- Site Infrastructure Power Overhead Multiplier:
SI-POM = (data center power consumption at the meter)/(total power consumption at the plug for IT equipment)
This is essentially the same metric as the Green Grid's PUE, above.
- Deployed Hardware Utilization Ratio:
DH-UR = (qty of servers running live applications)/(total number of servers actually deployed)
This speaks to the real-time utilization of hardware, and IMHO is one of the best metrics for a dynamic data center. It points to how many deployed servers are actually doing work, versus those that are sitting "comatose". It's a very promising metric if used in conjunction with equipment that constantly optimizes how many servers are "on", shutting down idle servers and constantly minimizing this metric.
- Deployed Hardware Utilization Efficiency
DH-UE = (minimum qty of servers needed to handle peak load)/(total number of servers deployed)
This is another great metric: it speaks to the capital efficiency of hardware, i.e. how many servers need to be provisioned and on the floor relative to how many are being used actively.
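The two utilization ratios are simple enough to sketch side by side. The inventory counts below are hypothetical, chosen only to show how the ratios read:

```python
# Sketch of DH-UR and DH-UE from a hypothetical server inventory.
def dh_ur(live_servers, deployed_servers):
    """Fraction of deployed servers running live applications."""
    return live_servers / deployed_servers

def dh_ue(min_servers_for_peak, deployed_servers):
    """Capital efficiency: minimum servers needed at peak vs. deployed."""
    return min_servers_for_peak / deployed_servers

deployed = 1000  # assumed floor count
print(f"DH-UR: {dh_ur(620, deployed):.2f}")  # 380 boxes sitting "comatose"
print(f"DH-UE: {dh_ue(700, deployed):.2f}")  # 300 boxes of excess capital
```

Both ratios need a live inventory feed to be meaningful -- a spreadsheet snapshot taken once a quarter tells you nothing about dynamic behavior.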
So, what would I like to see?
- A DH-UR that changes dynamically, constantly being minimized. This implies that only required servers are actually powered-up and active.
- An SI-POM that is always driven toward a constant ratio, regardless of compute demand. This implies that, as compute demand falls, servers are retired and other support equipment (power handling, cooling) also shuts down, keeping the efficiency ratio balanced.
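That dynamic behavior can be illustrated with a toy model. Every constant below (per-server draw, the linear cooling term, the fixed overhead) is an assumption made up for the sketch, not a measured value:

```python
# Toy model: servers are retired as demand falls, and cooling scales
# linearly with IT load. All constants are illustrative assumptions.
SERVER_KW = 0.4           # assumed draw per powered-up server
COOLING_PER_IT_KW = 0.5   # assumed cooling kW per kW of IT load
FIXED_OVERHEAD_KW = 50.0  # assumed lighting/distribution losses that never shut off

def si_pom(active_servers):
    """SI-POM = total facility power / IT power, for this toy model."""
    it_kw = active_servers * SERVER_KW
    total_kw = it_kw + it_kw * COOLING_PER_IT_KW + FIXED_OVERHEAD_KW
    return total_kw / it_kw

# Retiring idle servers as demand falls from 1000 down to 300:
for demand in (1000, 600, 300):
    print(f"{demand} servers active -> SI-POM {si_pom(demand):.2f}")
```

Notice that even with servers retired in step with demand, SI-POM still creeps upward at low load, because the fixed 50 kW of overhead never shrinks -- which is exactly the point above: power handling and cooling gear must also shut down with the load if the ratio is to stay constant.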