How to improve your data center operations

Inspect your 'emergency power off' situation

This is the first in a series of stories that will focus on data center improvements. Some of the ideas will increase capacity, and others will increase redundancy; the last group will improve the overall efficiency and reliability of the electrical and mechanical infrastructure.

Each of these suggestions has been installed and tested in live environments.

Data center staffers are challenged when processing capacity is increased within existing facilities. While the reliability of hardware, software and networks has been improving, electrical and mechanical infrastructure improvements lag behind. Forensic evaluations of data center failures demonstrate that operator errors, electrical and mechanical single points of failure, design problems, and construction defects are the leading causes of data center disruptions.

This situation is bound to be made worse as more data centers are relocated or expanded over the next five years. Rakesh Kumar, an analyst at Gartner Inc., said that more than 70% of the Global 1,000 organizations will have to modify their data center facilities significantly during the next five years.

"These legacy data centers typically were built to a design specification of about 100 to 150 watts per square foot. Current design needs are about 300 to 400 watts per square foot. By 2011, this could rise to more than 600 watts per square foot," Kumar said. "The implication is that most current data centers will be unable to host the next generation of high-density equipment, so CIOs will have to refurbish their established sites, build new ones or look for alternatives, such as using a hosting provider."

Unfortunately, the compaction of space required by IT hardware has resulted in unprecedented increases in power and cooling needs, outstripping facility infrastructure, design standards and space allocations. "Back of the house" spaces for power and cooling to support high-density computing are, in many data centers, larger than the computer area itself. Electrical and mechanical areas can be 400% larger than the raised-floor computing space in 250-watt-per-square-foot environments.
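To make the density figures quoted above concrete, the short sketch below converts watts per square foot into total IT load for a single room. The 10,000-square-foot floor area is purely a hypothetical value chosen for illustration; only the density figures come from Kumar's quote, and the 600-watt entry is the lower bound of his "more than 600" projection.

```python
# Design densities quoted in the article, in watts per square foot.
DENSITIES_W_PER_SQFT = {
    "legacy (low)": 100,
    "legacy (high)": 150,
    "current (low)": 300,
    "current (high)": 400,
    "projected 2011": 600,  # quoted as "more than 600", so a lower bound
}


def it_load_kw(area_sqft: float, watts_per_sqft: float) -> float:
    """Total IT load in kilowatts for a raised floor of the given area."""
    return area_sqft * watts_per_sqft / 1000.0


if __name__ == "__main__":
    area = 10_000  # hypothetical raised-floor area, not from the article
    for label, density in DENSITIES_W_PER_SQFT.items():
        print(f"{label:>15}: {it_load_kw(area, density):,.0f} kW")
```

Under that assumed floor area, the same room goes from roughly 1 megawatt at legacy densities to at least 6 megawatts at the projected density, which is why the supporting power and cooling plant tends to outgrow the computer room itself.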

At the same time, facility infrastructure support is shortchanged because data center infrastructure represents such a small portion of the real estate market and because the finances relative to the revenue are small. Data centers represent less than one tenth of one percent (0.1%) of all real estate construction in the U.S. In addition, these are lightly occupied or unoccupied buildings. Some are actually "lights-out" facilities that are fully automated, without any occupants.

Also, in a data center environment, annual facility costs, including infrastructure depreciation, are as little as 0.5% (one-half of one percent) of the IT budget. In a large company, the costs to operate and maintain the electrical and mechanical infrastructure can be less than one-thousandth of one percent of annual revenue, less than a rounding error. These small costs don't generally get much high-level attention.

Furthermore, data centers may be small areas located within a much larger building, camouflaging the true operational risks and utility expenses. For example, an international pharmaceutical company recently migrated a 1,000-square-foot, high-density server room into its 50,000-square-foot office building. The utility bill for the entire building doubled -- and has remained at that level for the past nine months.

EPO problems

This leads us to the first low-cost, low-risk, high-benefit opportunity for improving the reliability of your data center's critical power system: Inspect your emergency power off (EPO) switch.

These innocuous-looking buttons are located at the exits of most data centers. Once a button is pushed, critical power is shut down and can be reactivated only manually, typically by an electrician who knows the system. EPO activations have shut down emergency 911 access and have interrupted international trading, corporate accounting, pharmaceutical research and air traffic control.

Virtually every industry that relies on central data center functions has experienced EPO disruptions.

While some of the EPO disruptions were caused by faulty wiring, under-floor cable pulls snagging the EPO conduit, water leaks and poor maintenance, the majority of EPO shutdowns were caused by a person pushing an EPO button in error. In many cases, an occupant pushed buttons near the exit thinking they would deactivate the magnetic security locks.

In at least one recent case, the EPO disruption was done on purpose: A systems administrator shut down a data center that controls the California electrical grid.

Hundreds of incidents across the U.S. are reported annually in data centers. These are the same facilities where millions of dollars were originally invested to achieve electrical fault-tolerance and continuous availability. Every IT, network and telecommunications component powered in a raised-floor area is at risk.

Still, the EPO button is required by Articles 645.10 and 645.11 of the National Electrical Code. These rules mandate that computer rooms have an EPO system at each exit to disable power under the raised floor as well as to disable power to air conditioning that supplies cooling to the raised floor. By code, the disconnection mechanism may be a single button or two adjacent buttons -- one for power, the other for cooling.

But all too often, these EPO buttons are placed next to the many other exit-mounted devices, including fire-suppression release/abort buttons, light switches, security card readers, fire extinguishers, fire alarm panels, telephones, security intercoms and exit buttons.

This confusing conglomeration next to the exit door can easily lead data center occupants to press the EPO when they are simply trying to turn on the lights or call security.

Even momentary pushes on the EPO button will shut down the data center and require maintenance staffers to reset all tripped electrical devices. Electrical reset could take up to 30 minutes -- this in an environment where a fraction of a second can cause irreparable damage to hardware, databases and corporate profits.

It is probable that this single point of failure is one of the leading causes of critical power loss in the U.S. These electrical disruptions occur with the same regularity as utility disruptions, engine-generator failures and nuisance circuit-breaker trips, but they are generally not seen as failures. Because the button is pushed deliberately, even when it is pushed by mistake, these events are classed as accidents rather than counted alongside utility disruptions.

EPO button

But there is a way of making the EPO button less hazardous to your data center's health. There is a protocol that has been tried for more than a decade in dozens of data centers around the country. It could be implemented in your data center within a few hours and for a few hundred dollars per exit -- truly a small price to pay to eliminate a common source of risk in a modern data center.

In the photo above, note that the EPO is clearly marked as the "Emergency Power Off Button." The intent is to distinguish it from the other devices at the doorway of the data center. Note that the cover over the EPO has a keyed lock, but the key is already inserted. Opening the case will need to be very intentional -- but if a real emergency existed, the lock would not be an impediment.

Under the cover is a battery-operated microswitch that sounds a 90-decibel (piercingly loud) alarm when the cover is lifted, while a second microswitch instantly alerts security. A phone is a few feet away from the EPO switch.

Additional EPO requirements include having a system that can be serviced and maintained fail-safe. That means it can be maintained while critical load is being powered. Many clients are terrified of changing a burned-out light bulb on their EPOs for fear of accidentally shutting down their data centers.

Some other designs we have developed require the closure of two latching push buttons that need a key to release. Others have the alarm switch in the EPO cover simultaneously cause a video camera to rotate and film the EPO because disgruntled employees have perpetrated some malicious EPO activations.
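The behavior of these covered and dual-button designs can be summarized as a small logic model. The sketch below is only an illustration, not a description of any actual control wiring: the class name, field names, and the assumption that the dual-button variant also sits under an alarmed cover are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class EpoStation:
    """Toy model of one exit-mounted EPO station (illustrative only)."""

    two_button: bool = False  # True for the dual latching-button variant
    cover_open: bool = False
    button_a: bool = False
    button_b: bool = False

    def alarm_sounding(self) -> bool:
        # Lifting the cover alone sounds the alarm and alerts security;
        # it does not cut power.
        return self.cover_open

    def power_off(self) -> bool:
        # Assumption: the buttons cannot be reached while the cover is closed.
        if not self.cover_open:
            return False
        if self.two_button:
            # Dual design: both latching buttons must be closed.
            return self.button_a and self.button_b
        return self.button_a


if __name__ == "__main__":
    station = EpoStation(two_button=True, cover_open=True, button_a=True)
    print(station.alarm_sounding(), station.power_off())
```

The point of the model is that every added step (lifting the cover, closing a second button) is one more deliberate action between a confused occupant and a shutdown, while a real emergency still needs only a few seconds.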

As a final consideration, the label can be expanded to read "EPO (emergency power off) button. This will shut down all equipment in this room. Use for life-saving emergency only." It may be a good idea to repeat the sign in any other language used by non-English-speaking occupants.

More than 30 years ago, code officials determined the need for EPOs because power installed under a raised floor could start a concealed fire. Also, since there are so many circuit breakers in a data center, it is difficult to determine a source for disconnecting if someone is being electrocuted. Modern components that mitigate the need for EPO include fire-/smoke-detection systems under the raised floor and "ground fault (GFI)" settings in circuit breakers.

Actual cases where EPO activation has saved lives are nonexistent. The Canadians are intentionally attempting to remove this requirement from their codes. Unfortunately, as with many building code requirements, once approved, it is very difficult to exorcise from the code books.

Edward C. Koplin, professional engineer and certified energy manager, is a principal of consultancy X-nth. He has been an advisor to the Site Uptime Institute and has evaluated, designed and commissioned over 3 million square feet of Fortune 500 data centers. He can be reached at ekoplin@x-nth.com.

Copyright © 2008 IDG Communications, Inc.
