The National Security Agency's new data center in Utah was built for a 65-megawatt load, making it one of the world's largest data centers. But it has had a rough start.
The $1.53 billion data center was built on a "very aggressive" schedule, according to the government, with groundbreaking in early 2011 and completion scheduled by the end of this year. But a report of "meltdowns," "flashes of lightning" and damaged equipment caused by arc faults raised questions about the timetable.
There are many potential points of failure in this approximately 1 million-square-foot facility in Bluffdale, Utah, 23 miles south of Salt Lake City. The NSA says 100,000 square feet of the center is mission-critical raised-floor space. The balance of the mall-sized complex is for support and administrative functions.
The amount of wiring and compute equipment used is hard to imagine. Details found in government records speak to its complexity. For instance, among the systems installed are 60 diesel emergency standby generators, each capable of producing 3,000 kW, according to a Utah environmental quality report.
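To put the generator figure in perspective, the numbers quoted above can be combined in a quick back-of-the-envelope calculation. This is only a sketch: it assumes every generator could run at its full 3,000 kW rating at the same time, which the records cited here do not say, and the variable and function names are illustrative.

```python
# Figures quoted in the article
GENERATOR_COUNT = 60          # diesel emergency standby generators
GENERATOR_RATING_KW = 3_000   # rated output of each generator, in kW
DESIGN_LOAD_MW = 65           # load the data center was built for, in MW


def total_standby_capacity_mw(count, rating_kw):
    """Combined nameplate capacity in MW (assumes all units at full rating)."""
    return count * rating_kw / 1_000


capacity_mw = total_standby_capacity_mw(GENERATOR_COUNT, GENERATOR_RATING_KW)
ratio = capacity_mw / DESIGN_LOAD_MW

print(f"Combined nameplate standby capacity: {capacity_mw:.0f} MW")
print(f"Ratio to the {DESIGN_LOAD_MW} MW design load: {ratio:.2f}x")
```

On those assumptions, the standby generators add up to 180 MW of nameplate capacity, a little under three times the 65-megawatt design load. The source does not explain how that capacity is meant to be used.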
The Wall Street Journal first reported the electrical problems, citing documents it obtained. The Salt Lake Tribune subsequently obtained an NSA document sent to a congressional oversight committee that was less alarming. It said that none of the compute equipment was damaged and that the electrical problems were localized to breaker boxes. An NSA spokesman confirmed the accuracy of the Tribune's report about the letter, but the agency isn't releasing additional information.
So what went wrong?
"No one is immune from a lack of discipline," said Hudson Denney, a founder and principal of Net3 Technology, a provider of cloud-based managed services and data center platforms. "Electricity could care less if it's a billion-dollar government facility, a manufacturing plant, or a regional data center."
While Denney can't speak to the NSA's problems directly, he is not surprised by them.
"We can't begin to recall all the times we have seen a generator turn on without a transfer switch throwing, which ends up with a down data center and a lot of wasted diesel," Denney said. "We have had engineers work on server issues for hours just to find out a $2 cord went bad or the jack wasn't punched down correctly by a contractor.
"Sometimes people just get it wrong," Denney continued. "The redundant power isn't [there], the screws on the breaker panel are not tight, and the fuel in the generator wasn't cycled properly."
Similarly, Andy Pace, the chief operating officer of SingleHop, a dedicated server and cloud hosting company, said the sheer scale of the NSA data center "guarantees a complex power system.
"A great deal of the hardware in the facility is custom-built and that will also add a layer of complexity," Pace said. Problems can span from the human side, including poor quality assurance and oversight, to issues such as lack of redundancy, hardware faults and bad equipment. "What is most important with a problem like this is not the fact that it happened, but how quickly it can be solved," he said.
An arc fault happens when there is a broken pathway between two conductive objects, Pace said. In most cases, the air becomes the conductor bridging the gap between the two objects, which causes an arc. Eventually, the arc can begin to burn and melt anything, "and create quite the explosion."