Software failure cited in August blackout investigation

A malfunctioning alarm system may have played a big role in the outage

The task force responsible for investigating the cause of the Aug. 14 blackout that crippled most of the Northeast corridor of the U.S. and parts of Canada concluded that a software failure at FirstEnergy Corp. "may have contributed significantly" to the outage.

The Interim Report of the U.S.-Canada Power System Outage Task Force, released today, highlights failures in various IT systems that hampered utility workers' ability to contain the blackout before it cascaded out of control. The report found no evidence that malicious insiders or external saboteurs were responsible for the cascading power outage.

According to the task force, FirstEnergy's Alarm and Event Processing Routine (AEPR) began to malfunction. AEPR is a key software program that gives operators visual and audible indications of events occurring on their portion of the grid. As a result, "key personnel may not have been aware of the need to take preventive measures at critical times, because an alarm system was malfunctioning."

In addition, "some companies appear to have had only a limited understanding of the status of the electric systems outside their immediate control," the task force report concluded. "This may have been, in part, the result of a failure to use modern dynamic mapping and data sharing systems."

Besides the alarm software failure, the task force found that Internet links to Supervisory Control and Data Acquisition (SCADA) software weren't properly secured and that some operators lacked a system to view the status of electric systems outside their immediate control.

The task force also provided a "cyber timeline" listing significant electronic control events that contributed to the cascading blackout. The first major event occurred at 12:40 p.m. EDT, when an engineer from the Midwest Independent Transmission System Operator disabled an automatic periodic trigger on software that allows the organization to determine the real-time status of the power system in its region. According to the report, the trigger had to be disabled so the engineer could conduct a manual check of the network. However, the engineer then went to lunch and forgot to re-engage the automatic trigger.

By 2:40 p.m. EDT, the AEPR software had begun to malfunction, although FirstEnergy engineers weren't aware of the problem at the time. One minute later, FirstEnergy's primary AEPR server failed and automatically switched over to the backup server. Engineers, however, remained unaware of any problems with the software. Then, at 2:54 p.m., the backup server failed.

At 3:05 p.m., when the first power-line failure occurred at FirstEnergy, system operators received no alarm notifications because of the malfunctioning AEPR software. The malfunction went unnoticed until 3:42 p.m., when the lights at FirstEnergy's control facility flickered and alerted engineers to the larger problem. Only then did an operator notice the problem with the AEPR software.

In a statement released Wednesday, FirstEnergy President and Chief Operating Officer Anthony J. Alexander said the company remains "convinced" that the blackout cannot be traced to any one utility system.

"We recognize that our computer system experienced problems that day," said Alexander. "After an extensive analysis, we submitted a report to the Task Force that identified a previously undetected flaw in vendor software that resulted in the loss of an alarm function, affecting our operators' understanding of events on our system."

However, "by focusing its analysis on a few selected events, the conclusions the Task Force reached don't address the complexity and magnitude of operations on the interconnected grid," Alexander said.

The fragility of the power grid also raised questions about its overall cybersecurity and its susceptibility to deliberate disruption by terrorist organizations. Although there was never any evidence that terrorists or hackers caused the blackout, the task force acknowledged the threat of cyber-induced disruptions.

Although al-Qaeda and other terrorist organizations publicly claimed responsibility for the Aug. 14 blackout, FBI counterterrorism officials told the task force and Congress that there is no evidence to support such claims.

Of particular concern to the task force, however, is the existence of direct and remote links between corporate networks used at utilities and the real-time SCADA systems used to manage the power grid. Until now, the electric industry has refused to publicly acknowledge these linkages and the vulnerability they pose. But the task force report puts SCADA system security at the center of the industry's most pressing security challenges.

"The existence of both internal and external links from SCADA systems to other systems introduced vulnerabilities," the report said. It stopped short, however, of blaming the blackout on the viruses and worms that rampaged across the Internet before and during the outage.

"At this time ... preliminary analysis of information derived from interviews with operators provides no evidence indicating exploitation of these vulnerabilities before or during the outage," the report said.

The Department of Homeland Security is currently working with the electric industry and private-sector IT companies to develop IT intrusion-detection systems that are capable of operating in the real-time environment of SCADA systems. For now, many critical control commands are communicated in clear text with no encryption protection.
