Software failure cited in August blackout investigation
A malfunctioning alarm system may have played a big role in the outage
Computerworld - The task force responsible for investigating the cause of the Aug. 14 blackout that crippled most of the Northeast corridor of the U.S. and parts of Canada concluded that a software failure at FirstEnergy Corp. "may have contributed significantly" to the outage.
The Interim Report of the U.S.-Canada Power System Outage Task Force, released today, highlights the failure of various IT systems that thwarted utility workers' ability to contain the blackout before it cascaded out of control. The report found no evidence that malicious insiders or external saboteurs were responsible for the outage.
According to the task force, FirstEnergy's Alarm and Event Processing Routine (AEPR), a key software program that gives operators visual and audible indications of events occurring on their portion of the grid, began to malfunction. As a result, "key personnel may not have been aware of the need to take preventive measures at critical times, because an alarm system was malfunctioning."
In addition, "some companies appear to have had only a limited understanding of the status of the electric systems outside their immediate control," the task force report concluded. "This may have been, in part, the result of a failure to use modern dynamic mapping and data sharing systems."
Besides the alarm software failure, the task force found that Internet links to Supervisory Control and Data Acquisition (SCADA) software weren't properly secured and that some operators lacked a system for viewing the status of electric systems outside their immediate control.
The task force also provided a "cyber timeline" listing significant electronic control events that contributed to the rolling blackout. The first major event occurred at 12:40 p.m. EDT, when an engineer from the Midwest Independent Transmission System Operator disabled an automatic periodic trigger on software that allows the utility to determine the real-time status of the power system for its region. That action was needed to conduct a manual check of the network, the report states. However, the engineer later went to lunch and forgot to re-engage the automatic trigger.
By 2:40 p.m. EDT, the AEPR software had begun to malfunction, although FirstEnergy engineers weren't aware of the problem at the time. One minute later, FirstEnergy's AEPR server failed and switched over automatically to the backup server. Engineers, however, remained unaware of any problems with the software. Then, at 2:54 p.m., the backup server failed.
At 3:05 p.m., when the first power-line failure occurred at FirstEnergy, system operators did not receive alarm notifications because of the malfunctioning AEPR software. That software continued to malfunction until 3:42 p.m., when the lights at FirstEnergy's control facility flickered and alerted engineers to the larger problem. It was only then that an operator noticed the problem with the AEPR software.
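The arithmetic behind that timeline can be sketched in a few lines of Python. This is only an illustration: the event times are the ones reported above, converted to 24-hour form, while the names `EVENTS`, `minutes_between` and `gaps_before_detection` are hypothetical and do not come from the task force report.

```python
from datetime import datetime

# Times (EDT, Aug. 14) as given in the article's cyber timeline,
# written in 24-hour form. Labels are paraphrases, not report text.
EVENTS = [
    ("12:40", "MISO engineer disables automatic trigger"),
    ("14:40", "AEPR software begins to malfunction"),
    ("14:41", "primary AEPR server fails over to backup"),
    ("14:54", "backup AEPR server fails"),
    ("15:05", "first FirstEnergy power-line failure"),
    ("15:42", "operator notices the AEPR problem"),
]


def minutes_between(start: str, end: str) -> int:
    """Whole minutes from start to end, both 'HH:MM' on the same day."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


def gaps_before_detection(events):
    """Map each earlier event to the minutes elapsed before detection.

    The last entry in `events` is treated as the moment of detection.
    """
    detected_at = events[-1][0]
    return {label: minutes_between(t, detected_at) for t, label in events[:-1]}


if __name__ == "__main__":
    for label, gap in gaps_before_detection(EVENTS).items():
        print(f"{gap:4d} min before detection: {label}")
```

By these figures, the alarm software had been malfunctioning for 62 minutes, and the first line failure was 37 minutes old, when an operator noticed the problem.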
In a statement released Wednesday, FirstEnergy President and Chief Operating Officer Anthony J. Alexander said the company remains "convinced" that the blackout cannot be traced to any one utility system.
"We recognize that our computer system experienced problems that day," said Alexander. "After an extensive analysis, we submitted a report to the Task Force that identified a previously undetected flaw in vendor software that resulted in the loss of an alarm function, affecting our operators' understanding of events on our system."
However, "by focusing its analysis on a few selected events, the conclusions the Task Force reached don't address the complexity and magnitude of operations on the interconnected grid," Alexander said.
The blackout also raised questions about the overall cybersecurity of the electric power grid and its susceptibility to deliberate disruption by terrorist organizations. While there was never evidence that the blackout was caused by terrorists or hackers, the task force acknowledged the threat of cyber-induced disruptions.
Although al-Qaeda and other terrorist organizations claimed responsibility publicly for the Aug. 14 blackout, FBI counterterrorism officials told the task force and Congress that there is no evidence to support such claims.
Of particular concern to the task force, however, is the existence of direct and remote links between corporate networks used at utilities and the real-time SCADA systems used to manage the power grid. Until now, the electric industry has refused to publicly acknowledge these linkages and the vulnerability they pose. But the task force report puts SCADA system security at the center of the industry's most pressing security challenges.
"The existence of both internal and external links from SCADA systems to other systems introduced vulnerabilities," the report said. It stopped short, however, of assigning blame for the blackout to a series of viruses and worms that rampaged across the Internet prior to and during the blackout.
"At this time ... preliminary analysis of information derived from interviews with operators provides no evidence indicating exploitation of these vulnerabilities before or during the outage," the report said.
The Department of Homeland Security is currently working with the electric industry and private-sector IT companies to develop IT intrusion-detection systems that are capable of operating in the real-time environment of SCADA systems. For now, many critical control commands are communicated in clear text with no encryption protection.
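To illustrate why unprotected clear-text commands are a concern, here is a minimal Python sketch of message authentication with a shared key, using the standard `hmac` module. It is not drawn from the report and does not describe any real SCADA protocol or the systems under development. The command string, the key and the function names `sign_command` and `verify_command` are all hypothetical. Note too that an HMAC tag detects forged or altered commands but does not encrypt them.

```python
import hashlib
import hmac
from typing import Optional

TAG_LEN = hashlib.sha256().digest_size  # 32 bytes for SHA-256


def sign_command(key: bytes, command: bytes) -> bytes:
    """Append an HMAC-SHA256 tag so the receiver can detect tampering."""
    tag = hmac.new(key, command, hashlib.sha256).digest()
    return command + tag


def verify_command(key: bytes, message: bytes) -> Optional[bytes]:
    """Return the command if its tag is valid, otherwise None."""
    if len(message) < TAG_LEN:
        return None
    command, tag = message[:-TAG_LEN], message[-TAG_LEN:]
    expected = hmac.new(key, command, hashlib.sha256).digest()
    # compare_digest avoids leaking information through timing.
    return command if hmac.compare_digest(tag, expected) else None


if __name__ == "__main__":
    key = b"hypothetical-shared-key"           # assumption: pre-shared secret
    message = sign_command(key, b"OPEN breaker-7")  # made-up command
    print(verify_command(key, message))             # accepted
    print(verify_command(key, b"SHUT" + message[4:]))  # altered, rejected
```

A command sent with no such tag gives the receiver no way to tell a legitimate instruction from an injected one.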