
Software failure cited in August blackout investigation

A malfunctioning alarm system may have played a big role in the outage

By Dan Verton
November 20, 2003 12:00 PM ET

Computerworld - The task force responsible for investigating the cause of the Aug. 14 blackout that crippled most of the Northeast corridor of the U.S. and parts of Canada concluded that a software failure at FirstEnergy Corp. "may have contributed significantly" to the outage.
The Interim Report of the U.S.-Canada Power System Outage Task Force, released today, highlights failures of various IT systems that hampered utility workers' ability to contain the blackout before it cascaded out of control. It found no evidence that malicious insiders or external saboteurs were responsible for the cascading power outage.
According to the task force, FirstEnergy's Alarm and Event Processing Routine (AEPR), a key software program that gives operators visual and audible indications of events occurring on their portion of the grid, began to malfunction. As a result, "key personnel may not have been aware of the need to take preventive measures at critical times, because an alarm system was malfunctioning."
In addition, "some companies appear to have had only a limited understanding of the status of the electric systems outside their immediate control," the task force report concluded. "This may have been, in part, the result of a failure to use modern dynamic mapping and data sharing systems."
Besides the alarm software failure, the task force found that Internet links to Supervisory Control and Data Acquisition (SCADA) software weren't properly secured and that some operators lacked a system to view the status of electric systems outside their immediate control.
The task force also provided a "cyber timeline" listing significant electronic control events that contributed to the cascading blackout. The first major event occurred at 12:40 p.m. EDT, when an engineer from the Midwest Independent Transmission System Operator disabled an automatic periodic trigger on software that allows the utility to determine the real-time status of the power system for its region. That action was needed to conduct a manual check of the network, the report states. However, the engineer later went to lunch and forgot to re-engage the automatic trigger.
By 2:40 p.m. EDT, the AEPR software had begun to malfunction, although FirstEnergy engineers weren't aware of the problem at the time. One minute later, FirstEnergy's AEPR server failed and switched over automatically to the backup server. Engineers, however, remained unaware of any other problems with the software. Then, at 2:54 p.m., the backup server failed.

At 3:05 p.m., when the first power-line failure occurred at FirstEnergy, system operators did not receive alarm notifications because of the malfunctioning AEPR software. That software continued to malfunction until 3:42 p.m., when the lights at FirstEnergy's control facility flickered and alerted engineers to the larger problem. It was only then that an operator noticed the problem with the AEPR software.
In a statement released Wednesday, FirstEnergy President and Chief Operating Officer Anthony J. Alexander said the company remains "convinced" that the blackout cannot be traced to any one utility system.
"We recognize that our computer system experienced problems that day," said Alexander. "After an extensive analysis, we submitted a report to the Task Force that identified a previously undetected flaw in vendor software that resulted in the loss of an alarm function, affecting our operators' understanding of events on our system."
However, "by focusing its analysis on a few selected events, the conclusions the Task Force reached don't address the complexity and magnitude of operations on the interconnected grid," Alexander said.
The fragility of the power grid also raised questions about its overall cybersecurity and its susceptibility to deliberate disruption by terrorist organizations. While there was never evidence the blackout was caused by terrorists or hackers, the task force acknowledged the threat of cyber-induced disruptions.
Although al-Qaeda and other terrorist organizations claimed responsibility publicly for the Aug. 14 blackout, FBI counterterrorism officials told the task force and Congress that there is no evidence to support such claims.
Of particular concern to the task force, however, is the existence of direct and remote links between corporate networks used at utilities and the real-time SCADA systems used to manage the power grid. Until now, the electric industry has refused to publicly acknowledge these linkages and the vulnerability they pose. But the task force report puts SCADA system security at the center of the industry's most pressing security challenges.
"The existence of both internal and external links from SCADA systems to other systems introduced vulnerabilities," the report said. It stopped short, however, of assigning blame for the blackout to a series of viruses and worms that rampaged across the Internet prior to and during the blackout.
"At this time ... preliminary analysis of information derived from interviews with operators provides no evidence indicating exploitation of these vulnerabilities before or during the outage," the report said.
The Department of Homeland Security is currently working with the electric industry and private-sector IT companies to develop IT intrusion-detection systems that are capable of operating in the real-time environment of SCADA systems. For now, many critical control commands are communicated in clear text with no encryption protection.



