
Software failure cited in August blackout investigation

A malfunctioning alarm system may have played a big role in the outage

By Dan Verton
November 20, 2003 12:00 PM ET

Computerworld - The task force responsible for investigating the cause of the Aug. 14 blackout that crippled most of the Northeast corridor of the U.S. and parts of Canada concluded that a software failure at FirstEnergy Corp. "may have contributed significantly" to the outage.
The Interim Report of the U.S.-Canada Power System Outage Task Force, released today, highlights the failure of various IT systems that thwarted utility workers' ability to contain the blackout before it cascaded out of control, and found no evidence that malicious insiders or external saboteurs were responsible for the cascading power outage.
According to the task force, FirstEnergy's Alarm and Event Processing Routine (AEPR), a key software program that gives operators visual and audible indications of events occurring on their portion of the grid, began to malfunction. As a result, "key personnel may not have been aware of the need to take preventive measures at critical times, because an alarm system was malfunctioning."
In addition, "some companies appear to have had only a limited understanding of the status of the electric systems outside their immediate control," the task force report concluded. "This may have been, in part, the result of a failure to use modern dynamic mapping and data sharing systems."
Besides the alarm software failure, the task force found that Internet links to Supervisory Control and Data Acquisition (SCADA) software weren't properly secured and that some operators lacked a system to view the status of electric systems outside their immediate control.
The task force also provided a "cyber timeline" listing significant electronic control events that contributed to the rolling blackout. The first major event occurred at 12:40 p.m. EDT, when an engineer from the Midwest Independent Transmission System Operator disabled an automatic periodic trigger on software that allows the utility to determine the real-time status of the power system for its region. That action was needed to conduct a manual check of the network, the report states. However, the engineer later went to lunch and forgot to re-engage the automatic trigger.
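The forgotten trigger illustrates a general design point: a manual override can be made to expire on its own. The following is a minimal sketch of that idea in Python, not a description of the Midwest ISO software. The class name `PeriodicTrigger`, the `max_pause_seconds` parameter and the injectable `clock` are all hypothetical names introduced for illustration.

```python
import time


class PeriodicTrigger:
    """Hypothetical periodic trigger whose manual pause expires on its own."""

    def __init__(self, max_pause_seconds, clock=time.monotonic):
        self.max_pause_seconds = max_pause_seconds
        self._clock = clock          # injectable so the logic can be tested
        self._paused_at = None       # None means the trigger is armed

    def pause(self):
        """Disable the automatic trigger, e.g. for a manual check."""
        self._paused_at = self._clock()

    def resume(self):
        """Re-engage the automatic trigger explicitly."""
        self._paused_at = None

    def is_armed(self):
        """Return True if the trigger should fire on its normal schedule."""
        if self._paused_at is None:
            return True
        # Re-arm automatically once the pause has outlived its allowance.
        if self._clock() - self._paused_at >= self.max_pause_seconds:
            self._paused_at = None
            return True
        return False
```

Under this assumption, an operator who pauses the trigger and walks away leaves it disabled for at most `max_pause_seconds`, after which it re-arms without further action.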
By 2:40 p.m. EDT, the AEPR software began to malfunction, although FirstEnergy engineers weren't aware of the problem at the time. One minute later, FirstEnergy's AEPR server failed and switched over automatically to the backup server. Engineers, however, remained unaware of any other problems with the software. Then, at 2:54 p.m., the backup server failed.

At 3:05 p.m., when the first power-line failure occurred at FirstEnergy, system operators did not receive alarm notifications because of the malfunctioning AEPR software. That software continued to malfunction until 3:42 p.m., when the lights at FirstEnergy's control facility flickered and alerted engineers to the larger problem. It was only then that an operator noticed the problem with the AEPR software.
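The timeline describes an alarm system that failed silently for about an hour. One common safeguard against that kind of failure is a heartbeat watchdog: the alarm process reports in at regular intervals, and an independent monitor raises a warning when the reports stop. The sketch below is illustrative only and does not reflect FirstEnergy's or its vendor's software. `AlarmWatchdog`, `timeout_seconds` and `clock` are hypothetical names.

```python
import time


class AlarmWatchdog:
    """Hypothetical monitor that flags an alarm process that has gone quiet."""

    def __init__(self, timeout_seconds, clock=time.monotonic):
        self.timeout_seconds = timeout_seconds
        self._clock = clock              # injectable so the logic can be tested
        self._last_beat = clock()

    def heartbeat(self):
        """Called by the alarm process each time it completes a cycle."""
        self._last_beat = self._clock()

    def is_stale(self):
        """Return True if no heartbeat arrived within the timeout."""
        return self._clock() - self._last_beat > self.timeout_seconds
```

A separate process would poll `is_stale()` and notify operators through a channel that does not depend on the alarm software itself.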
In a statement released Wednesday, FirstEnergy President and Chief Operating Officer Anthony J. Alexander said the company remains "convinced" that the blackout cannot be traced to any one utility system.
"We recognize that our computer system experienced problems that day," said Alexander. "After an extensive analysis, we submitted a report to the Task Force that identified a previously undetected flaw in vendor software that resulted in the loss of an alarm function, affecting our operators' understanding of events on our system."
However, "by focusing its analysis on a few selected events, the conclusions the Task Force reached don't address the complexity and magnitude of operations on the interconnected grid," Alexander said.
The grid's fragility also raised questions about its overall cybersecurity and its susceptibility to deliberate disruption by terrorist organizations. While there was never evidence that terrorists or hackers caused the blackout, the task force acknowledged the threat of cyber-induced disruptions.
Although al-Qaeda and other terrorist organizations claimed responsibility publicly for the Aug. 14 blackout, FBI counterterrorism officials told the task force and Congress that there is no evidence to support such claims.
Of particular concern to the task force, however, is the existence of direct and remote links between corporate networks used at utilities and the real-time SCADA systems used to manage the power grid. Until now, the electric industry has refused to publicly acknowledge these linkages and the vulnerability they pose. But the task force report puts SCADA system security at the center of the industry's most pressing security challenges.
"The existence of both internal and external links from SCADA systems to other systems introduced vulnerabilities," the report said. It stopped short, however, of assigning blame for the blackout to a series of viruses and worms that rampaged across the Internet prior to and during the blackout.
"At this time ... preliminary analysis of information derived from interviews with operators provides no evidence indicating exploitation of these vulnerabilities before or during the outage," the report said.
The Department of Homeland Security is currently working with the electric industry and private-sector IT companies to develop IT intrusion-detection systems that are capable of operating in the real-time environment of SCADA systems. For now, many critical control commands are communicated in clear text with no encryption protection.
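To illustrate what clear-text commands lack, the sketch below attaches a keyed authentication tag to a control command using HMAC-SHA256 from Python's standard library. This is authentication, not encryption: it lets a receiver reject forged or altered commands but does not hide their contents, and it is not drawn from the task force report or any SCADA product. The function names and the sample command strings are hypothetical.

```python
import hashlib
import hmac


def sign_command(key: bytes, command: bytes) -> bytes:
    """Return an HMAC-SHA256 tag for a command under a shared secret key."""
    return hmac.new(key, command, hashlib.sha256).digest()


def verify_command(key: bytes, command: bytes, tag: bytes) -> bool:
    """Return True only if the tag matches the command under the same key."""
    expected = hmac.new(key, command, hashlib.sha256).digest()
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(expected, tag)
```

A sender would transmit the command together with its tag, and the receiving device would act only when `verify_command` returns `True`. Key distribution and replay protection are outside the scope of this sketch.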



