
FAA: Sun box disk failure caused NOTAM database crash

System issues preflight notices to pilots on airports, airspace and security issues

May 30, 2008 12:00 PM ET



Computerworld - A disk failure in a Sun Microsystems Inc. server caused the Federal Aviation Administration's NOTAM database to crash for nearly 20 hours last week, according to the FAA.

The NOTAM (notice to airmen) system alerts pilots to airport, equipment and security issues. The system went down late on May 22 and was back up at around 7 p.m. on May 23.

Because of the disk failure, information had to be delivered to pilots through local air traffic controllers and alternate systems, including a Web site set up to disseminate the most up-to-date information, said Barry Davis, manager of aeronautical information management for the FAA. However, flight safety was never a problem, the FAA said.

"What happened was the drive in an end-of-life Sun box failed in the middle of updating the information on the hard drive, so it screwed up the database," Davis said.

Davis said that was only the beginning of the complications. His team replaced the hardware and the drive on May 22, which got the system running again.

"We already had the equipment to replace [the box], we just hadn't done it yet, and that's why the hardware recovery was quite simple -- we just put the boxes in," Davis said.

But even then, the system was running slowly, in a degraded mode, and performance got so bad, Davis said, that his team decided to reopen the problem to see what was going on.

While working to fix the database, the technicians decided to switch to the backup system. In doing so, they soon realized they had copied the error over to the backup system and corrupted it as well, Davis said.

"So because we had already replaced the hardware and the drives, we just had to pull the latest information and extract it out of the [corrupted] database, then re-import it into the [new] database," Davis said. "Then we resynchronized all of the subsystems so everyone had the same database copy, and then we opened the gates up at 4:40 p.m. on Friday so that all of the information would come into the system."

Davis and his team spent the rest of that night monitoring the situation to make sure there were no other errors.

While the automated system was out, pilots and other affected organizations were able to get the latest information from a Web site set up for that purpose. Although everything was updated by 7 p.m. on Friday, Davis said the decision was made to keep the Web site up until midnight as a precaution.
