Sure, in this special report you've read a lot of good advice on how to prepare for disasters. But now we're taking a different approach. Our Sharky has come across his share of disaster recovery stories over the years -- real stories sent in by real readers. Here we've gathered some of the best. Think of these as how NOT to do disaster recovery.
Recovery disaster
The big blackout hits, but power from UPSs lets this pilot fish shut things down properly in the server room. Next day, electric service is restored -- but now fish is locked out of the server room. "The computer room door has a key-card lock, and the PC controlling it did not reboot properly and won't open the door," fish groans. "Several hours later, a key is located. Our disaster plans now include the key to the server room."
In case of fire
Newly hired IT operations pilot fish gets briefed on emergency procedures: "If a fire should occur in the data center, it's your responsibility to exit the data center carrying the tapes used for disaster recovery from the fire rack," manager says. "If you should become trapped in the data center with these tapes in your possession, you will be fired immediately. No discussion."
It doesn't work that way
When a highly unusual ice storm hits this phone-support center in a Southern city, the phones go out at 10:15 a.m., a pilot fish reports; power goes at 10:45. But two hours later, boss still won't let the staff go home. His logic: "Our California customers aren't dealing with an ice storm, so they'll be at work trying to call us."
Economics lesson
Lightning strikes a power transformer about 100 feet from the systems room for a large school district, reports this tech support pilot fish working there.
"There is no surge protection of any kind on the line, so the bolt comes through and fries a couple of servers, disk drives, multiplexers and ancillary equipment," says fish.
The whole system is very, very dead for more than a week until new hardware is installed. "No one in the school district could do accounting, payroll or student record data entry," fish says. "The actual cost of fixing it was enormous, but insurance picked up most of it."
With the new hardware up and running, district's IT boss puts in a proposal for surge protection and UPSs to keep it that way.
And the school board, in its infinite wisdom, says ... no.
"It'll save money if we just keep up the insurance policy and skip the cost of protection," board tells IT boss.
"What about the cost of the downtime?" IT boss asks.
Not to worry, says board. "Because we're a school district, the taxpayers cover the cost of the labor, so that's not an expense we need to worry about."
If it ain't broke ... break it?
This engineering design firm brings in an IT pilot fish to draft a disaster recovery plan, so fish gets a walk-through tour of the computer-aided drafting (CAD) area with the production manager.
"In analyzing the production process, I see drafter/illustrators creating beautiful CAD drawings of equipment," fish says.
"The production manager explains that these drawings will become part of a master document used to stow equipment in special cargo carriers."
In the next room, there's another group of artists hard at work. "They're hunched over large digitizing tables, meticulously tracing CAD drawings, pixel by pixel," says fish.
Why are they doing that? fish asks production manager. After all, those drawings were printed from CAD files that have already been prepared.
"The production manager explains that the computer they must use to prepare the document is about 30 years older than the ones the drafter/illustrators use for the drawings," fish says.
"And the only way to get the data into the old computer is to digitize it by hand."
So when fish delivers his disaster recovery plan, it includes two recommendations regarding the 30-year-old system. First: In case of a disaster, don't replace the old computer.
And second: "Pray for a disaster," fish says. "Soon."
The power of information
Ice storms blow through Midwest town, taking out power for up to three days. Manager instructs Web developer pilot fish that next time storms hit, he should "add a section to our Web site, telling people how much longer their power will be off." "I've created the system," the fish reports, "but anyone who needs the information won't have any power to turn their computer on!"
First thing, we'd probably install water sensors
This big Internet company has a small catastrophe: A water-cooled air handler springs a leak and starts flooding the computer room floor, reports a pilot fish on the scene.
Fortunately, just as they're designed to do, water sensors under the raised floor begin shrieking, alerting staff to the problem.
It takes several hours for the staff to call vendors, find the leaky parts and vacuum up water from the affected areas, but they've got it under control.
Meanwhile, the noise from the alarms is getting on the nerves of a newly appointed executive. The VP orders one of the support staffers to raise the sensors on all the risers until they shut up.
Which they do -- and the cleanup is finished up, and everything is quiet.
Until three months later -- when it happens again. And the IT staff discovers that no one ever returned the water sensors to their original levels.
Before anyone notices the flood this time, the water rises high enough to short out equipment and make the recovery a lot more complicated.
And who should take the heat for it? There's no question in the mind of this VP: his staff.
"It doesn't matter that the sensors were set higher," veep tells fish. "You should have been more vigilant. Suppose you didn't have sensors? What would you do then?"
Unclear on the concept
This company moves its call center to what used to be a bank, and the fireproof walk-in safe seems like the ideal place to store backup tapes. But when a fire guts the place, IT pilot fish is stunned to discover the tapes melted -- and a charred block of wood wedged in the safe's doorway. "We ran out of space for stationery," explains the call center manager, "so we stored it in the safe. But some of the girls had a hard time opening the heavy safe door, so we thought we'd just wedge it open."
What a blast!
It's the late 1970s, and this chemical company builds a state-of-the-art computer room, complete with a Halon fire-suppression system, says a pilot fish working there.
But state-of-the-art doesn't come cheap, and this is a big room. "So some genius decided that instead of running Halon pipes all over the room, they would put them only along one wall, with the pipes positioned to blow the Halon across the room," fish says.
"Naturally, the Halon was high-pressure. In fact, the first test bent the Halon pipes, but they just braced them to prevent that."
One Saturday, fish is working at a keypunch machine in the computer room when the Halon alarm sounds.
"It was supposed to give a 30-second warning before releasing the Halon," says fish. "But when I looked up, it was already blasting across the room, carrying everything loose in the room with it!"
Fish ducks under the keypunch, and the computer operator dives under the console table.
"When we emerged, it was a disaster area," fish reports. "Listings and paper were swirled and scattered, pens were embedded in walls, a paper tablet was sliced in half by an acoustic ceiling support, and several disk pack covers had been flung the length of the room and smashed."
And what set it off? Turns out technicians from the company maintaining the Halon system had just shown up to do routine maintenance.
"They swore they never got near the control panel," fish says. "Yeah, sure. It took a lawsuit to get them to pay the $15,000 to recharge the Halon tanks.
"But my company had to pay to repipe the computer room with low-pressure Halon."
Those fire marshals -- no sense of humor at all
This IT pilot fish is on the systems management team charged with running this manufacturer's two engineering data centers.
"One of our duties was to ensure that the corporate engineering resources are powered down in case a fire ever hits the plants housing the data centers," says fish.
So they know just what to do when the fire marshal and the head of plant security decide to hold an unannounced fire drill.
"True to our duties, the big, red button on the wall was depressed to shut down all power to the data center," fish says. "Everyone evacuated the building and waited for the 'all clear' to be issued.
"That's when the problems began."
That's because the emergency shutdown switch cuts power instantly. And because the machines haven't been shut down properly, it takes more than an hour to get everything back online.
"This meant the engineering staff were idle for that hour, since they had long ago forgotten how to read trade journals or hold person-to-person technical discussions," says fish.
"The VP of engineering was livid, since his troops had to sit around idle. Something had to be done. And he knew just what to do."
The veep can't give the systems management team advance warning of the fire drills -- the fire marshal won't allow that. So he comes up with a new plan.
"Under his plan, the systems management team would have to exit the building without bringing the systems down," says fish. "Then we would determine if the alarm was real. And if the alarm was for a real fire, we would return to the data center to power everything down -- and then re-evacuate the building.
"Well, the systems management team objected to this plan, since none were trained firefighters. We tried to point out that some alternative must be possible -- one that would not require us to re-enter a burning building."
But the arguments don't seem to have any effect. Management has spoken, and it likes the sound of its own voice -- especially since the back-into-the-burning-building plan requires no extra expense.
But mysteriously, a short time later, fish and the systems management team are called into an emergency meeting.
"A new solution had been identified," says fish. "Externally accessible big, red buttons would be installed at secured locations, so they could be triggered safely away from a fire.
"No one ever did disclose how the state fire marshal got involved..."
On-the-job training -- the hard way
It's time for the annual inspection of the fire-suppression system at this credit union's data center, says a pilot fish who works there.
"All was going well except for the bad attitude of the guy from the fire-suppression system company," fish says. "Then we suddenly lost all connection to the Internet.
"This is really bad because we have Web applications that are constantly in use, and immediately we started receiving phone calls faster than we could answer them."
Fish and another engineer sprint for the data center only to find the fire-suppression guy, who says, "Please tell me that I didn't hit anything vital."
But of course, he did. In fact, he's just dropped a floor tile on the cable that runs to the credit union's primary router for the Internet.
Fish and his co-worker quickly swap in an undamaged cable and have the Internet connection back up in less than two minutes. "We congratulated ourselves on our foresight in putting in redundant cables," says fish. "The fire-suppression guy apologized, and all was well."
For about 90 minutes.
Then suddenly the Internet is down again, phones are ringing like crazy and fish and the other engineer are running for the data center again.
"This time when we open the door to the data center, the normal white noise of power supply fans is completely absent," says fish. "None of the racks in the data center have power, and the two giant UPS units are powered down.
"Seems that this genius had tested the one thing that is never supposed to be tested except during a fire. He had activated the power shunt that shuts off all power during a fire -- and it is specifically designed not to be reset easily."
It takes two electricians working frantically to bypass the shunt in order to get the data center back up. Total data center downtime: Four hours, not including the time it will take later to route power back through the shunt once it has been rebuilt.
And the guy from the fire-suppression company? "He was quite belligerent and said he had done nothing wrong," fish reports. "But his boss showed up rather quickly and insisted that this would never happen again and that from now on he would be accompanying any of his people who came to our site.
"And after the inspector had been sent home, his boss told us that the guy had only been working with the company for about a week."
Do NOT try this with an online publication!
It's 1985, and this magazine publisher is converting a storage room to serve as a real computer room, reports a pilot fish who was there at the time.
"They hired a consultant and put in a raised floor, glass window and operator's desk outside the room," says fish. "Extra air conditioning was needed, so the unit was placed in a neighboring room -- and of course a new Halon system was installed, complete with all the alarm bells and whistles."