
Amazon's data center outage reads like a thriller

The outage shows why performance monitoring services are gaining ground

By Patrick Thibodeau
December 11, 2009 01:07 PM ET

Computerworld - When an Amazon Web Services data center lost power early Wednesday, the company wrote about the unfolding event with the brevity and tension of one of its bestselling potboilers.

Our anonymous author, whom we'll call Sysadmin, begins his story simply, without emotional complications or love interests.

"We are investigating connectivity issues for instances in the US-EAST-1 region," Sysadmin writes on Amazon's operations status board at 1:08 a.m. PT.

With one sentence, we're intrigued. Something's up with Amazon's data center in Northern Virginia, just a short drive from Washington: Tom Clancy country.

You can almost feel what's going on. Cloud-based services are crashing and there's a scramble for answers. Elsewhere, PC screens are refreshed as readers wait for an update from Sysadmin (Kindle edition not yet available). Some 18 minutes pass. Tension builds.

Sysadmin offers an update, referring to isolated "power issues."

Inside the data center a real, red-light-flashing drama unfolds.

At first, a "single component of the redundant power distribution system failed in this zone," Sysadmin would later write in a postscript for his audience. But while the data center staff worked on that component, there was a twist: "A second component, used to assure redundant power paths, failed as well."

Customers are losing connectivity.

Whether data center staff cheered when the problem was fixed remains a mystery. But as soon as the "defective power distribution units were bypassed, servers restarted and instances began to come online shortly thereafter," wrote Sysadmin.

Readers wouldn't get those details until later, when Sysadmin had more information and time. In those early minutes of the outage, only essential information gets to anxious readers. At 1:51 a.m., Sysadmin wrote: "The underlying power issue has been addressed. Instances have begun to recover."

At 2:11 a.m., he writes again: a recovery is well under way.

All that's left are the reviews. That's where companies like Wellesley Mills, Mass.-based Apparent Networks Inc. come in.

In November, Apparent Networks launched its Cloud Performance Center, an online service that allows anyone to review -- in real time -- the performance of 16 cloud providers, including Amazon and Google. It measures such things as bandwidth capacity, latency and data loss, then gives each provider an overall score.

Jim Melvin, president of the privately held Apparent Networks, said his firm can continuously monitor network performance over WANs using technology it has extended to the cloud. The monitoring is done with a "very lightweight stream of packets" that continuously travels the network to monitor activity and cloud performance.

With the available free version of its PathView Cloud tool, users can detect performance issues with the network or cloud provider, and see whether service level performance agreements are being met, Melvin said.
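
For readers curious what this kind of check amounts to, here is a minimal Python sketch of the idea Melvin describes: take the results of a lightweight stream of probe packets, work out latency and data loss, and compare them with a service level target. It is an illustration only, not Apparent Networks' method or the PathView Cloud API. The names `ProbeSummary`, `summarize_probes` and `meets_sla`, and the thresholds in the usage note, are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional, Sequence


@dataclass
class ProbeSummary:
    """Aggregate view of one batch of probe packets (hypothetical structure)."""
    sent: int
    received: int
    loss_pct: float
    avg_latency_ms: Optional[float]  # None when no probe was answered


def summarize_probes(rtts_ms: Sequence[Optional[float]]) -> ProbeSummary:
    """Summarize round-trip times in milliseconds; None marks a lost probe."""
    sent = len(rtts_ms)
    answered = [rtt for rtt in rtts_ms if rtt is not None]
    received = len(answered)
    # Loss is the share of probes that never came back.
    loss_pct = 100.0 * (sent - received) / sent if sent else 0.0
    avg_latency_ms = mean(answered) if answered else None
    return ProbeSummary(sent, received, loss_pct, avg_latency_ms)


def meets_sla(summary: ProbeSummary,
              max_latency_ms: float,
              max_loss_pct: float) -> bool:
    """Return True only if both latency and loss are within the assumed targets."""
    if summary.avg_latency_ms is None:
        # Nothing answered: treat it as an outage, not as a pass.
        return False
    return (summary.avg_latency_ms <= max_latency_ms
            and summary.loss_pct <= max_loss_pct)
```

Fed four probes of which one was lost, `summarize_probes([20.0, 22.0, None, 18.0])` reports 25 percent loss and a 20 ms average. Against an assumed target of 50 ms and 10 percent loss, `meets_sla` would flag that batch as a miss.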


