CSO - Another Amazon cloud-services outage occurred on Sunday, August 7th, at a data center in Dublin, Ireland. A lightning strike hit a transformer near the data center, causing an explosion and fire that knocked out all utility services and led to a total data center outage. Amazon's only European data center is located there.
My initial thoughts concern disaster recovery and Amazon's services. In its last significant outage, in April, a network configuration change led to an outage of services in the eastern United States. This outage raises other questions. Why isn't Amazon deploying a redundant power source, such as a diesel-powered backup? Maybe it did, but the fire blew out a portion of that utility service, so that a more serious disaster emerged from the initial transformer explosion.
How could this be addressed? How about fail-over to services in another geographic location in Europe? This didn't happen. I can only guess that building out another data center is cost-prohibitive at this time, and that this is why Amazon doesn't have another European data center. The rest of the article mentions that it will take Amazon up to two more days to bring up the remaining servers.
The article also mentions that restarting all of the servers is taking a significant amount of time. It states as well that Microsoft, which has services in the same data center, does not have the same weakness. I wonder why this is; data replication should be a high priority, especially when Amazon lacks full-scale data center disaster recovery.
On Monday, August 8th, Amazon said that a software error was slowing the recovery of data within the European data center. This points to another failing: a lack of business continuity testing. Such testing is necessary precisely because conditions like this occur rarely, so recovery procedures otherwise go unexercised. It also points to the fact that complex configurations make it hard to test every scenario. Deploying and testing only a minimum number of application configurations is realistic; otherwise there are too many permutations to test. A previous disaster recovery article makes the same point: products should have standard configurations, much as a car model comes with a set engine configuration.
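To make the point about permutations concrete, here is a rough sketch of how quickly the number of configurations to test grows. The option categories and counts are purely hypothetical and are not taken from Amazon's actual service catalog.

```python
from math import prod

def total_configurations(options_per_dimension):
    """Count distinct configurations when every option can be combined
    with every other (a simple product of the option counts)."""
    return prod(options_per_dimension.values())

# Hypothetical numbers, chosen only for illustration.
free_choice = {
    "os_images": 4,
    "instance_sizes": 6,
    "storage_layouts": 3,
    "network_setups": 5,
}

# The same dimensions restricted to a small set of standard choices.
standardized = {
    "os_images": 2,
    "instance_sizes": 2,
    "storage_layouts": 1,
    "network_setups": 1,
}

print(total_configurations(free_choice))   # 4 * 6 * 3 * 5 = 360
print(total_configurations(standardized))  # 2 * 2 * 1 * 1 = 4
```

Under these assumed numbers, standardizing cuts the recovery test matrix from hundreds of cases to a handful, which is the argument for standard configurations.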
So, it looks as if more weaknesses in Amazon's cloud services are surfacing under operational stress. How can small and mid-sized businesses that outsource their web applications to Amazon's cloud protect themselves? It's clear that Amazon supports cloud applications where doing so is profitable. I suggest that those firms create a very detailed, per-application SLA (Service Level Agreement) that lists global uptime, performance, and the penalties that apply when service doesn't meet objectives.
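As a minimal sketch of what checking such an SLA might look like, the snippet below computes measured uptime for a billing period and the penalty owed if the target is missed. The names (`SlaTerms`, `service_credit`), the 99.9% target, the 10% credit, and the monthly fee are all hypothetical assumptions for illustration; they are not terms from any real Amazon agreement.

```python
from dataclasses import dataclass

@dataclass
class SlaTerms:
    # Hypothetical per-application terms, for illustration only.
    uptime_target_pct: float  # e.g. 99.9 means 99.9% uptime promised
    credit_pct: float         # share of the fee refunded when the target is missed

def measured_uptime_pct(total_minutes, downtime_minutes):
    """Percentage of the billing period during which the service was up."""
    if total_minutes <= 0:
        raise ValueError("total_minutes must be positive")
    return 100.0 * (total_minutes - downtime_minutes) / total_minutes

def service_credit(terms, monthly_fee, total_minutes, downtime_minutes):
    """Penalty owed to the customer; zero when the uptime target is met."""
    uptime = measured_uptime_pct(total_minutes, downtime_minutes)
    if uptime >= terms.uptime_target_pct:
        return 0.0
    return monthly_fee * terms.credit_pct / 100.0

# A 31-day month with two full days of downtime (assumed figures).
terms = SlaTerms(uptime_target_pct=99.9, credit_pct=10.0)
minutes_in_month = 31 * 24 * 60   # 44640
downtime = 2 * 24 * 60            # 2880

print(round(measured_uptime_pct(minutes_in_month, downtime), 2))  # 93.55
print(service_credit(terms, 1000.0, minutes_in_month, downtime))  # 100.0
```

A real SLA would likely use tiered credits and separate performance objectives; the flat credit here only keeps the arithmetic easy to follow.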