CSO - Another Amazon cloud-services outage occurred on Sunday, August 7th, at a data center in Dublin, Ireland. A lightning strike hit a transformer near the data center, causing an explosion and fire that knocked out all utility services and led to a total data center outage. The Dublin facility is Amazon's only European data center.
My initial thoughts concern disaster recovery and Amazon services. In Amazon's last significant outage, in April, a network configuration change took down services in the eastern United States. This latest outage raises other questions. Why isn't Amazon deploying a redundant power source, such as a diesel-powered backup? Maybe it did, but the fire blew out a portion of that utility service, so that a more serious disaster emerged from the initial transformer explosion.
How could this be addressed? One option is failover to services in another geographic location in Europe, but this didn't happen. I can only guess that building out another data center is cost-prohibitive at this time, and that is why Amazon doesn't have another European data center. The rest of the article mentions that it will take Amazon up to two more days to bring up the remaining servers.
It mentions that restarting all of the servers is taking a significant amount of time. It also states that Microsoft, which has services in the same data center, does not have the same weakness. I wonder why this is; data replication should be a high priority, especially when Amazon lacks full-scale data center disaster recovery.
On Monday, August 8th, Amazon said that a software error is slowing the recovery of data within the European data center. This points to another failing: a lack of business continuity testing. Such testing is necessary precisely because conditions like these occur rarely and so are never exercised in normal operation. It also points to the fact that complex configurations make it hard to test various scenarios. Deploying and testing only a minimum number of application configurations is realistic; otherwise there are too many permutations to test. See a previous disaster recovery article that mentions products should have standard configurations, similar to a car engine configuration and the car model.
So, it looks as if Amazon has more cloud-services weaknesses that are bubbling up under operational stress. How can small and mid-sized businesses that outsource their web applications to Amazon's cloud protect themselves? It's clear that Amazon supports cloud applications where profitable. I suggest that those firms create a very detailed, per-application SLA (Service Level Agreement) that lists global uptime, performance, and penalties when service isn't meeting objectives.
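To make the uptime-and-penalty idea concrete, here is a minimal sketch of how a per-application SLA clause could be evaluated for one billing month. Everything in it is an assumption for illustration: the `SlaTier` class, the function names, the 99.95% and 99.0% thresholds, and the credit percentages are hypothetical and are not taken from Amazon's or any other provider's actual terms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SlaTier:
    """Hypothetical penalty tier: applies when uptime falls below the threshold."""
    below_uptime_pct: float  # tier triggers when measured uptime is below this
    credit_pct: float        # share of the monthly bill credited to the customer


def uptime_pct(total_minutes: int, downtime_minutes: int) -> float:
    """Return measured uptime for the period as a percentage."""
    if total_minutes <= 0:
        raise ValueError("total_minutes must be positive")
    if not 0 <= downtime_minutes <= total_minutes:
        raise ValueError("downtime_minutes must be between 0 and total_minutes")
    return 100.0 * (total_minutes - downtime_minutes) / total_minutes


def service_credit_pct(uptime: float, tiers: list[SlaTier]) -> float:
    """Return the largest credit among all tiers whose threshold was missed."""
    credit = 0.0
    for tier in tiers:
        if uptime < tier.below_uptime_pct:
            credit = max(credit, tier.credit_pct)
    return credit


# Assumed example tiers; a real SLA would negotiate these per application.
EXAMPLE_TIERS = [
    SlaTier(below_uptime_pct=99.95, credit_pct=10.0),
    SlaTier(below_uptime_pct=99.0, credit_pct=25.0),
]

if __name__ == "__main__":
    minutes_in_month = 30 * 24 * 60   # 30-day billing month
    outage_minutes = 2 * 24 * 60      # a two-day outage
    measured = uptime_pct(minutes_in_month, outage_minutes)
    credit = service_credit_pct(measured, EXAMPLE_TIERS)
    print(f"uptime {measured:.2f}%, credit {credit:.0f}% of monthly bill")
```

Under these assumed tiers, a two-day outage in a 30-day month leaves uptime well below the lower threshold, so the larger credit applies. A real SLA would also need performance targets and an agreed way of measuring downtime, which this sketch leaves out.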