Strangely enough, last month's column about the importance of service-level agreements (SLAs) was posted on April 21, the day of the big Amazon EC2 outage. While that outage certainly helped to illustrate the importance of having good SLAs, I swear that the timing was merely a coincidence.
Amazon's outage serves as a good reminder that what you really want is uninterrupted service, and SLAs alone aren't enough to get you there. Operational outages like Amazon's can definitely reduce service availability, so it's important to do what you can to understand and mitigate the likelihood of unplanned downtime in advance.
Amazon's own summary of how the outage happened makes it clear that it was not the result of external forces, which is something that folks often worry about with the cloud. Instead, it was a routine upgrade that was "executed incorrectly." The resulting problems were made worse by a ripple effect of interrelated automated systems and processes that weren't configured to anticipate or handle the initial error.
This type of outage highlights the question of whether or not the application of technology in the cloud has outpaced the ability to effectively manage it. So let's take a closer look at infrastructure operations management issues and some ways to contractually address them when adopting a cloud service.
The virtual nature of cloud computing makes it easy to forget that the service you get is dependent upon a physical infrastructure. And the infrastructure behind the scenes of a public cloud computing service is a lot more complicated than a traditional data center.
In addition to general computing components such as virtual machine monitors, data storage and associated middleware, a public cloud infrastructure has to deal with things like workload management, data replication and recovery, and resource metering. And to make matters worse, all of these have to interact effectively, while they change over time as feature improvements and bug fixes are continuously rolled out.
To top it all off, not all cloud providers are created equal; there are both new and established players in this market, so they don't all necessarily have the same knowledge and experience. To ensure that you're hooking up with a cloud provider that has well-run, efficiently structured data centers, it's important that you check the provider's specific infrastructure processes and practices before diving in.
The first question that likely comes to mind is, "How do I do this?" The simple answer is, "Ask the cloud provider questions." There are already some good templates that you can leverage to build your own infrastructure questionnaire. The Cloud Security Alliance's (CSA) Consensus Assessments Initiative Questionnaire serves as one good example.
While you need to customize your questionnaire to the specific needs of each situation, you'll typically want to learn how your cloud provider handles infrastructure architecture in areas such as:
* Capacity and resource planning
* Data replication, storage, distribution and recovery
* Change management policies and procedures
* Virtual server provisioning and management
* Asset inventory and management policies and processes
* Software development quality assurance
In each area, you'll want to find out what industry standards the cloud provider aligns with and/or benchmarks against, and how often it does that.
Once you've identified the appropriate practices, you can determine which are the most important to meet your specific needs, and then codify them in the contract. One good way to do this is to leverage the provider's responses to your questionnaire by incorporating them into the contract as minimum standards.
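For readers who track vendor responses in a script or spreadsheet export, here is a minimal Python sketch of that last step: filtering the provider's questionnaire answers down to your priority areas and turning them into draft minimum-standard language. The `Response` record, its field names, the `minimum_standards` helper and the sample answers are all hypothetical, invented for illustration; they don't come from the CSA questionnaire or from any provider, and real contract wording belongs with your legal team.

```python
from dataclasses import dataclass


@dataclass
class Response:
    """One questionnaire answer from the provider (hypothetical structure)."""
    area: str          # questionnaire area, e.g. "Capacity and resource planning"
    practice: str      # what the provider says it does
    standard: str      # industry standard it says it aligns with
    review_cycle: str  # how often it says it benchmarks against that standard


def minimum_standards(responses, priority_areas):
    """Draft one contract clause per answer that falls in a priority area."""
    clauses = []
    for r in responses:
        if r.area in priority_areas:
            clauses.append(
                f"{r.area}: Provider shall, at a minimum, {r.practice}, "
                f"aligned with {r.standard} and reviewed {r.review_cycle}."
            )
    return clauses


# Placeholder answers for illustration only; not taken from any real provider.
responses = [
    Response("Change management policies and procedures",
             "test and approve every change before rollout",
             "<standard named by provider>", "annually"),
    Response("Capacity and resource planning",
             "maintain documented capacity forecasts",
             "<standard named by provider>", "quarterly"),
]

# Only the areas you have decided matter most end up as contract language.
priorities = {"Change management policies and procedures"}
for clause in minimum_standards(responses, priorities):
    print(clause)
```

The point of the sketch is simply that the provider's own answers become the floor: anything it told you it already does in a priority area is restated as an obligation.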
There's still more to tackle regarding the cloud provider's infrastructure, but I'm thinking that this is more than enough to digest for this go-round, so we'll save the rest for next month. Until then, remember:
We're all in this together, so be sure to let me know if I can help.
Thomas Trappler has extensive experience leading enterprisewide IT sourcing and vendor management initiatives and negotiations focused on cost savings and risk mitigation, with an emphasis on cloud computing contracts and software licensing agreements. Contact him at ThomasTrappler@gmail.com.