Gmail, Google's email service, was down for an hour or two yesterday, causing widespread wailing and gnashing of teeth. Not just the free version, but also the Google Apps email service was affected. In IT Blogwatch, bloggers discover why it happened, as they check their SLAs.
By Richi Jennings. September 2, 2009.
Your humble blogwatcher has selected these bloggy morsels for your enjoyment. Not to mention more from the 8 Bit Pwny Club...
Sha-ron Gau-din re-ports: [please stop this running joke -Ed.]
After a nearly two-hour outage, Google Inc. is getting its Gmail e-mail service back up and running. ... Complaints began appearing on Twitter around 4 p.m. EDT. In a 4:02 p.m. EDT post on its Apps Status page, Google confirmed that Gmail was suffering an outage.
...This isn't Google's first big Gmail glitch this year. Gmail suffered well-publicized crashes in both February and May.
Cade Metz tells us what happened:
Google has pinned the breakdown on some recent changes to the request routers that direct queries to the service's web servers. Ironically, at least some of the changes were meant to improve Gmail's ability to stay online. But Google underestimated the load these changes would place on the routers when it took a relatively small number of servers offline for upgrades.
...This meant that who knows how many people were unable to access Gmail via the web - though the service was still available via POP and IMAP. Boasting that the Gmail engineering team was alerted to the problem within seconds ... the company solved the issue by bringing more request routers online. Service was restored at about 2:10pm Pacific.
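For the curious, here is a rough sketch of the kind of overload cascade Metz describes. It is a toy model, not Google's actual routing logic: the function names, router counts, capacities, and load figures are all made up for illustration. It assumes a fixed total load spread evenly across the live request routers, and that an overloaded router simply stops accepting traffic and shifts its share onto the others.

```python
def simulate_cascade(num_routers, capacity_each, total_load, offline):
    """Return how many routers are still serving once overloads stop cascading.

    Hypothetical model: load is split evenly, and any overloaded router
    refuses traffic, pushing its share onto the remaining routers.
    """
    alive = num_routers - offline
    while alive > 0:
        per_router = total_load / alive
        if per_router <= capacity_each:
            return alive  # the remaining routers can carry the load
        alive -= 1  # one more router gives up, raising the load on the rest
    return 0


def graceful_served_fraction(alive, capacity_each, total_load):
    """Fraction of requests served if routers slow down instead of refusing."""
    return min(1.0, (alive * capacity_each) / total_load)


if __name__ == "__main__":
    # Illustrative numbers only: 10 routers, 100 units of capacity each,
    # 850 units of total load.
    for offline in (0, 1, 2):
        left = simulate_cascade(10, 100, 850, offline)
        print(f"{offline} offline -> {left} routers still serving")
    # Adding routers (14 in total, 2 still offline) restores headroom.
    print("after adding routers:", simulate_cascade(14, 100, 850, 2))
    # With graceful degradation, 8 live routers still serve most requests.
    print("graceful fraction:", graceful_served_fraction(8, 100, 850))
```

In this toy model, taking one router offline is harmless, but taking two offline tips every remaining router over its limit and the whole pool collapses. Adding routers brings the pool back, which matches the shape of the fix described above, though the real numbers are not public here.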
Google's Ben Treynor offers a detailed mea culpa:
I'd like to apologize to all of you - today's outage was a Big Deal, and we're treating it as such. We've already thoroughly investigated what happened, and we're currently compiling a list of things we intend to fix or improve as a result of the investigation.
...We've turned our full attention to helping ensure this kind of event doesn't happen again. ... We have concluded that request routers don't have sufficient failure isolation (i.e. if there's a problem in one datacenter, it shouldn't affect servers in another datacenter) and do not degrade gracefully (e.g. if many request routers are overloaded simultaneously, they all should just get slower instead of refusing to accept traffic and shifting their load). ... Gmail remains more than 99.9% available to all users, and we're committed to keeping events like today's notable for their rarity.
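To put "more than 99.9% available" in perspective, here is a back-of-the-envelope calculation. It assumes availability is measured over a full 365-day year, which the quote does not actually specify, and the function names are hypothetical. The two-hour figure is a round number based on the "nearly two-hour outage" reported above.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: a one-year measurement window


def downtime_budget_hours(availability_pct, period_hours=HOURS_PER_YEAR):
    """Hours of downtime allowed in the period at a given availability."""
    return period_hours * (1 - availability_pct / 100)


def availability_pct(downtime_hours, period_hours=HOURS_PER_YEAR):
    """Availability percentage implied by a given amount of downtime."""
    return 100 * (1 - downtime_hours / period_hours)


if __name__ == "__main__":
    # 99.9% over a year leaves roughly 8.76 hours of allowable downtime.
    print(f"99.9% budget: {downtime_budget_hours(99.9):.2f} hours/year")
    # A single two-hour outage, taken alone, still clears the 99.9% bar.
    print(f"one 2-hour outage: {availability_pct(2):.3f}% available")
```

Under these assumptions, a single two-hour outage fits inside a 99.9% yearly budget with room to spare, so the "more than 99.9%" claim and the outage are not contradictory. The "100% uptime" demanded below would leave a budget of exactly zero.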
But Jason Kaneshiro says 99.9%+ isn't enough:
[The] Gmail outage points out a very obvious ... business idea. It's simply shameful that basic **** still isn't working right. Hasn't it been, like, forty years since email was invented? Web-based email with 100% uptime. Promise it. Charge for it (it would be worth paying for). And deliver.
...Perhaps [it would be] extremely technically challenging, but would be wildly successful ... In the meantime, I shall continue to use an old school email desktop client (Apple's Mail) to download all my Gmail info on a daily basis. The cloud still isn't reliable enough, and seeing how it took forty years to get here I figure there's still at least a decade to go.
Jennifer van Grove mashes Treynor's apology:
That's one big oops Google. But it's nice to see that you're publicly apologizing for the outage and attesting to the fact that you will do everything in your power to prevent it from happening again. Here's hoping you stick to that.
Danny Sullivan preaches self-sufficiency:
Personally, I use Outlook 2007 to download my email from a Google Apps account. This allows me to have full, dependable offline access. It allows me to periodically backup and archive my mail, protection against the rare case where Google might somehow delete my mail on their servers. It allows me to have easy access to search mail back for half a decade.
...Gmail provides full instructions on how to configure a number of email clients here. ... I ideally would like Google to offer its own lightweight email client.
So what's your take?
Richi Jennings is an independent analyst/consultant, specializing in blogging, email, and spam. A 24-year, cross-functional IT veteran, he is also an analyst at Ferris Research. You can follow him as @richi on Twitter or richij on FriendFeed, pretend to be Richi's friend on Facebook, or just use good old email: firstname.lastname@example.org.