Gmail outage caused by overloaded servers
IDG News Service - A worldwide outage of Google's Gmail online e-mail system on Tuesday was caused by a traffic jam on its servers, according to Google's official Gmail blog.
According to the post, recent changes designed to improve traffic flow on request routers (servers that direct Web queries to the appropriate Gmail server) overloaded the system after workers took some Gmail servers offline for routine upgrades.
"As we now know, we had slightly underestimated the load which some recent changes placed on the request routers," Ben Treynor, site reliability czar, wrote on the Gmail blog. "At about 12:30 p.m. Pacific a few of the request routers became overloaded and in effect told the rest of the system 'stop sending us traffic, we're too slow!' This transferred the load onto the remaining request routers, causing a few more of them also to become overloaded, and within minutes nearly all of the request routers were overloaded."
The overload resulted in people around the world being unable to access Gmail for about 100 minutes, Treynor said, though he noted that IMAP/POP access and mail processing continued to work normally.
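The chain reaction Treynor describes can be illustrated with a small simulation. This is only a sketch under simplifying assumptions, not Google's actual routing logic: load is split evenly across the routers still accepting traffic, a router stops accepting traffic as soon as its share exceeds its capacity, and the function name and all numbers are hypothetical.

```python
def simulate_cascade(total_load, capacities):
    """Simulate routers dropping out when their even share of load exceeds capacity.

    Returns (routers_still_active, router_count_at_each_round).
    Assumption: load is split evenly among the routers still accepting traffic.
    """
    active = list(capacities)
    rounds = []
    while active:
        rounds.append(len(active))
        share = total_load / len(active)
        # A router whose capacity is below its share refuses further traffic.
        survivors = [cap for cap in active if cap >= share]
        if len(survivors) == len(active):
            break  # Every remaining router can carry its share; system is stable.
        active = survivors  # The refused load shifts onto the survivors.
    return len(active), rounds


# Hypothetical fleet: eight routers with capacity 100, two with capacity 90.
fleet = [100] * 8 + [90] * 2
print(simulate_cascade(880, fleet))  # load the fleet can absorb
print(simulate_cascade(920, fleet))  # slightly more load tips it over
```

In this toy model, a load of 880 is carried by all ten routers, while a load of 920 overloads the two weaker routers first and then, once their traffic is redistributed, the remaining eight as well.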
Gmail engineers were alerted to the problem within seconds of the failures and, after diagnosing the cause, brought additional request routers online. Gmail is now more than 99.9 percent available to users, he said.
"We've turned our full attention to helping ensure this kind of event doesn't happen again," he wrote.
One fix the company plans is to have request routers slow down when overloaded instead of refusing to accept traffic. Treynor also said the request routers need sufficient failure isolation so that a problem in one data center doesn't affect servers in another.
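The difference between the two overload behaviors can be sketched as follows. This is an illustrative model only, assuming an even split of load across routers; the policy names "refuse" and "slow_down", the function name, and the numbers are hypothetical and do not come from Google's post.

```python
def fleet_throughput(total_load, capacities, policy):
    """Estimate how much load a router fleet serves under an overload policy.

    "refuse":    an overloaded router stops accepting traffic, so its share
                 shifts onto the remaining routers (which can cascade).
    "slow_down": an overloaded router keeps serving at its own capacity and
                 only the excess is delayed.
    Assumption: load is split evenly among routers accepting traffic.
    """
    if policy == "slow_down":
        share = total_load / len(capacities)
        return sum(min(share, cap) for cap in capacities)
    if policy == "refuse":
        active = list(capacities)
        while active:
            share = total_load / len(active)
            survivors = [cap for cap in active if cap >= share]
            if len(survivors) == len(active):
                return total_load  # Remaining routers carry everything.
            active = survivors
        return 0  # Every router ended up refusing traffic.
    raise ValueError(f"unknown policy: {policy}")


# Hypothetical fleet: eight routers with capacity 100, two with capacity 90.
fleet = [100] * 8 + [90] * 2
print(fleet_throughput(920, fleet, "refuse"))
print(fleet_throughput(920, fleet, "slow_down"))
```

Under these assumptions, the refusing fleet collapses entirely at a load of 920, while the slowing fleet still serves 916 units and delays only the small excess on the two weaker routers.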
The company will work over the next few weeks to make these changes and further improve reliability, he said.
Reprinted with permission from
Story copyright 2009 International Data Group. All rights reserved.