IDG News Service - A worldwide outage of Google's Gmail online e-mail system on Tuesday was caused by a traffic jam on its servers, according to Google's official Gmail blog.
The problem stemmed from recent changes meant to improve traffic flow on the request routers, the servers that direct Web queries to the appropriate Gmail server. Those changes overloaded the system after workers took some Gmail servers offline to perform routine upgrades.
"As we now know, we had slightly underestimated the load which some recent changes placed on the request routers," Ben Treynor, Google's site reliability czar, wrote on the Gmail blog. "At about 12:30 p.m. Pacific a few of the request routers became overloaded and in effect told the rest of the system 'stop sending us traffic, we're too slow!'. This transferred the load onto the remaining request routers, causing a few more of them also to become overloaded, and within minutes nearly all of the request routers were overloaded."
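The chain reaction Treynor describes can be illustrated with a toy model. This is a minimal sketch, not Google's implementation: it assumes traffic is spread evenly across the routers still in service, that each router has a fixed, hypothetical capacity, and that an overloaded router simply drops out of rotation so its share lands on the others. The function name and all numbers are illustrative.

```python
def simulate_cascade(capacities, total_load):
    """Toy model of overloaded routers refusing traffic.

    Load is split evenly across healthy routers. Any router whose share
    exceeds its capacity leaves the pool, and the load is re-split among
    the survivors. Returns (healthy router indices, list of failure rounds).
    """
    healthy = set(range(len(capacities)))
    failure_rounds = []
    while healthy:
        share = total_load / len(healthy)
        overloaded = {i for i in healthy if share > capacities[i]}
        if not overloaded:
            break  # every remaining router can handle its share
        healthy -= overloaded
        failure_rounds.append(sorted(overloaded))
    return sorted(healthy), failure_rounds


# Hypothetical per-router capacities (requests per second).
caps = [80, 90, 100, 110, 120, 130, 140, 150, 160, 170]
print(simulate_cascade(caps, 700))   # -> ([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [])
print(simulate_cascade(caps, 1000))  # -> ([], [[0, 1], [2, 3, 4], [5, 6, 7, 8, 9]])
```

In this model a moderate rise in load first knocks out the two weakest routers; their traffic pushes three more over the edge, and the survivors then collapse together, leaving no router in service.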
The overload left people around the world unable to access Gmail for about 100 minutes, Treynor said, though he noted that IMAP/POP access and mail processing continued to work normally.
Gmail engineers were alerted to the problem within seconds of the failures and, after identifying the cause, brought additional request routers online. Gmail is now more than 99.9 percent available to users, he said.
"We've turned our full attention to helping ensure this kind of event doesn't happen again," he wrote.
One fix the company plans is to have request routers slow down when overloaded instead of refusing to accept traffic. Treynor also said the request routers need sufficient failure isolation so that a problem in one data center doesn't affect servers in another data center.
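The planned "slow down instead of refuse" behavior can be sketched with a similarly simplified toy model. This is an illustrative assumption, not Google's design: each router keeps its even share of traffic, serves up to a hypothetical fixed capacity, and queues the excess (which users would experience as added latency) rather than pushing it onto other routers. The function name and numbers are made up for illustration.

```python
def serve_with_slowdown(capacities, total_load):
    """Toy model: overloaded routers slow down instead of refusing traffic.

    Each router keeps an even share of the load and serves up to its
    capacity; anything beyond that is queued (delayed) rather than
    shifted to its peers. Returns (requests served, requests queued).
    """
    share = total_load / len(capacities)
    served = sum(min(share, cap) for cap in capacities)
    queued = total_load - served
    return served, queued


# Hypothetical per-router capacities (requests per second).
caps = [80, 90, 100, 110, 120, 130, 140, 150, 160, 170]
served, queued = serve_with_slowdown(caps, 1000)
print(served, queued)  # -> 970.0 30.0
```

Every router stays in service under this model; the overload shows up as a backlog on the weakest routers instead of as traffic shoved onto their neighbors.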
The company will work over the next few weeks to make these changes and further improve reliability, he said.