IDG News Service - A worldwide outage of Google's Gmail online e-mail system on Tuesday was caused by a traffic jam on its servers, according to Google's official Gmail blog.
The problem was that some recent changes designed to improve traffic flow on request routers, servers designed to direct Web queries to the appropriate Gmail server, overloaded the system after workers took some Gmail servers offline to perform routine upgrades.
"As we now know, we had slightly underestimated the load which some recent changes placed on the request routers," Ben Treynor, site reliability Czar wrote on the Gmail blog. "At about 12:30 p.m. Pacific a few of the request routers became overloaded and in effect told the rest of the system "stop sending us traffic, we're too slow!". This transferred the load onto the remaining request routers, causing a few more of them also to become overloaded, and within minutes nearly all of the request routers were overloaded."
The overload resulted in people around the world being unable to access Gmail for about 100 minutes, Treynor said, though he noted that IMAP/POP access and mail processing continued to work normally.
Gmail engineers were alerted to the problem within seconds of the failures and after figuring out what the problem was, brought additional request routers online. Now, Gmail is more than 99.9 percent available to users, he said.
"We've turned our full attention to helping ensure this kind of event doesn't happen again," he wrote.
One fix the company plans to make is to ensure request routers will work better by having them slow down when overloaded instead of refusing to accept traffic. Treynor said the request routers need to have sufficient failure isolation so that a problem in one data center doesn't affect servers in another data center.
The company will work over the next few weeks to make these changes and further improve reliability, he said.
- 15 Non-Certified IT Skills Growing in Demand
- How 19 Tech Titans Target Healthcare
- Twitter Suffering From Growing Pains (and Facebook Comparisons)
- Agile Comes to Data Integration
- Slideshow: 7 security mistakes people make with their mobile device
- iOS vs. Android: Which is more secure?
- 11 sure signs you've been hacked
- What Datapipe customers need to know about the new PCI DSS 3.0 compliance standard This handy quick reference outlines what PCI DSS 3.0 is, who needs to be compliant and how Alert Logic solutions address the new...
- Defense Throughout the Vulnerability Life Cycle This whitepaper provides insight into how to leverage threat and log management technologies to protect your IT assets throughout their vulnerability life cycle.
- The Critical Role of Support in Your Enterprise Mobility Management Strategy Most business leaders underestimate the importance of tech support when they choose an EMM solution. Here's what to put on your checklist.
- Separating Work and Personal at the Platform Level: How BlackBerry Balance Works BlackBerry® Balance™ separates work from personal on the same mobile device, right at a platform level. Find out how it can work for...
- Live Webcast Best Practices for the Hyperconverged Enterprise Network To the Age of Constant Connectivity and Information overload
- Getting Ready for BlackBerry Enterprise Service 10.2 Find out how BlackBerry® Enterprise Service 10 helps organizations address the full spectrum of EMM challenges, while balancing the needs of both the...
- Containerization Options: How to Choose the Best DLP Solution for Your Organization This webcast outlines a framework for making the right choice when it comes to containerization approaches, along with the pros and cons of... All Networking White Papers | Webcasts