Ads by TechWords

See your link here
Receive the latest technology news and information.
Networking
Computerworld Daily News (First Look and Wrap-Up)
Computerworld Blogs Newsletter
The Weekly Top 10
Cloud Computing
View all newsletters




Privacy Policy
 

Gmail outage caused by overloaded servers

September 2, 2009 01:57 AM ET

Active Comments
Anonymous says: they need to go back and look at their disaster recovery procedures and their backup schedules........
Anonymous says: ...the 'servers' are the request routers. Statement is accurate....


IDG News Service - A worldwide outage of Google's Gmail online e-mail system on Tuesday was caused by a traffic jam on its servers, according to Google's official Gmail blog.

The problem was that some recent changes designed to improve traffic flow on request routers, servers designed to direct Web queries to the appropriate Gmail server, overloaded the system after workers took some Gmail servers offline to perform routine upgrades.

"As we now know, we had slightly underestimated the load which some recent changes placed on the request routers," Ben Treynor, site reliability Czar wrote on the Gmail blog. "At about 12:30 p.m. Pacific a few of the request routers became overloaded and in effect told the rest of the system "stop sending us traffic, we're too slow!". This transferred the load onto the remaining request routers, causing a few more of them also to become overloaded, and within minutes nearly all of the request routers were overloaded."

The overload resulted in people around the world being unable to access Gmail for about 100 minutes, Treynor said, though he noted that IMAP/POP access and mail processing continued to work normally.

Gmail engineers were alerted to the problem within seconds of the failures and after figuring out what the problem was, brought additional request routers online. Now, Gmail is more than 99.9 percent available to users, he said.

"We've turned our full attention to helping ensure this kind of event doesn't happen again," he wrote.

One fix the company plans to make is to ensure request routers will work better by having them slow down when overloaded instead of refusing to accept traffic. Treynor said the request routers need to have sufficient failure isolation so that a problem in one data center doesn't affect servers in another data center.

The company will work over the next few weeks to make these changes and further improve reliability, he said.


Reprinted with permission from

IDG.net
Story copyright 2009 International Data Group. All rights reserved.

Jump to comments

Google

Additional Resources

Microsoft
Here are some of the key reasons why you would want to run Unified Access Gateway with DirectAccess.
Microsoft
Review how one energy firm tightened protection and simplified IT work using business-ready security solutions.
Sybase
In this white paper, IDC analyzes the role of next-generation mobile enterprise platforms as organizations seek a more strategic deployment of mobile solutions.

Learn the important issues you must consider before starting your next mobility initiative. Get your mobility white paper from IDC now, compliments of Sybase.

What People Are Saying

White Papers & Webcasts

Death to PST Files
Download Now  

Business Process Framework Demo
Learn about Configurable Business Processes and Calculated Fields. Watch Now!

A Green Architectural Strategy That Puts IT in the Black
Levergage green computing across your data center. Read more now.  

Manager Experience Demo
Go beyond self-service solutions to perform more effectively. Watch Now.

Quantifying the Business Value of VMware View
Learn why you should invest in a centralized virtual desktop.  

Asia-Pacific Enterprise Network Solutions
Learn through this Webcast how your business can achieve reliability, performance and value in hard-to-reach locations within the Asia-Pacific region.

Mainsoft Webcast w/ Forrester Research: Drive SharePoint Adoption in Lotus Notes Shops
How can you drive mainstream user adoption of Microsoft SharePoint when your users rely on Lotus Notes?

Disaster Recovery & Cost Savings Zone
Thousands of customers world-wide have turned to virtualization solutions from Riverbed as a way to reduce costs.



IT Jobs