When the site GDGT.com went live this past summer, Ryan Block was expecting a lot of interest.
Prior to launch, the former Engadget.com editor in chief had built up momentum for the site -- which allows everyday users to write gadget reviews -- by informing bloggers and online publications. "We were excited but wary, because there's always an x factor," says Block. "We did weeks of performance and load testing, but lab testing will always differ from real-world usage, and we knew there would still be issues here and there that we wouldn't find until thousands of people were actually using the site."
Indeed, on Aug. 4, GDGT went live -- and a few hours later Block was forced to post a message explaining that the site was not available because of unanticipated levels of interest, which included thousands of users signing up for accounts and visiting the home page. Block says the problem was related to database performance.
Joe Skorupa, a Gartner Inc. analyst, says GDGT experienced what he calls "catastrophic success" -- an unusual surge in traffic that can bring a Web site to its knees. It seems as if there's another story about a site experiencing colossal failure every week: a Twitter outage, Facebook downtime or Gmail problems. (Twitter Inc., Facebook Inc. and Google Inc. representatives all declined to comment on outages.)
Skorupa says there is a common misunderstanding about the public Internet -- which is notoriously flaky and consists of many interconnected networks. It's not the same as corporate cloud computing, private networks that occasionally use the public Internet, or financial services on the Web, which are mandated to be available 24/7. In his view, the public Internet should not be viewed as being as reliable as, say, a private connection between bank offices.
There is also a misunderstanding about a site "going down." Typically, the server has not crashed entirely; it's more likely a data center problem, says James Staten, a Forrester Research Inc. analyst.
"A service doesn't go down, but gets so slow that it's viewed as nonresponsive," says Staten. "Load balancers take all incoming requests and route them to the Web servers based on their responsiveness. This architecture can become unresponsive when it's overwhelmed by the number of requests for content."
In the end, the GDGT traffic problems subsided after the initial launch, thanks to improved database performance and caching techniques put in place to head off future traffic spikes.
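The article doesn't describe GDGT's caching setup, but the general idea behind caching away database load can be sketched as a simple time-to-live (TTL) cache: repeat requests within the TTL window are served from memory instead of hitting the database again. The class name, TTL value and loader function here are assumptions for illustration.

```python
import time

class TTLCache:
    """Minimal sketch: serve repeat reads from memory for `ttl` seconds."""

    def __init__(self, ttl=30.0):
        self.ttl = ttl
        self.store = {}  # key -> (value, expiry timestamp)

    def get(self, key, loader):
        entry = self.store.get(key)
        now = time.time()
        if entry is not None and entry[1] > now:
            return entry[0]                    # cache hit: no database query
        value = loader(key)                    # cache miss: query the database once
        self.store[key] = (value, now + self.ttl)
        return value


# Hypothetical usage: `fetch_review` stands in for an expensive database query.
queries = []
def fetch_review(key):
    queries.append(key)
    return f"review data for {key}"

cache = TTLCache(ttl=30.0)
cache.get("home_page", fetch_review)
cache.get("home_page", fetch_review)
print(len(queries))  # the database was queried only once
```

Under a launch-day surge, thousands of visitors requesting the same home page collapse into a single database query per TTL window, which is exactly the kind of pressure relief a site like GDGT would need.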
In other cases, a denial-of-service (DoS) attack, such as the one that caused Twitter and other sites to go dark for several hours in August, can create the same kind of overload and congestion. Staten says other causes of Web site failure include poorly configured system components and out-of-date patches and updates on Web servers.
For most Web users, the occasional outage is a minor annoyance, but frequent downtime can cause serious business delays. As we rely more and more on Web applications, even those related to social networking, Internet uptime is becoming more critical.
The following strategies for dealing with public Internet outages -- which admittedly include some that are more controversial than others -- will help you pave a smoother superhighway to your company's Web site.