
Update: RIM explains its BlackBerry outage

Cascading software and system problems caused interruption

April 20, 2007 12:00 PM ET

Computerworld - Research In Motion Ltd. reported late last night that software that was designed to optimize caching capability on its network triggered the widespread BlackBerry wireless e-mail service interruption on Tuesday night.

The outage lasted about 12 hours overnight Tuesday and affected BlackBerry users mainly in North America, according to RIM and users.

RIM said a fail-over system designed to limit the impact of such a problem also did not work as expected. The company apologized to its 8 million users. RIM added that security and capacity issues were not the cause of the outage.

"RIM has determined that the incident was triggered by the introduction of a new, noncritical system routine that was designed to provide better optimization of the system's cache," RIM officials said in a statement.

"The system routine was expected to be nonimpacting with respect to the real-time operation of the BlackBerry infrastructure, but the pretesting of the system routine proved to be insufficient," the statement said.

The new system routine "produced an unexpected impact and triggered a compounding series of interaction errors between the system's operational database and cache," according to the statement. "After isolating the resulting database problem and unsuccessfully attempting to correct it, RIM began its fail-over process to a backup system."

RIM described the backup system inadequacies this way: "Although the backup system and fail-over process had been repeatedly and successfully tested previously, the fail-over process did not fully perform to RIM's expectations in this situation and therefore caused further delay in restoring service and processing the resulting message queue."

RIM also apologized and said it would bolster its testing, monitoring and recovery processes as a result of the problem. "RIM apologizes to customers for inconvenience resulting from the service interruption. RIM's root cause analysis and system enhancement process with respect to this incident is ongoing, and RIM has already identified certain aspects of its testing, monitoring and recovery processes that will be enhanced as a result of the incident and in order to prevent recurrence," the statement said.

Despite RIM's explanation, users and analysts today said they still wanted more communication from the company in the aftermath of the outage.

"I am satisfied with their explanation and apology," said John Halamka, CIO of Harvard Medical School and CareGroup HealthCare System in Boston. "In the future, I hope they are more proactive about acknowledging the problem and communicating with their customers. Even now [11 a.m. EDT], the RIM and BlackBerry Web sites have no information about the outage."

David Maynor, chief technology officer at Errata Security LLC in Atlanta, said he was "outraged" about the situation. "If the power company were to fail, a more detailed analysis would be given," he said. "With the increasing number of mission-critical services dependent on the BlackBerry, I can't believe a software upgrade would cause such a massive failure."


