Update: RIM says system upgrade snafu led to BlackBerry e-mail outage

Initial probe points to problems with recent changes to internal data routing system

One day after a service outage temporarily left BlackBerry users in North America without access to their e-mail, Research In Motion Ltd. said an initial investigation indicated that the outage was caused by problems with an internal data routing system that recently had been upgraded.

The upgrade was part of an ongoing effort to increase network capacity, "but there appears to have been a problem with this specific upgrade that caused the intermittent service delays," RIM said in a statement sent via e-mail late today. No further explanation was provided.

Meanwhile, Gartner Inc. analyst Ken Dulaney said that RIM's second outage of several hours within 10 months shows that business customers need to supplement the BlackBerry service with notification systems that tell a user when a critical message doesn't get through.

RIM repeated earlier comments that the BlackBerry service was restored quickly and that no messages were lost during the outage, which started at about 3:30 p.m. EST on Monday.

"RIM continues to focus on providing industry-leading reliability in its products and services, and continues to invest in its infrastructure and processes," the company added in its statement. It concluded by apologizing to customers for any inconvenience caused by the problems.

Yesterday's outage was the second in less than a year that RIM blamed on an upgrade to the systems that support the BlackBerry service, which now has about 12 million subscribers.

Last April, the Waterloo, Ontario-based company said the flawed installation of cache optimization software led to a half-day outage, which was worsened by a failure to switch the service over to a backup system. At the time, RIM promised that it would bolster some of its testing, monitoring and recovery processes in an effort to prevent repeat episodes.

It wasn't clear how many users were affected by the latest snafu, although a Verizon Wireless spokesman said that only data transmissions were affected -- not phone calls. E-mail access appeared to be hampered for customers of all U.S. wireless carriers, including Verizon Wireless, AT&T Inc. and Sprint Nextel Corp., according to users and analysts.

Some users were perturbed about the second outage in 10 months, while others were more accepting.

"The fact it happens twice in 10 months is a clear indication that they are not taking it seriously," said David Maynor, CTO and founder of Errata Security Inc. in Atlanta. "This latest outage shows that RIM is lacking an attention to detail which can result in serious problems for their customer base."

Maynor said he has started carrying both a BlackBerry 8310 and an iPhone, and found that the iPhone worked during Monday's outage. He said he needs the backup device: telling a customer that he couldn't respond because his BlackBerry service was down would be like a student telling a teacher, "The dog ate my homework."

John Halamka, CIO at CareGroup Healthcare System and Harvard Medical School in Boston, said the outage affected hundreds of BlackBerry users in the medical organizations for about four hours, lasting until 7:20 p.m. EST on Monday. RIM notified him of the outage via e-mail about an hour after it began, Halamka said, adding that the vendor estimated that about half of its subscribers were susceptible to the e-mail problems.

"Luckily, the outage was at the end of the day, so my users on the East Coast were not vocal about the outage," Halamka said. "We do depend on BlackBerry services for many mission-critical functions, so I hope there were lessons learned [by RIM] to prevent future outages."

Asked whether he might seek alternatives to the BlackBerry service, considering this was the second outage in less than a year, Halamka was sympathetic to RIM. "I know the angst those inside RIM are feeling now," he said. "I suspect this outage will be the catalyst that results in more redundancy, leading to fewer [and] shorter outages in the future."

Halamka said he targets 99.9% uptime for all of the systems he oversees, which allows for less than nine hours of downtime per year. He holds the most mission-critical systems to a more exacting standard of 99.99% uptime. But he didn't say how many hours of outages would raise real concerns about continued use of the BlackBerry service at CareGroup and Harvard Medical School.
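Uptime targets like Halamka's translate into a yearly downtime budget with simple arithmetic. The sketch below assumes a 365-day (8,760-hour) year and ignores leap years; the function name `allowed_downtime_hours` is illustrative only and does not come from any tool mentioned in this story.

```python
# Yearly downtime budget implied by an uptime target.
# Assumption: a 365-day year (8,760 hours); leap years are ignored.
HOURS_PER_YEAR = 365 * 24


def allowed_downtime_hours(uptime_percent: float) -> float:
    """Return the hours of downtime per year permitted by an uptime percentage."""
    return HOURS_PER_YEAR * (1 - uptime_percent / 100)


# Compare the two targets Halamka describes.
for target in (99.9, 99.99):
    hours = allowed_downtime_hours(target)
    print(f"{target}% uptime -> {hours:.2f} hours ({hours * 60:.0f} minutes) per year")
```

By this arithmetic, 99.9% uptime leaves roughly 8.8 hours of downtime a year, while 99.99% leaves only about 53 minutes, so a single four-hour outage would by itself exceed the stricter budget several times over.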

Phillip Redman, an analyst at Gartner Inc., said it will be "critical to keep the highest service levels" possible for the service, especially as the number of BlackBerry users grows even further. If outages at RIM become a trend, they "will push users toward other options that don't have a single point of failure," Redman said.

Dulaney said companies that need critical messages to get through in less than eight hours should not trust RIM or any wireless e-mail system.

"We believe that any mission critical application for BlackBerry should be supplemented with a notification system," Dulaney said. An IT manager could set up such a system through an existing carrier, he added, and to do otherwise is "foolhardy."

Copyright © 2008 IDG Communications, Inc.
