Blame Game

Application-monitoring tools could solve performance problems. Or they could just add finger-pointing to your troubles.

About the only thing worse than poor application performance is being responsible for causing poor application performance.

That may be why performance-monitoring tools seem to have gone from being tools that detect problems to being tools that help place blame. And that could prove dangerous, experts say, because improperly used tools can confuse more than help and may themselves cause service degradations.

Service degradation cost an average large enterprise $1.4 million in productivity and $900,000 in lost revenue in 1998, according to Infonetics Research Inc. in San Jose. Last year's electronic-business explosion will only make things worse, says Michael McConnell, lead analyst.

And while more business is conducted on private and public networks, the networks and applications running on them are becoming larger and more complicated, McConnell says. Service degradation is also almost certainly underreported, he adds. "If you're unable to get e-mail, you call and complain, but if it's a little slow, people are a bit more patient."

Fine-tuning enterprise performance has become critical. Performance-monitoring tools, which can quickly track down the cause of a slow response, have become mainstays on most corporate networks.

"Application performance is becoming especially important as companies start outsourcing their applications to application service providers (ASP)," says Gene Leganza, an analyst at Giga Information Group Inc. in Cambridge, Mass. "You need some sort of measurement to prove that service-level agreements are actually being delivered. If the ASP supplies the data, you have some legitimate questions about its validity. That's why independent performance monitoring is becoming so vital."

But "too often, (these) products are sold to let you deflect blame," says Jim McQuaid, director of monitoring solutions at Ganymede Software Inc. in Morrisville, N.C. "People are told, 'Buy this product, and you can produce data to prove your network or whatever is not what's at fault,' " he says.

"Lately, that's the marketing message that a lot of these vendors are delivering," says Leganza. "Some people see these tools as a nice, safe way to say, 'It's not me, go bother somebody else.' "

Ironically, these tools can also exacerbate performance problems. "Some types of monitoring tools are meant to run all the time, and they've been tuned to have minimal impact," Leganza explains. "Others put a much bigger load on network resources, but they're only meant to be used for a short period. If you run them continuously, it can seriously degrade the network."

And without correlation and context, data on network and application performance is nearly meaningless, warns network analyst Bernie Davidovics at Predictive Systems Inc. in New York. Take a number like 90% server utilization. "Is that good or bad?" he asks. If the server is doing backup, the number represents a good use of resources. If it's doing online stock-trading transactions, it's treading too close to failure.

Instead, you should focus on discovering end-to-end response time and availability from the user's viewpoint, a measurement that "has been the Holy Grail of network monitoring," says Davidovics. And although the lines between network and application-monitoring software are blurring, "no single approach delivers everything," he says.
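
To make Davidovics's utilization example concrete, here is a minimal sketch in Java. The workload categories and thresholds are hypothetical, chosen only to show that the same 90% reading can be healthy for one workload and alarming for another:

```java
// A sketch of context-dependent thresholds; the workloads and cutoffs
// below are hypothetical, not drawn from the article.
public class UtilizationCheck {

    enum Workload { BATCH_BACKUP, ONLINE_TRADING }

    // The same reading is judged against a different ceiling per workload.
    static boolean isHealthy(double utilization, Workload workload) {
        switch (workload) {
            case BATCH_BACKUP:
                // A backup job is supposed to saturate the server.
                return utilization <= 0.98;
            case ONLINE_TRADING:
                // Interactive transactions need headroom to absorb spikes.
                return utilization <= 0.70;
            default:
                return false;
        }
    }

    public static void main(String[] args) {
        double reading = 0.90; // the 90% figure from Davidovics's example
        System.out.println("Backup:  " + isHealthy(reading, Workload.BATCH_BACKUP));   // true
        System.out.println("Trading: " + isHealthy(reading, Workload.ONLINE_TRADING)); // false
    }
}
```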

"It's surprising how often these tools are just thrown on the network without much thought as to what's needed," says Leganza. "They collect too much data, or the wrong data," and could wind up adding another layer of bottlenecks.

"Vendors are certainly aware of what can happen, and they supply guidelines on how to minimize impact and maximize accuracy," says Leganza. "It sounds obvious, but the best advice I can give is 'Read the directions.' "

Performance Trackers

A tiny dot-com and a major insurance company faced radically different network problems: One needed to keep online customers from waiting during the busy holiday season. The other needed to make sure its ASP delivered what it promised. Both used application-monitoring tools to resolve potential problems.

At 4-month-old Jewelry.com -- a 100% pure dot-com business -- Chairman and CEO Paul Rajewski says he marshaled his tiny El Segundo, Calif., staff to build a Web site to sell retail jewelry. They used e-commerce software from San Francisco-based Intershop Communications Inc. and linked it to the company's Sybase Inc. databases for a mid-November launch.

Jewelry.com "needed to make sure customers were getting good response time at our site," Rajewski says. But the start-up had minimal staff and resources to handle the crucial monitoring of its Web site, Internet and ASP.

Vendors and providers spent too much time shifting blame instead of providing numbers, Rajewski says. So "we got the guys at Candle (Corp.) to ping our site and give us reports on performance. Then we compared the data to the volume we were experiencing," he says.

Reports from Candle's CandleNet eBusiness Assurance Network 2000 correlating customer response time and usage to server performance showed the big problem wasn't hardware or software, but the Internet itself, says Rajewski.

Jewelry.com quickly switched providers, choosing San Francisco-based Digital Island Inc., which offers Web caching at multiple sites nationwide. The move solved many underlying problems, he says.

CandleNet relies on network, server and application-performance data collected by its ServiceMonitor Web application-monitoring software to supply technical and business-process-oriented reports. ServiceMonitor is based on a Java applet that activates when a user enters a Web address, measuring the time it takes for the data to get from user to Web server and back. Candle then analyzes and warehouses the data. Users access these performance reports, presented in Web format, through a virtual private network linked to a Candle Web site.
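
Candle's applet itself isn't shown in the article; the sketch below, a rough illustration rather than ServiceMonitor's actual code, demonstrates the underlying technique of timing a full request/response round trip from the client's side. It uses the modern java.net.http API rather than applet-era classes, and the URL is a placeholder:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Client-side round-trip measurement: start a clock, fetch the page,
// stop the clock once the full body has arrived.
public class RoundTripTimer {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://example.com/")) // placeholder URL
                .GET()
                .build();

        long start = System.nanoTime();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;

        // This is the user's-eye view: request out, complete response back.
        System.out.printf("HTTP %d, %d chars in %d ms%n",
                response.statusCode(), response.body().length(), elapsedMs);
    }
}
```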

During the holiday shopping rush, CandleNet reported a linkage problem, Rajewski says, "and we tracked it down to a lost (SprintNet) router near our hosting site."

"The knee-jerk reaction you get when something goes wrong is, well, it's not us," Rajewski says. "If you can come in with real data that says, 'Yeah, it is you,' it changes the whole conversation, and your credibility goes through the roof."

Rajewski says he plans to expand reporting to include users' browse times and clickstreams as well as how long it takes each object on his company's site to download on the user's desktop. "That's going to be useful information. We're launching into Phase 2 of our operation," he says.

CIO James Barry oversees the complex enterprise systems at Insurance Holdings of America's Consumer Insurance Division in Beverly, Mass. Those include a 300-seat call center and 1,100 local and remote desktop users nationwide, linked via an extranet at more than 100 sites, some no more than booths in bulk retailer Sam's Club stores.

The complexity extended to the networks. "We have a significant IP infrastructure" as well as some asynchronous transfer mode (ATM), frame-relay and gigabit Ethernet networks, says Barry.

His staff -- six or seven people who work three shifts to provide around-the-clock coverage -- was at its limit. Now it had a new task: monitoring the performance of an insurance-selling application hosted by an ASP.

Stopping the finger-pointing was more difficult for Barry than for Rajewski. Barry chose the Vital monitoring package from International Network Services (INS) in Sunnyvale, Calif. His team deployed VitalAgent to all 1,100 desktops for network-event monitoring. The 4,000-byte agents collect data and send it to a VitalConsole at headquarters in Beverly. Barry's team is also using the VitalHelp, VitalAnalysis and Enterprise Pro components.
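
VitalAgent's wire format is proprietary and not described in the article. As a rough illustration of the general desktop-agent pattern it follows, the hypothetical sketch below samples a local metric on a schedule and ships a small JSON record to a central console; the console URL, host name and record shape are all assumptions:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Sketch of the desktop-agent pattern: sample locally on a schedule,
// ship a small record to a central collector, and never crash the host.
public class DesktopAgent {
    private static final HttpClient CLIENT = HttpClient.newHttpClient();
    private static final String CONSOLE = "http://console.example.com/metrics"; // placeholder

    public static void main(String[] args) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        // Keep the agent lightweight: one small sample per minute.
        scheduler.scheduleAtFixedRate(DesktopAgent::sampleAndSend, 0, 60, TimeUnit.SECONDS);
    }

    static void sampleAndSend() {
        // Stand-in for a real measurement (e.g., response time of a probe).
        long freeMemKb = Runtime.getRuntime().freeMemory() / 1024;
        String record = String.format(
                "{\"host\":\"%s\",\"ts\":%d,\"freeMemKb\":%d}",
                "desktop-001", System.currentTimeMillis(), freeMemKb);
        try {
            HttpRequest post = HttpRequest.newBuilder()
                    .uri(URI.create(CONSOLE))
                    .header("Content-Type", "application/json")
                    .POST(HttpRequest.BodyPublishers.ofString(record))
                    .build();
            CLIENT.send(post, HttpResponse.BodyHandlers.discarding());
        } catch (Exception e) {
            // A monitoring agent must never take the desktop down with it.
            System.err.println("send failed: " + e.getMessage());
        }
    }
}
```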

But the company's "core business application is homegrown, and there was no tool on this planet that could monitor it," Barry says. Using VitalSuite's Transact tool kit, "we built hooks into our application so we can not only monitor it, but also tweak performance," he says.
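
The Transact tool kit's API isn't detailed in the article, so the hook below is purely hypothetical; it illustrates the general idea of instrumenting a homegrown application by wrapping each named business transaction so it times itself and reports the result:

```java
import java.util.function.Supplier;

// Hypothetical instrumentation hook, not the Transact tool kit's API:
// wrap a named business transaction so it reports its own timing.
public class TransactionHook {

    // Run the transaction, record how long it took, pass the result through.
    static <T> T timed(String name, Supplier<T> transaction) {
        long start = System.nanoTime();
        try {
            return transaction.get();
        } finally {
            long ms = (System.nanoTime() - start) / 1_000_000;
            // A real hook would feed a monitoring console; we just print.
            System.out.printf("transaction=%s elapsed=%dms%n", name, ms);
        }
    }

    public static void main(String[] args) {
        // Hypothetical business call wrapped in the hook.
        String quote = timed("policy-quote", () -> lookupQuote("AUTO-123"));
        System.out.println(quote);
    }

    static String lookupQuote(String policyId) {
        // Stand-in for the homegrown insurance application's real work.
        return "quote for " + policyId + ": $512/yr";
    }
}
```

Feeding those per-transaction timings back to a console is what makes it possible not just to monitor the application but, as Barry puts it, to tweak its performance.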

Barry credited INS's services team with a speedy implementation. "We allowed six or seven months for the rollout," he says. "We did it within 30 days." Barry's staff is down to three. The monitoring system replaced six people "sitting there staring at screens," he says.

VitalSuite tracks each transaction and event throughout the system, letting Barry's staff tune performance before it becomes a systemwide problem, he says. "If the same kind of thing happens with five users, it correlates the data and tells me about it.

"I took some grief from financial (on the VitalSuite implementation), until I explained the (return on investment)," Barry says. He says he's planning a convergence project involving voice over IP and ATM, a unified messaging platform and "maybe wireless."

Copyright © 2000 IDG Communications, Inc.
