August 29, 2005
(Computerworld)
Global oil company ConocoPhillips was all set to roll out SAP AG's Business Information Warehouse for users in Perth, Australia, but the early reports on the project weren't good. "This was a high-profile project, and some application owners were telling me that the network had a 15-second delay," says Dave Strobel, operations supervisor in ConocoPhillips' global information systems network operations group.
The application resided on servers in Bartlesville, Okla., where Strobel works, with a 2Mbit/sec. E1 connection to Perth. There was a demand for more bandwidth. But a new international circuit would have entailed a significant multiyear expense and required 60 workdays to set up, meaning it wouldn't be ready by the go-live date.
So Strobel handed the problem off to Bethesda, Md.-based Opnet Technologies Inc., which had been trying to sell him its network capacity planning software. Opnet did some packet captures and ran tests that took a total of 1,400 seconds to execute, far longer than they should have.
Opnet took the test results and modeled what would happen if the bandwidth to Australia was increased from 2Mbit/sec. to 20Mbit/sec. That cut only 0.38 seconds off the 1,400 seconds. Next, it modeled what would happen if the capacity was cut to 256Kbit/sec.; that added only 13 seconds to the transaction, a loss of less than 1%. "Bandwidth clearly wasn't the issue," says Strobel.
"When we looked at that data and analyzed it with Opnet, we found that very little of it was network delay," he says. "But we found a substantial amount of the delay was on the servers in Bartlesville."
The application team solved the problem with the servers, ConocoPhillips didn't have to spend money on a multiyear contract for a multimegabit international pipe, and Opnet made the sale.
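The percentages behind Strobel's conclusion are easy to check. The short sketch below uses only the figures quoted above (1,400 seconds, 0.38 seconds and 13 seconds); the function and variable names are invented for the illustration, and it is not a reconstruction of how Opnet's software models a link.

```python
# Figures quoted in the article for the ConocoPhillips test run.
BASELINE_SECONDS = 1400.0   # total test time over the 2Mbit/sec. link
SAVED_AT_20_MBIT = 0.38     # seconds saved in the modeled 20Mbit/sec. case
ADDED_AT_256_KBIT = 13.0    # seconds added in the modeled 256Kbit/sec. case


def percent_of_baseline(delta_seconds: float, baseline_seconds: float) -> float:
    """Express a change in run time as a percentage of the baseline run time."""
    return 100.0 * delta_seconds / baseline_seconds


gain = percent_of_baseline(SAVED_AT_20_MBIT, BASELINE_SECONDS)
loss = percent_of_baseline(ADDED_AT_256_KBIT, BASELINE_SECONDS)

print(f"Ten times the bandwidth: {gain:.3f}% faster")
print(f"Roughly one-eighth the bandwidth: {loss:.2f}% slower")
```

When changing the link speed by an order of magnitude in either direction moves the total by less than 1%, the time is being spent somewhere other than the wire.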
Capacity planning (the process of predicting IT needs, often with the help of software) has long been regarded as something of a black art. The feeling was that only a specialist with a degree in statistics could do it, and even then, the results were questionable.
But capacity planning tools are becoming easier to use, and companies are finding that they can help solve a wide range of long- and short-term bottlenecks. Tools now autodiscover network devices and connections, for instance. And pull-down menus allow for quicker configuring of models. Since those features shorten the time it takes to run scenarios and provide faster answers, the tools are being used to solve current problems, not just to estimate the upgrades that need to be included in next year's budget. As a result, the software is no longer shelfware.
"The big thing that's new in the area of capacity planning is that people are actually doing it," says Laura DiDio, an analyst at Yankee Group Research Inc. in Boston.
But as the tools have gotten better, the systems they need to model have gotten more complex. In many cases, users are no longer just trying to develop a utilization trend line for a single CPU or disk array. Instead, they might need to model a multitiered Web application to see whether the latency will come from the database, the application or the Web server. And virtualization raises its own challenges, since in that case, users aren't modeling against a set hardware configuration.
"As the environment becomes more virtualized and companies take on dynamic provisioning, it is a whole different story," says Audrey Rasmussen, a vice president at Enterprise Management Associates Inc. in Boulder, Colo. "The infrastructure will be morphing and changing so rapidly that a lot of capacity planning methods will become irrelevant because they can't keep up with the speed of change."
A service-oriented architecture adds its own wrinkle. "Then you don't have control over the infrastructure or even visibility into it," Rasmussen says, "particularly if you are subscribing to a service."
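To see why a multitiered application is harder to model than a single utilization trend line, consider a deliberately simplified sketch. It treats each tier as an independent single-server queue (the textbook M/M/1 formula, R = S / (1 - U)) and adds up the per-tier response times. The service times and request rates are hypothetical, and commercial modeling tools use far richer models than this.

```python
def mm1_response_time(service_time: float, arrival_rate: float) -> float:
    """Mean response time of an M/M/1 queue: R = S / (1 - U), with U = rate * S."""
    utilization = arrival_rate * service_time
    if utilization >= 1.0:
        raise ValueError("tier is saturated (utilization at or above 100%)")
    return service_time / (1.0 - utilization)


# Hypothetical mean service time per request, in seconds (not from the article).
TIERS = {"web": 0.005, "application": 0.020, "database": 0.035}


def tier_latencies(arrival_rate: float) -> dict:
    """Return the modeled response time of each tier at a given request rate."""
    return {name: mm1_response_time(s, arrival_rate) for name, s in TIERS.items()}


for rate in (5, 15, 25):  # requests per second
    latencies = tier_latencies(rate)
    slowest = max(latencies, key=latencies.get)
    total = sum(latencies.values())
    print(f"{rate:>2} req/s: total {total:.3f}s, slowest tier: {slowest}")
```

Even in this toy version, latency climbs much faster than utilization as the busiest tier nears saturation, which is the kind of behavior a straight-line trend cannot show.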
Getting the Data
The process of creating an accurate and useful capacity plan begins long before you start modeling the network or servers. First, companies need accurate information on what they have, how it's being utilized and how well it's performing. While this used to be a time-intensive manual task, it can now be done automatically, so capacity planners can just click on the items they want to include in a model. Similarly, gathering performance metrics can also be a routine, ongoing activity.
Pat Moffett is a consulting engineer for capacity planning at Norcross, Ga.-based CheckFree Corp., which provides electronic bill-paying services and software. He pulls performance data from nearly 500 Linux, Unix and Windows servers, as well as three IBM z/OS mainframes, into an IT resource management (ITRM) data warehouse from SAS Institute Inc. About 300GB of data is pulled in daily and incorporated into daily, weekly and monthly summaries. The database also contains business metrics and forecasts.
Moffett runs reports directly out of the ITRM and does some regression analysis using the SAS statistical procedures. But he also uses other tools for forecasting: For simpler calculations, he imports data into an Excel spreadsheet, and he uses modeling software from HyPerformix Inc. to simulate server capacity.
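The simplest kind of forecast mentioned here, a regression line through historical utilization, can be sketched in a few lines. The example below is not CheckFree's method or data: the monthly figures and the 80% planning limit are made up, and the least-squares fit is written out by hand so that nothing beyond the standard language is needed.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical monthly average CPU utilization (%), for months 1 through 6.
months = [1, 2, 3, 4, 5, 6]
cpu_pct = [41.0, 44.0, 46.0, 49.0, 53.0, 55.0]

slope, intercept = fit_line(months, cpu_pct)

THRESHOLD = 80.0  # assumed planning limit, in percent
month_at_threshold = (THRESHOLD - intercept) / slope

print(f"Trend: {slope:.2f} points per month, starting from {intercept:.1f}%")
print(f"Projected to reach {THRESHOLD:.0f}% around month {month_at_threshold:.1f}")
```

A trend like this says when a resource will fill up if nothing changes. It says nothing about response time or about a different hardware configuration, which is where simulation tools come in.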
Getting the performance data is the easy part. Getting the business data takes a bit more skill because it involves coaxing data out of business unit executives rather than network devices. This includes business and service metrics, as well as expansion plans.
"The process of obtaining the business information is the biggest challenge," says Tom Hill, capacity planner at CNF Inc., a Palo Alto, Calif.-based shipping and supply chain management company that uses BMC Software Inc.'s Patrol on more than 60 servers. "At some point, you are going to have to justify your existence, and either you are supporting the business or you are not going to be around much longer."
Now and Forever
Once a data feed is arranged, the next decision is what to do with it. That requires an analysis of which resources are most important to keep running at optimum performance. For the government of Virginia's Fairfax County, the key element is the IBM mainframe running its custom financial, human resources, budgeting, procurement and record-keeping applications. Systems programmer Tom Rose uses Perfman for z/OS from The Information Systems Manager Inc. in Bethlehem, Pa., to model workloads on the mainframe's three logical partitions. He is also using it to determine which model to purchase to replace the county's decade-old machine. Rose is considering an IBM eServer zSeries 890 mainframe, but there are 27 models to pick from, and he doesn't want to get one that's too big or too small.
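Part of what makes sizing difficult is that extra processors help only the portion of a workload that can run in parallel. One common way to reason about this is Amdahl's law, sketched below. It is offered as a general illustration, not as a description of how Perfman models a mainframe, and the parallel fractions are hypothetical.

```python
def amdahl_speedup(parallel_fraction: float, processors: int) -> float:
    """Amdahl's law: speedup = 1 / ((1 - p) + p / n)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


# Hypothetical workloads: share of the work that can run in parallel.
WORKLOADS = {"mostly serial batch job": 0.10, "highly parallel online work": 0.95}

for name, fraction in WORKLOADS.items():
    for cpus in (1, 2, 4):
        speedup = amdahl_speedup(fraction, cpus)
        print(f"{name}, {cpus} processor(s): {speedup:.2f}x")
```

For the mostly serial job, going from one processor to four buys less than a 10% improvement in this model, while the parallel workload more than triples its throughput.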
"We are using taxpayer money, so we have to make sure we get the right size of machine," says Rose. "One thing we have noticed is that we can purchase a machine with more processors, but certain workloads won't be any faster."
CenturyTel Inc., a voice and data services provider in Monroe, La., with more than 3 million customers in 22 states, has hundreds of servers in its data center. But it uses TeamQuest Corp.'s performance management and capacity planning software on only about 20 of them. "We have a [Citrix Systems] MetaFrame with hundreds of users," says programmer John Barfoot. "We have so much fail-over in place that if we lost a server, it is not a big deal."
The servers CenturyTel does model are Unix boxes running its data warehouse as well as Amdocs Ltd. Ensemble customer service and billing software, SAP ERP applications and IBM Tivoli systems management software. Barfoot has used the TeamQuest software to cut the number of servers SAP was running on and to model the data warehouse to help the company decide whether to replace the server or simply upgrade the processors. But the biggest benefit came when CenturyTel was switching from a legacy billing system to Ensemble. Hundreds of thousands of customers were being migrated at a time, so Barfoot kept a close watch for potential problems.
"I modeled one of the servers and saw that a very significant bottleneck had fallen off the radar," says Barfoot. "We were able to get that corrected the day before the conversion, and one director commented that TeamQuest paid for itself in that one instance."
Cases like this give capacity planning greater credibility. No, these tools won't catch everything. But, like weather predictions, they're getting more accurate and reliable. Weather.com will give a far better prediction of tomorrow's precipitation than The Old Farmer's Almanac, and capacity-modeling tools give a better prediction than linear trending. They won't eliminate the need for keeping an umbrella in the car or some overhead in your server CPUs, but they greatly lessen the chance of getting caught in an unexpected rain or packet flood.
Robb is a Computerworld contributing writer in Los Angeles. Contact him at drewrobb@attbi.com.
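CAPACITY MANAGEMENT TECHNOLOGY SCORECARD

Approach: Linear trending
What it measures: Utilization rate
Best suited for: Individual hardware resource (server, disk)
Prediction validity: Limited to the existing hardware configuration only
Response time/service-level prediction: No
Accuracy for predicting actual response time: Low
"What if" analysis and predictions: No

Approach: Multivariate trending
What it measures: Utilization rate
Best suited for: Hardware resource
Prediction validity: Limited to the existing hardware configuration only
Response time/service-level prediction: No
Accuracy for predicting actual response time: Low
"What if" analysis and predictions: No

Approach: Load testing
What it measures: Response time
Best suited for: Application (test environment)
Prediction validity: Limited to the existing hardware configuration only
Response time/service-level prediction: Yes, but limited to tested infrastructure
Accuracy for predicting actual response time: Rough estimate of performance in production
"What if" analysis and predictions: Limited to varying load levels

Approach: Multitier modeling
What it measures: Response time
Best suited for: Application and infrastructure
Prediction validity: Existing infrastructure and projected future infrastructure
Response time/service-level prediction: Yes
Accuracy for predicting actual response time: High
"What if" analysis and predictions: Yes

Approach: Multiworkload modeling
What it measures: Response time
Best suited for: Application and infrastructure
Prediction validity: Existing infrastructure and projected future infrastructure
Response time/service-level prediction: Yes
Accuracy for predicting actual response time: High
"What if" analysis and predictions: Yes

Source: Enterprise Management Associates Inc., Boulder, Colo.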