Test Case

When an online brokerage's trading site goes down, it doesn't just risk alienating customers. According to the U.S. Securities and Exchange Commission, it might actually be liable for any damages customers suffer during the outage.

Redesigning Ameritrade

Predictive Testing

Ameritrade used Mercury Interactive's LoadRunner, along with scripts written for WinRunner, to predict site performance under just about every conceivable type of user order. That's a far cry from the previous method, in which Ameritrade did its best development work, put the code on the live site and then tweaked it.

Focused Development For Specific Hardware

Ameritrade found that choosing the right hardware wasn't enough - it also had to tweak its applications to take best advantage of that hardware. For example, the company says, it got an eventual 700% increase in site performance from EMC disk storage subsystems only because it tweaked its applications to run on the new hardware.

Creating a Separate Test Team

To emphasize the importance of testing and to make sure enough time was allowed for it in the development process, Ameritrade pulled together testers from separate development groups into a new test team. The team could also enforce more consistent test methods.

For Omaha-based Ameritrade Inc., that wasn't the only impetus to create a better site. Times have grown tougher since 1988, when it was the first brokerage to offer automated touch-tone telephone transactions, and even since 1996, when it became the world's first online-only brokerage.

In March of last year, Barron's Online, owned by Dow Jones & Co., awarded the Ameritrade site only two of four possible stars in a ranking of online brokerages. Ameritrade brought up the rear, in 21st place out of the 22 brokerages ranked. The ranking took into account the site's customer service, reliability, availability, price per trade and ease of use.

In contrast, today, on San Mateo, Calif.-based Keynote Systems Inc.'s consumer index of the top 40 most available sites, Ameritrade is ranked first. On the Online Broker Index, which tests brokers by doing multiple, varied transactions, Ameritrade is often ranked first for reliability and is often in the top two or three for performance.

Ameritrade is the fifth-largest U.S. online brokerage. Its turnaround was the result of three changes: doing predictive testing rather than reacting to failures, updating and testing hardware and software together, and creating a separate testing organization with more clout.

Since beginning those changes in March 1999, the company has invested more than $100 million to make them work.

Predictive Testing

Ameritrade's primary goal is to make sure its customers can complete trades. Before the makeover, Ameritrade had tried to achieve that by coding its applications, selecting hardware, trying to fine-tune the code for the hardware and then putting it all on the live Web site. But that was the equivalent of flying blind; Ameritrade wouldn't know where any obstacles were until the software ran into them. That's because a system can work fine at 2,491 transactions/sec., then fall apart at 2,492 transactions/sec.

But with lost orders, defecting customers and the threat of litigation, Ameritrade chose to spend its money ensuring that code runs really well before deploying it rather than just doing autopsies of dead sites.
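The cliff described above, where a system holds at one rate and collapses at the next, is what a predictive load test is meant to locate before customers do. The following is only a minimal sketch in Python, not Ameritrade's method: `simulated_site` and `find_breaking_point` are hypothetical names, and the hard limit of 2,491 transactions/sec. simply reuses the illustrative figure above. It assumes that once the system fails at some rate, it also fails at every higher rate.

```python
def simulated_site(rate_per_sec, capacity=2491):
    """Hypothetical stand-in for a system under test.

    Returns True if the site keeps up at the offered rate. A real test
    would drive an actual test environment instead of this stub.
    """
    return rate_per_sec <= capacity


def find_breaking_point(handles_load, low, high):
    """Binary-search the highest rate in [low, high] the system survives.

    Assumes the system handles `low`, fails at `high`, and that once it
    fails it keeps failing at every higher rate.
    """
    while high - low > 1:
        mid = (low + high) // 2
        if handles_load(mid):
            low = mid    # still healthy: the cliff is above mid
        else:
            high = mid   # already broken: the cliff is at or below mid
    return low


if __name__ == "__main__":
    limit = find_breaking_point(simulated_site, low=1, high=10_000)
    print(f"Last healthy rate: {limit} transactions/sec.")
```

Searching by halves rather than stepping one transaction at a time keeps the number of test runs small, which matters when each run means loading a full test environment.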

Eliminating those kinds of failures takes sophisticated performance- and load-testing software that can simulate the momentary, crippling peaks Web sites experience. Ameritrade selected Sunnyvale, Calif.-based Mercury Interactive Corp.'s LoadRunner, for which it already had a license after hiring Computer Sciences Corp. in El Segundo, Calif., to analyze its Web site two years ago.

Furthermore, at his previous company, USF&G Corp., Jerry Johnston, now Ameritrade's director of quality, had performed a "bake-off" among load-testing products from Segue Software Inc. in Lexington, Mass., Compuware Corp. in Farmington Hills, Mich., and Mercury Interactive, and ultimately chose Mercury.

Ameritrade writes scripts using WinRunner that test user-order behavior on the site. "We're in the process of creating just about every conceivable type of order that can be made to develop an automated regression test script using WinRunner that covers equities, options, mutual funds, complex options. There are literally thousands of combinations," says Johnston.
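To see how quickly order combinations multiply, consider a rough sketch in Python (not WinRunner's scripting language). Only the four instrument types come from Johnston's description; the sides, order types and durations are illustrative assumptions, and `order_cases` is a hypothetical helper.

```python
from itertools import product

# Hypothetical order attributes; only the four instrument types are
# named in the article, everything else is an illustrative assumption.
INSTRUMENTS = ["equity", "option", "mutual fund", "complex option"]
SIDES = ["buy", "sell"]
ORDER_TYPES = ["market", "limit", "stop"]
DURATIONS = ["day", "good-till-canceled"]


def order_cases():
    """Yield one test case per combination of the attributes above."""
    for instrument, side, order_type, duration in product(
        INSTRUMENTS, SIDES, ORDER_TYPES, DURATIONS
    ):
        yield {
            "instrument": instrument,
            "side": side,
            "order_type": order_type,
            "duration": duration,
        }


if __name__ == "__main__":
    cases = list(order_cases())
    print(len(cases), "cases; first:", cases[0])
```

Even these four short lists produce 48 cases. Add account types, quantities and price conditions, and the count climbs into the thousands.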

Such simulations allow Ameritrade to break a test version of its site repeatedly during testing, fix the problem and then load those changes onto the real site. It can also ensure that there aren't any hidden capacity issues or defects such as memory leaks, which occur when a process doesn't release memory after it has finished using it. Eventually, if not stopped, such leaks will cause a system to crash.
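A leak of this kind can be shown in a few lines. The sketch below is illustrative only: it uses Python's standard `tracemalloc` module, and the two handlers are hypothetical stand-ins for request-processing code, not anything taken from Ameritrade's systems.

```python
import tracemalloc

_cache = []  # module-level list that the leaky handler never clears


def leaky_handler(order_id):
    """Keeps a reference to every request payload: a deliberate leak."""
    payload = bytearray(1024)
    _cache.append(payload)
    return order_id


def clean_handler(order_id):
    """Lets the payload go out of scope so it can be reclaimed."""
    payload = bytearray(1024)
    return order_id


def retained_bytes(handler, requests=2000):
    """Return how many traced bytes remain allocated after `requests` calls."""
    tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()
    for i in range(requests):
        handler(i)
    after, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return after - before


if __name__ == "__main__":
    print("leaky:", retained_bytes(leaky_handler), "bytes retained")
    print("clean:", retained_bytes(clean_handler), "bytes retained")
```

The leaky handler's retained memory grows with every request, while the clean handler's stays roughly flat. A long soak test under load looks for exactly that pattern.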

"Those are typically very hard to diagnose and remedy. By having a full-blown performance environment that you can run well above product stress levels, we're able to do far better performance numbers," says James Ditmore, Ameritrade's CIO. In addition, frequent testing between software versions and site iterations gives Ameritrade a paper trail: It can compare site performance based on software version, versions of the site or the month.

When the site does break, whether in testing or on the live servers, the programmers have another new line of defense: regression testing. The benefits are simple: Since changing a piece of code can unintentionally break an application that was working fine before, regression testing allows developers to roll the application back to a state where it's working and then compare the two versions of code to see what went wrong.
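One common way to automate that comparison is to record what the working release returns and check the new release against it. The sketch below assumes that approach; the commission functions, the fee amounts and the `regressions` helper are all hypothetical and only illustrate the idea.

```python
def commission_v1(shares):
    """Prior release: flat hypothetical fee of 8 per trade."""
    return 8.0


def commission_v2(shares):
    """New release: adds a hypothetical surcharge above 5,000 shares.

    Contains a deliberate bug: the comparison should be `>`, not `>=`.
    """
    return 8.0 + (2.0 if shares >= 5000 else 0.0)


# Expected results captured from the prior release for inputs whose
# behavior the new release is not supposed to change.
BASELINE = {shares: commission_v1(shares) for shares in (1, 100, 5000)}


def regressions(new_version, baseline):
    """Return (input, expected, actual) where the new version disagrees."""
    return [
        (inputs, expected, new_version(inputs))
        for inputs, expected in baseline.items()
        if new_version(inputs) != expected
    ]


if __name__ == "__main__":
    for inputs, expected, actual in regressions(commission_v2, BASELINE):
        print(f"shares={inputs}: expected {expected}, got {actual}")
```

Here the new release changes the fee for a 5,000-share order that should have been unaffected, and the comparison flags it before the code reaches the live site.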

"Often, developers and development groups are more focused on their release and fixes and enhancements that the release includes, so I'm interested in making sure that not only those enhancements work the way they need to, but that all the prior code in the prior release is still working the way it should," says Johnston.

Tweaking

The No. 1 myth of Web site scalability is that when things slow down, you can just pop in a new Web server. Not so, says Ditmore: Good hardware is important, but it isn't nearly the whole story. Ditmore estimates that 50% of the performance gains of the new site came from application and database engineering, which meant not only writing those applications but also testing and refining them continually until all possible bottlenecks were eliminated. Twenty-five percent came from doing the same refining process on the network, and just 25% resulted from hardware improvements.

What's critical to truly realizing the benefits of performance engineering, however, is integrating those three categories. For example, when Ameritrade switched from Sun Microsystems Inc. to EMC Corp. disk storage subsystems and saw transaction speeds increase 700% over eight months, it wasn't necessarily because the EMC hardware or other infrastructure was faster.

"Without having the application engineered so it had headroom, you could have put EMC in and wouldn't have seen the change," Ditmore says. Ameritrade spent nine months tuning its network and servers.

When upgrading its proprietary Ameritrade Order Management (AOM) database production environment and moving to an EMC disk subsystem, Ameritrade tested with LoadRunner. "We ran into things like poor performance on certain transaction types - worse than in the existing production environment - so we used that test as a way to tune that new environment, to maximize performance," says Johnston.

"For example, we were getting defunct processes on the system," he says, meaning that valuable computing power was being spent on processes that the current version of AOM no longer needed.

When Ameritrade finally moved its Sun servers, which had ably handled Ameritrade's previous three-year, 120% annual growth rate, onto EMC disk subsystems and tweaked the applications and network for EMC, the numbers really shot up: Throughput on the trading system nearly doubled, increasing from 2,500 to 4,500 transactions/min. "Unix vendors' storage solutions don't match what EMC has to offer. Basically, EMC has tuned their stuff for much higher input/output loads," says Ditmore.

Furthermore, as a result of load testing, says Johnston, "the implementation was able to go in very smoothly compared to what would have happened without load testing. And it was transparent to customers."

Over the past year, Ameritrade has also been improving its network, making it fully switched rather than routed. This speeds throughput because the switches already know where traffic should go, whereas routers must first translate protocols and addresses. The difference "can take 20% to 30% off the networking time," says Ditmore.

The trading and clearing engines run off of two Sun E10000s grouped in a high-availability cluster. That configuration was the result of network engineering. "Because we have that redundancy, we've eliminated single points of failure," says Ditmore.

Inevitably, failures do happen, which is why Ameritrade has fully redundant systems. In its Omaha data center, Ameritrade has more than 30 Sun 4500 Web servers running Unix for the primary applications and some running Windows NT for the networking directory. Those servers work in tandem with Seattle-based F5 Networks Inc.'s BIG-IP, which sits between the network router and the server array and routes Web queries to the most available server.
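The article doesn't say how "most available" is decided. One simple interpretation is to send each request to the server with the fewest requests in flight. The sketch below assumes that policy and is not a description of F5's product; `LeastLoadedBalancer` and the server names are hypothetical.

```python
class LeastLoadedBalancer:
    """Send each request to the server with the fewest active requests."""

    def __init__(self, servers):
        # Count of in-flight requests per server, all starting at zero.
        self.active = {name: 0 for name in servers}

    def acquire(self):
        """Pick the least-loaded server and record one more request on it."""
        # min() returns the first server among ties (insertion order).
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        """Record that a request on `server` has finished."""
        self.active[server] -= 1


if __name__ == "__main__":
    balancer = LeastLoadedBalancer(["web1", "web2", "web3"])
    first_three = [balancer.acquire() for _ in range(3)]
    balancer.release("web2")          # web2 finishes its request first
    print(first_three, "then", balancer.acquire())
```

With three idle servers, the first three requests spread across all of them. Once one server finishes its request, it becomes the least loaded and receives the next one.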

To improve site reliability, Ameritrade also created an exact duplicate of all its systems at a data center in Kansas City, Mo.

A good game plan includes knowing what to do when errors occur. Ameritrade monitors every database, network and server. Over the past year, it also added alerts to see if a component is in trouble.

More recently, it added tool sets to do correlation across errors. For instance, if a circuit fails, it is likely that every component attached to that circuit will report errors as well. Error correlation tools get to the root of the problem more quickly. "It can take a 15-minute analysis and make it a less-than-15-second analysis to allow you to replace a card or change the network configuration immediately," says Ditmore.
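The circuit example can be sketched as a walk up a dependency map: any alerting component that sits downstream of another alerting component is treated as a symptom rather than a cause. The topology and the `root_causes` helper below are hypothetical and stand in for whatever Ameritrade's tools actually do.

```python
# Hypothetical topology: each component maps to the one it depends on.
DEPENDS_ON = {
    "web-01": "switch-a",
    "web-02": "switch-a",
    "db-01": "switch-b",
    "switch-a": "circuit-1",
    "switch-b": "circuit-1",
    "circuit-1": None,
}


def root_causes(alerting, depends_on):
    """Return alerting components with no alerting component upstream.

    A component is treated as a symptom if anything it depends on,
    directly or indirectly, is also alerting.
    """
    roots = set()
    for component in alerting:
        parent = depends_on.get(component)
        # Walk upstream until an alerting ancestor or the top is reached.
        while parent is not None and parent not in alerting:
            parent = depends_on.get(parent)
        if parent is None:
            roots.add(component)
    return roots


if __name__ == "__main__":
    alerts = {"web-01", "web-02", "db-01", "switch-a", "switch-b", "circuit-1"}
    print(root_causes(alerts, DEPENDS_ON))
```

Six alerts collapse to one suspect, the circuit, which is the kind of shortcut that turns a long manual analysis into a quick one.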

Above all, he notes, he has a very well-trained operations staff that understands the systems and an engineering team that can provide emergency second-level support.

All of Ameritrade's efforts have paid off. According to Keynote Systems, transaction times are 700% faster than they were just eight months ago. Likewise, reliability has shot up, with Ameritrade being ranked by Keynote as one of the three most reliable sites. Recently, Money magazine ranked Ameritrade and Boston-based Fidelity Brokerage Services Inc. as the No. 1 online brokerages.

Culture Shift

There is one thing Ditmore won't do when updating the Web site, and that is what he derides as "the Microsoft approach": going live with a site or application and letting end users debug it for you. Besides the expense of fixing an application once it's live, there's the bad publicity and help desk inundation.

Before, testers were part of the various application development or Web groups. That was bad because when pushed for time, testing often lost out. Creating one group, which reports to Johnston, simply gave testing more clout. In addition, it allows Ameritrade to easily reallocate testing resources. "We can do cross-training, then utilize testing resources across different groups," says Johnston.

Now, Ameritrade is working to embed full regression testing and more thoroughly develop testing processes in the full production environment, so that actual production code - not an approximation or older version - is always being tested.

"We have a testing group, and the development teams typically hand off testing during system-level testing, after unit testing and as part of the software and hardware integration. So load testing is handled by the test group. We have a test group that's focused on it, but again, if the unit and regression test are completely aligned, then the stress and load tests are aligned in the new release," says Ditmore.

Why the emphasis on testing? It makes for a quality Web site. "We're trying to ensure that everyone understands the benefits of doing it right, as opposed to trying to add quality at the end of the factory line," says Ditmore.

Copyright © 2000 IDG Communications, Inc.
