Building a Web site for 30 million customers

When IBM won a coveted Web technology contract with San Jose-based eBay Inc., IBM credited its High Volume Web Sites (HVWS) organization with clinching the deal (see story). Working directly with customers, the HVWS team helps plan, develop, test and deploy large-scale Web sites. Its patent-pending HVWS Simulator evaluates how a particular Web site architecture will respond to increased traffic, extreme growth and other scenarios.

Computerworld recently talked with Dr. Willy Chiu, vice president of IBM's HVWS organization in San Jose, about scalability issues for high-volume Web sites.

Q: How do you define a high-volume Web site?
When we first started about three years ago, I was using 1 million hits per hour as [the definition of] high-volume. We were trying to help customers manage the scalability of their Web sites and deal with getting inundated. Now the definition might be 5 million or 10 million registered customers -- or 30 million customers, in the case of eBay. In terms of the number of hits, it varies.

Q: Must scalability be built in from the outset, or can it be added later?
I suggest that you first build your software using standards-based packages and build your applications so that they're componentized. Then you can start small, say, on a couple of Intel servers, accessing a database that could be on a mainframe or Unix box. Once you have a standards-based, componentized design -- for example, one that separates your presentation logic, your business logic and your data access layers -- you have enormous flexibility to scale up. So you can start small, grow, and map the design to different infrastructures. If you don't start with a standards-based, componentized architecture, you're going to be forced to rewrite.
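
The layering Chiu describes can be illustrated with a minimal Python sketch. It is not IBM's design: the names `Item`, `ItemStore`, `InMemoryItemStore`, `OrderService` and `render_total` are hypothetical, and the in-memory store merely stands in for a database that could live on another machine.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Protocol


@dataclass(frozen=True)
class Item:
    item_id: int
    title: str
    price_cents: int


class ItemStore(Protocol):
    """Data access layer: the only interface the business logic depends on."""

    def get_item(self, item_id: int) -> Optional[Item]: ...


class InMemoryItemStore:
    """Hypothetical stand-in; a real database client could replace it."""

    def __init__(self, items: Iterable[Item]) -> None:
        self._items = {item.item_id: item for item in items}

    def get_item(self, item_id: int) -> Optional[Item]:
        return self._items.get(item_id)


class OrderService:
    """Business logic layer: knows nothing about storage or rendering."""

    def __init__(self, store: ItemStore) -> None:
        self._store = store

    def total_cents(self, item_id: int, quantity: int) -> int:
        item = self._store.get_item(item_id)
        if item is None:
            raise KeyError(f"unknown item {item_id}")
        return item.price_cents * quantity


def render_total(title: str, cents: int) -> str:
    """Presentation layer: formats a result for display."""
    return f"{title}: ${cents // 100}.{cents % 100:02d}"


if __name__ == "__main__":
    store = InMemoryItemStore([Item(1, "Lamp", 2000)])
    service = OrderService(store)
    print(render_total("Lamp", service.total_cents(1, 3)))
```

Because `OrderService` depends only on the `ItemStore` interface, the storage tier can be moved to a larger machine without touching the other two layers.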

Q: So you have to start over?
There are two approaches to that. One, of course, is to rewrite. The second approach is to use tools to leverage your legacy applications. The beauty of having a standards-based approach is that you can "wrapper" these applications in a standard interface. Web services is an example. Or you can use Java Message Service [JMS], so you can communicate with the legacy application through a standard. That way, you build your new application with standards-based componentry and leave your older applications in their legacy form, but "wrappered" and integrated into the new application.
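
As a rough illustration of "wrappering," the sketch below hides a legacy-style routine behind a clean interface. Everything in it is assumed for the example: `legacy_lookup` imitates a fixed-width record interface, and `AccountService` and `LegacyAccountAdapter` are hypothetical names. A production wrapper would more likely expose a Web service or exchange messages over JMS, as Chiu suggests.

```python
from typing import Protocol


def legacy_lookup(record: str) -> str:
    """Stand-in for a legacy routine (hypothetical).

    Takes an 8-character account field and returns the account field
    followed by a 10-digit, zero-padded balance in cents.
    """
    account = record[:8].strip()
    balances = {"ACCT0001": 12550}  # made-up sample data
    cents = balances.get(account, 0)
    return f"{account:<8}{cents:010d}"


class AccountService(Protocol):
    """The standard interface that new code is written against."""

    def balance_cents(self, account_id: str) -> int: ...


class LegacyAccountAdapter:
    """Wrapper: translates the standard call into the legacy record format."""

    def balance_cents(self, account_id: str) -> int:
        request = f"{account_id:<8}"   # pad to the fixed-width field
        reply = legacy_lookup(request)
        return int(reply[8:18])        # parse the 10-digit balance field


if __name__ == "__main__":
    service: AccountService = LegacyAccountAdapter()
    print(service.balance_cents("ACCT0001"))
```

New components call `balance_cents` and never see the fixed-width format, so the legacy routine can later be replaced without changing its callers.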

Q: Can you describe the HVWS Simulator?
It's a mathematical simulation of e-business infrastructure, like Web servers, application servers and database servers. For example, when you're doing capacity planning, how many of those servers do you need? And how many CPUs per server do you need, and how would you do load-balancing, based on the number of users you have and the number of transactions? Our simulator is also built on actual customer experience. We have workloads that can represent online shopping or online reading or online banking. So you can try out different ways of configuring [the Web infrastructure] based on different workloads. You can ask what-if questions about the server configuration and find out the best practices, the throughput you can expect and the response time for concurrent users.
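
The interview does not describe the simulator's internal model, so the following is only a back-of-the-envelope sketch of the kind of capacity question it answers. It applies the utilization law (average busy CPUs = arrival rate × CPU time per request). The function name `servers_needed`, the 20 ms of CPU time per request and the 60% target utilization are all assumptions made for illustration.

```python
import math


def servers_needed(
    requests_per_sec: float,
    cpu_sec_per_request: float,
    cpus_per_server: int,
    target_utilization: float = 0.6,
) -> int:
    """Estimate the server count for one tier from the utilization law."""
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    # Average number of CPUs kept busy by the offered load.
    busy_cpus = requests_per_sec * cpu_sec_per_request
    # Leave headroom so each CPU runs at or below the target utilization.
    cpus_required = busy_cpus / target_utilization
    return math.ceil(cpus_required / cpus_per_server)


if __name__ == "__main__":
    hits_per_hour = 1_000_000          # Chiu's original high-volume threshold
    rate = hits_per_hour / 3600        # convert to requests per second
    for cpus in (2, 4, 8):
        print(cpus, "CPUs per server ->", servers_needed(rate, 0.02, cpus))
```

Changing `cpus_per_server` or `target_utilization` is a crude version of the what-if questions described above; a real simulator would also model queueing, load balancing and workload mix.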

Q: How do you use it?
We have a Web browser interface where you provide the input, such as the configuration, firewalls, different types of workload. You push a button, and it will plot a graph of response time vs. the number of users. You look at where the "knee" of the curve is -- say, 5 million hits per hour -- and figure out how to configure it for growth up to 10 million hits per hour.
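
A response-time curve with a "knee" can be reproduced with a textbook single-server queue (M/M/1), where response time equals service time divided by (1 − utilization). This is a swapped-in teaching model, not the HVWS Simulator's method. The function names, the 10 ms service time and the rule that the knee is the highest load whose response time stays within three times the service time are assumptions; load is in requests per second rather than hits per hour.

```python
from typing import List, Optional, Tuple


def response_time(load_per_sec: float, service_time_sec: float) -> float:
    """M/M/1 mean response time; infinite once the server is saturated."""
    utilization = load_per_sec * service_time_sec
    if utilization >= 1:
        return float("inf")
    return service_time_sec / (1 - utilization)


def sweep(service_time_sec: float, loads: List[float]) -> List[Tuple[float, float]]:
    """Compute (load, response time) points, i.e. the plotted curve."""
    return [(load, response_time(load, service_time_sec)) for load in loads]


def knee(
    curve: List[Tuple[float, float]],
    service_time_sec: float,
    max_slowdown: float = 3.0,
) -> Optional[float]:
    """Highest sampled load whose response time is within the slowdown limit."""
    limit = max_slowdown * service_time_sec
    acceptable = [load for load, rt in curve if rt <= limit]
    return max(acceptable) if acceptable else None


if __name__ == "__main__":
    service_time = 0.01                          # assumed 10 ms per request
    curve = sweep(service_time, [float(x) for x in range(0, 100, 10)])
    for load, rt in curve:
        print(f"{load:5.0f} req/s -> {rt * 1000:7.1f} ms")
    print("knee at about", knee(curve, service_time), "req/s")
```

Below the knee, response time grows slowly with load; past it, small increases in traffic cause large delays, which is why capacity is planned to keep expected peak load to the left of that point.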

Q: What's the typical user reaction?
There are two common experiences. One is: "This is really cool," because the tool allows you to ask these what-if questions, like a spreadsheet allows you to try different financial models. The second thing that's useful is that you can ask questions about consolidation. Some customers run 1,000 servers. But they can ask what would happen if they had a more powerful server and [thus] fewer servers. The benefit of a single large server is that it can handle the workload and run at a higher utilization -- in some cases, you have a lower total cost of ownership.
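
One way to see why a faster server can run at higher utilization is the same textbook M/M/1 relationship, response time = service time / (1 − utilization), solved for utilization. The sketch below is an illustration under that assumed model, with made-up service times and a made-up half-second response-time target. It says nothing about actual IBM hardware or about cost.

```python
def max_utilization(service_time_sec: float, target_response_sec: float) -> float:
    """Highest utilization that still meets the response-time target (M/M/1)."""
    if service_time_sec >= target_response_sec:
        return 0.0  # the target cannot be met even on an idle server
    return 1 - service_time_sec / target_response_sec


def max_throughput(service_time_sec: float, target_response_sec: float) -> float:
    """Requests per second one server can sustain within the target."""
    return max_utilization(service_time_sec, target_response_sec) / service_time_sec


if __name__ == "__main__":
    target = 0.5                       # assumed response-time goal, in seconds
    small, large = 0.2, 0.05           # assumed service times; large is 4x faster
    for name, s in (("small", small), ("large", large)):
        print(
            f"{name}: utilization up to {max_utilization(s, target):.0%}, "
            f"throughput up to {max_throughput(s, target):.1f} req/s"
        )
```

With these numbers the small server must stay at or below 60% utilization, while the four-times-faster server can run at 90% and still meet the same target, so it replaces more than four of the small machines. Whether that lowers total cost of ownership depends on pricing, which this model does not cover.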

Q: Can it be used to simulate non-IBM servers?
It can represent the full gamut of IBM servers, but it also handles Sun Solaris machines.

Special Report

E-Commerce Grows Up

Copyright © 2002 IDG Communications, Inc.
