Web Data: Cached or Current?

Caching Web site content can speed performance but can make managing dynamic updates exceedingly difficult.

For all of its virtual connotations, the Internet depends entirely upon its physical infrastructure to move information around. And the physical distance from server to end user leaves plenty of time for information, in the form of packets, to get lost, resulting in e-mails that never arrive, Web pages that load incompletely and streaming audio or video that pops, flickers or just dies. So getting files closer to end users can improve performance.

One way to do that is by caching files near the edge of the network, closer to users. Barry Weber, vice president of technical infrastructure at BarnesandNoble.com Inc. in New York, says the company's BN.com site saw a 50% improvement in performance from the end users' perspective after it started using caching in February last year.

Within the past few years, more companies have embraced caching as a way to push static content out to users, frequently outsourcing its delivery to external content delivery networks (CDN). A CDN is a group of Web servers and caching servers; the caching servers are simpler and less expensive than Web servers, but they can't generate dynamic content.

Companies are increasingly turning to CDNs because they can deliver static content more reliably than the prevailing model of a few clusters of Web servers serving every request. BN.com outsources delivery of its static content to Akamai Technologies Inc. in Cambridge, Mass. After BN.com uploads new content to one of Akamai's servers, it takes two to three hours for it to become available across Akamai's CDN. The CDN intercepts all IP requests for BN.com's static content - HTML, images, streaming audio or video - and serves it to users from the available cache that's physically closest to the user.
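The "closest available cache" idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a description of how Akamai routes requests: the cache names, coordinates and the `closest_cache` helper are hypothetical, and geographic distance stands in for whatever measure a real CDN uses.

```python
from math import asin, cos, radians, sin, sqrt


def distance_km(a, b):
    """Great-circle (haversine) distance between two (latitude, longitude) points."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371 * asin(sqrt(h))   # 6371 km is the mean Earth radius


def closest_cache(user_location, caches):
    """Pick the nearest cache that is currently available, or None."""
    available = [c for c in caches if c["available"]]
    if not available:
        return None
    return min(available, key=lambda c: distance_km(user_location, c["location"]))


# Hypothetical cache sites and an example user in New York.
caches = [
    {"name": "cambridge", "location": (42.37, -71.11), "available": True},
    {"name": "san-francisco", "location": (37.77, -122.42), "available": True},
]
new_york_user = (40.71, -74.01)
chosen = closest_cache(new_york_user, caches)
```

If the nearest site is marked unavailable, the same call falls back to the next closest one.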

Meanwhile, requests for dynamic content, such as book inventory levels and targeted banner advertisements, go to BN.com's servers as usual. Both find their way back to the end user, who sees only the finished Web page. Though CDNs are unnecessary on a small scale, the CDN helps keep the site running quickly when, say, a new Stephen King novel comes out and thousands of users are viewing the book's Web page on BN.com every hour.

Now, for the first time, caching is enabling companies to do things that were previously impossible or very unreliable on the Internet, such as streaming catalogs of media files. But caching still leaves something to be desired for retail companies, such as BarnesandNoble.com, that dynamically generate their Web pages with content specifically targeted at individuals.

Some companies have a financial imperative to make their video files reliably available on the Internet. And reliability has been elusive, especially as the number of simultaneous streams has increased.

"If you're throwing these giant streaming files around your worldwide network, capacity becomes an issue very quickly," says Greg Howard, an analyst at HTRC Group LLC in Stockton, Calif.

But caching, he says, "can dramatically reduce costs for streaming, mainly in the areas of maintaining wide-area network capacity." Just as CDNs can put static files closer to end users, so, too, can they keep copies of streaming media files, serving multiple users from multiple locations rather than from just the few centralized streaming servers many companies use.

Take, for example, Coastal Training Technologies Inc. in Virginia Beach, Va., which sells safety and training videos on topics ranging from blood-borne pathogens to oxyfuel welding.

Before customers buy, they want to preview the videos, which can cost up to $800 each. In the past, Coastal would mail out bunches of preview tapes. But it could take weeks for customers to review them, which made it difficult to close sales with follow-up calls.

Coastal wanted to make decent previews available online but didn't want to have to run Web servers to house the thousands of necessary preview files. After attending the Streaming Media East conference in New York last summer, the company decided to outsource the delivery of its previews to a CDN.

Coastal chose Digital Island Inc. in San Francisco after also evaluating service from Activate Corp., Akamai, Burst.com Inc., Globix Corp. and iBeam Broadcasting Corp. Choosing Digital Island over Akamai was practically "a flip of the coin," says Mark Stelbauer, Coastal's director of e-business.

Coastal uses 150K Advanced Streaming Format files. The company uploads 50 or 100 files at a time via file transfer protocol to a Digital Island server, and within a few hours, the files are propagated across the CDN. Unlike many other CDNs, which cache content based solely on popularity, Digital Island also maintains many copies of Coastal videos on several different servers.

"Since we're not targeting the consumer, the files are not going to be requested every 15 seconds. For us, it's maybe every 15 or 20 minutes," says Stelbauer. Thus, a popularity-based model wouldn't work there.
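A minimal Python sketch shows why a purely popularity-based cache is a poor fit for files that are requested rarely. It is not Digital Island's implementation; the `EdgeCache` class, its capacity and the file names are assumptions made for illustration. The cache evicts the least recently used item when it is full, unless that item has been pinned.

```python
from collections import OrderedDict


class EdgeCache:
    """Toy least-recently-used cache with optional pinning (hypothetical)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()   # key -> content, least recently used first
        self.pinned = set()          # keys that are never evicted

    def pin(self, key, content):
        """Keep content in the cache regardless of how often it is requested."""
        self.pinned.add(key)
        self.items[key] = content
        self._evict()

    def get(self, key, fetch_from_origin):
        """Return (content, "hit") from cache, or fetch it and return "miss"."""
        if key in self.items:
            self.items.move_to_end(key)      # mark as recently used
            return self.items[key], "hit"
        content = fetch_from_origin(key)
        self.items[key] = content
        self._evict()
        return content, "miss"

    def _evict(self):
        # Drop least recently used unpinned items until within capacity.
        while len(self.items) > self.capacity:
            victim = next((k for k in self.items if k not in self.pinned), None)
            if victim is None:
                break                        # everything left is pinned
            del self.items[victim]


def origin(key):
    """Stand-in for a request to the origin server."""
    return f"<bytes of {key}>"


cache = EdgeCache(capacity=2)
cache.pin("welding-preview.asf", origin("welding-preview.asf"))
for name in ["a.gif", "b.gif", "c.gif"]:     # busier traffic for other files
    cache.get(name, origin)
status = cache.get("welding-preview.asf", origin)[1]
```

With the preview pinned, the final request is served from the cache. Without the `pin` call, the busier files would have pushed the preview out before the next viewer arrived.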

Coastal wouldn't specify how many users previewed videos exclusively online but did say that once the figure reaches 20% to 30% of overall users, it will make an impact on the bottom line. Already, however, salespeople are able to call just hours after previews are viewed online, which has helped sales.

Though current hardware and software make it possible for companies to build their own CDNs, HTRC's Howard cautions against it. "People who are building their own CDNs are finding it too difficult or not cost-effective when you include the cost of labor. It just makes sense to go to the service providers in this market," he says. A company such as CDN outsourcer Akamai has 9,700 servers configured in 650 networks across 56 countries, a scale that few do-it-yourselfers would be able to match.

Pricing for outsourced CDNs wasn't available for this article; CDN vendors wouldn't release the information, and the customers interviewed were contractually prohibited from discussing it.

Users say that, in general, vendors divulge little information, making it difficult to compare them when shopping for a CDN. But there are other ways to evaluate CDNs, namely by their performance. That's what BN.com did in February last year, when it pitted its top three CDN choices (which it declined to name) against one another, watching as each hosted the static content on the BN.com site simultaneously. "It's pretty fascinating, because we really had the statistics," says Weber. The company chose Akamai.

Beyond the Static

Caching can speed the delivery of content, but to date, it has only been good for static Web content, not dynamic information such as pricing. Weber says that's the way it has to be for now, given current cache limitations. "I'd like to go beyond caching static content, as soon as possible," he says.

What holds him back, he says, are "distributed databases and distributed applications," which produce the dynamic information on a Web page that's tailored to individual users or which changes quickly. Caches can't handle that content well.

Caching dynamic content is "problematic from a database standpoint, because you need one version of the truth," says Peter Firstbrook, an analyst at Meta Group Inc. in Stamford, Conn. Companies need to be able to refresh the information across the CDN whenever a little change occurs, so there's just one version of it. So "you have to be able to delete pages from the cache when a certain event occurs, not just at a certain time," Firstbrook says.
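The difference between time-based and event-based expiration that Firstbrook describes can be sketched in Python. This is an illustration under stated assumptions, not any vendor's product: the `PageCache` class, the event names and the URLs are hypothetical.

```python
import time


class PageCache:
    """Toy page cache with both time-based and event-based expiry."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.pages = {}        # url -> (html, time stored)
        self.depends_on = {}   # event name -> set of urls to purge

    def put(self, url, html, events=()):
        """Cache a page and record which events should invalidate it."""
        self.pages[url] = (html, self.clock())
        for event in events:
            self.depends_on.setdefault(event, set()).add(url)

    def get(self, url):
        """Return the cached page, or None if it is missing or too old."""
        entry = self.pages.get(url)
        if entry is None:
            return None
        html, stored_at = entry
        if self.clock() - stored_at > self.ttl:   # time-based expiry
            del self.pages[url]
            return None
        return html

    def on_event(self, event):
        """Event-based expiry: purge every page that depends on the event."""
        for url in self.depends_on.pop(event, set()):
            self.pages.pop(url, None)


cache = PageCache(ttl_seconds=3600)
cache.put("/product/42", "<p>Price: $19.99</p>", events=["price_changed:42"])
cache.on_event("price_changed:42")   # stale page is purged immediately
```

With only the one-hour timer, a price change could leave the old page in the cache for up to an hour. The event hook removes it as soon as the change is reported.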

At Outpost.com, the site of Cyberian Outpost Inc. in Kent, Conn., for instance, the dynamic information on any given Web page can include product information, real-time stock inventory, product categories and order-tracking information.

Even prices change moment by moment. "The average price can change six or 10 times per day on [a] product," says Raymond Karrenbauer, chief technology officer at Outpost.com. Every time new inventory lands in a warehouse, the e-commerce application adjusts pricing based on current inventory supply and customer demand levels.

Some industry initiatives are afoot to let companies push dynamic assembly onto the CDN to increase content delivery speed. One is the Edge Side Includes (ESI) open-standards specification, co-authored by Akamai, ATG Inc., BEA Systems Inc., Circadence Corp., Digital Island, IBM, Interwoven Inc., Oracle Corp. and Vignette Corp. The core of ESI is a series of XML tags that specify how and when information and pages should be assembled within the content management system, application server and CDN. To date, Oracle's 9i application server and Akamai's EdgeSuite infrastructure service support ESI.
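As a rough illustration of edge-side assembly, the Python sketch below fills `<esi:include src="..."/>` tags in a cached page shell with separately fetched fragments. It is a toy assembler, not a conforming ESI processor: it handles only self-closing include tags with a `src` attribute, and the fragment URLs and the `assemble` helper are hypothetical.

```python
import re

# Matches a self-closing include tag such as <esi:include src="/fragments/price"/>.
ESI_INCLUDE = re.compile(r'<esi:include\s+src="([^"]+)"\s*/>')


def assemble(template, fetch_fragment):
    """Replace each esi:include tag with the fragment returned for its src."""
    return ESI_INCLUDE.sub(lambda m: fetch_fragment(m.group(1)), template)


# Cacheable page shell with two holes for dynamic content.
template = (
    "<html><body>"
    "<h1>Book title</h1>"
    '<esi:include src="/fragments/price"/>'
    '<esi:include src="/fragments/stock"/>'
    "</body></html>"
)

# Stand-in for requests to the origin server.
fragments = {
    "/fragments/price": "<p>$19.99</p>",
    "/fragments/stock": "<p>In stock</p>",
}

page = assemble(template, fragments.__getitem__)
```

The shell can be cached for a long time at the edge, while each fragment is fetched or refreshed on its own schedule.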

Two newer companies are also forging into dynamic delivery territory. Software from SpiderCache Inc. in Vancouver, British Columbia, and Chutney Technologies Inc. in Atlanta can accelerate dynamic content delivery by using things such as event- or time-based expiration of caches, predictive modeling and real-time cache consistency checks.

But these are baby steps. "The Holy Grail is to move all this stuff out to the edges," says Firstbrook. "But the reality is, I don't think you'll be able to do that anytime soon."

Computerworld Online Exclusive
Additional Caching Resources

GOOD LINKS

The National JANET Web Cache Service

Advice for setting up your own Web cache server. Includes good lists of cache appliances and hardware.

Brian D. Davison's Web Caching and Content Delivery Resources

A thorough list of vendors that offer cache outsourcing, hardware and software.

Edge Side Includes overview

News and information on the Edge Side Includes initiative to make content management software and Web software and hardware work together to better serve dynamic content.

In the future, the protocol could be adapted to work across caches from various vendors.

Test Your Cacheability

Check Web pages to see how a Web cache would handle them.

Web Authoring For Caching Tips

A helpful tutorial giving an overview of caching as well as tips on designing Web pages that work with caches.

Survey of Proxy Cache Evaluation Techniques

A 1999 paper from Rutgers University's Brian Davison evaluating the performance of various caching techniques. Includes insightful overview of caching.

BUZZWORDS/GLOSSARY

Cache: A high-speed storage mechanism that can reside either on a chip (see Quickstudy) or, as in the case of Web caches, on a hard drive.

Cache control headers: Part of the HTTP 1.1 specification, these HTTP headers can be added by content creators to any Web page to indicate when the page "expires" and should thus be flushed from any cache and reloaded. Content Delivery Networks (CDN) and large Internet service providers that cache popular Web pages often—but not always—obey these instructions.
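A short Python sketch shows how a cache might read a `Cache-Control` header to decide whether a stored copy can still be served. It is a simplification under stated assumptions: the function names are hypothetical, only the `max-age`, `no-cache` and `no-store` directives are considered, and quoted values containing commas are not handled.

```python
def parse_cache_control(header):
    """Parse a Cache-Control header value into a dict of directives."""
    directives = {}
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, value = part.partition("=")
        # Directives without a value (such as "public") are stored as True.
        directives[name.strip().lower()] = value.strip().strip('"') or True
    return directives


def is_fresh(header, age_seconds):
    """Return True if a cached copy of this age may be served without
    checking back with the origin server."""
    directives = parse_cache_control(header)
    if "no-store" in directives or "no-cache" in directives:
        return False
    max_age = directives.get("max-age")
    if max_age is None or max_age is True:
        return False          # no usable lifetime given; be conservative
    return age_seconds < int(max_age)


fresh = is_fresh("public, max-age=3600", age_seconds=120)
```

Here a page marked `max-age=3600` and cached two minutes ago is still considered fresh; once its age reaches an hour, the cache would go back to the origin.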

Content accelerator: See "edge server."

Content Delivery Network: A collection of Web cache servers, distributed geographically, that manage the delivery of content across the network.

Edge server: A server that is as close to the edge of the network as possible. (The ultimate edge of the network is the end user.) Often placed at an Internet point of presence, an edge server is most often used to serve popular, static content.

Forward proxy mode: Caches hold data based upon historical user demand patterns. If demand for www.nytimes.com spikes every weekday starting at 8:35 a.m. EST, the cache servers can cache the nytimes.com site before then.

Reverse proxy cache: One server literally acts as proxy for another, intercepting all requests for it. Most Internet providers began caching popular Internet sites in order to more quickly serve the many simultaneous provider customer requests for the same page, such as large newspaper Web sites. Content makers were incensed that "old" versions of their Web sites were being presented to users. Now new initiatives are trying to create a standard markup language so that content creators can "clue in" these infrastructure caches as to what can be cached and how often it should be refreshed.

Web caching: The temporary storage of Web objects for later retrieval. Web caching can reduce bandwidth consumption and latency while easing server load.

Copyright © 2001 IDG Communications, Inc.
