
QuickStudy: Measuring Web Site Traffic

By Sharon Machlis
June 17, 2002 12:00 PM ET

Computerworld - In the beginning, there were hits. Today, hits are largely discredited as a measure of Web site traffic, since they count individual files served up. A single Web page can account for a dozen or more hits if it has a lot of photos, while a text-only page could generate just a single hit.
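The gap between hits and page views can be illustrated with a minimal sketch. The list of page extensions and the sample request paths below are assumptions made for illustration; a real site would define what counts as a page to suit its own content.

```python
import os

# Assumption: requests with these extensions (or none) are pages; everything
# else (images, stylesheets, scripts) is a supporting file.
PAGE_EXTENSIONS = {".html", ".htm", ""}


def count_hits_and_page_views(requested_paths):
    """Return (hits, page_views) for a list of requested URL paths."""
    hits = len(requested_paths)  # every file served is one hit
    page_views = 0
    for path in requested_paths:
        # Ignore any query string when looking at the extension.
        extension = os.path.splitext(path.split("?", 1)[0])[1].lower()
        if extension in PAGE_EXTENSIONS:
            page_views += 1
    return hits, page_views


# Hypothetical requests: one page with three images, plus one text-only page.
sample_requests = [
    "/news/story1.html",
    "/images/photo1.jpg",
    "/images/photo2.jpg",
    "/images/logo.gif",
    "/news/story2.html",
]
hits, page_views = count_hits_and_page_views(sample_requests)
print(hits, "hits,", page_views, "page views")
```

In this sample, the photo-heavy page alone accounts for four of the five hits, which is why hit counts say little about how many pages people actually saw.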

These days, the Weberati talk of metrics such as page views, ad impressions and unique users. But don't be fooled by precise-sounding terminology and numbers. There are so many ways to define and count Web visits that traffic measurement is as much an art as it is a science.

For example, what counts as a page view? Is it when a Web page is first requested? When content has completely finished loading? Or when a tracking pixel—a tiny file placed on a page specifically for counting page views—is called? Such distinctions are important to Internet ad buyers, because the numbers can differ depending on the definition used. Consider the impatient user who requests a page but then hits Back or surfs elsewhere before that page—and its ad—loads.

Search engines complicate the problem. Their automated software "robots" scour the Internet and index sites. IT staffs monitoring server load may need to factor in robot activity for capacity planning, but site operators must filter it out to get an accurate count of how many real people are visiting a site.

Finding and discarding the activity of known robots, such as those run by Mountain View, Calif.-based Google Inc., is only one step in factoring out Web crawlers, notes George Ivie, executive director of New York-based Media Rating Council Inc., a trade organization seeking to develop and enforce audience measurement standards. Ideally, he says, analysis would also check for obviously automated activity, such as a single IP address that requests 10 pages per second.
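Both filtering steps Ivie describes can be sketched in a few lines. This is only an illustration under stated assumptions: the robot markers, the per-second threshold and the sample requests are hypothetical, not values from the Media Rating Council or any standard.

```python
from collections import defaultdict

# Assumption: a hand-maintained list of substrings found in crawler user agents.
KNOWN_ROBOT_MARKERS = ("googlebot", "crawler", "spider")
# Assumption: more than this many page requests from one IP address within a
# single second is treated as automated activity.
MAX_PAGES_PER_SECOND = 5


def filter_robot_requests(requests):
    """Drop requests from known robots and from IP addresses with bursts.

    Each request is an (ip, unix_second, user_agent) tuple.
    """
    # Step 1: count requests per IP address per second to find bursts.
    per_ip_second = defaultdict(int)
    for ip, second, _agent in requests:
        per_ip_second[(ip, second)] += 1
    bursty_ips = {
        ip for (ip, _second), count in per_ip_second.items()
        if count > MAX_PAGES_PER_SECOND
    }

    # Step 2: keep only requests that pass both checks.
    kept = []
    for ip, second, agent in requests:
        if any(marker in agent.lower() for marker in KNOWN_ROBOT_MARKERS):
            continue  # known robot, identified by its user agent
        if ip in bursty_ips:
            continue  # too fast to be a person clicking
        kept.append((ip, second, agent))
    return kept


# Hypothetical traffic: one person, one known robot, one unidentified script.
sample = [
    ("192.0.2.10", 1000, "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)"),
    ("192.0.2.20", 1000, "Googlebot/2.1"),
] + [("192.0.2.30", 1001, "Mozilla/4.0 (compatible)")] * 10
human_requests = filter_robot_requests(sample)
print(len(human_requests), "of", len(sample), "requests kept")
```

The second check matters because a script can claim to be an ordinary browser; only its request rate gives it away.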

Trickier still are internal users. Should IT staffers at Seattle-based Amazon.com Inc. be counted as visitors if they're testing an updated part of the site? Probably not. But what about the receptionist who surfs to buy a gift?

Soft Numbers

One of the softest Web numbers is the tally of unique visitors per month. For sites that require registration and log-in, it's fairly simple. But the rest must depend on other devices, ranging from analyzing server log files to using cookies, which are small pieces of data stored in a user's browser that can be accessed by a Web site the next time that user visits.

Web site operators usually get information about site traffic from their own server logs, an outside online advertising company such as New York-based DoubleClick Inc., or a third-party rating service. Major sites typically use a combination of sources.

"We track all the page views internally that we get. We also double-check it with our ad server, DART," says Jim Candor, vice president for new media at AccuWeather Inc. in State College, Pa., which recently announced that it had surpassed 1 billion page views. DART is an ad-serving technology from DoubleClick that lets online staff set up when and where ads appear on a site; it also measures how many people view each ad.

Numbers from AccuWeather's server logs showed only a "slight discrepancy" with the DART figures, within a percentage point or two, Candor says. How did AccuWeather tally up 1 billion pages viewed over the site's history? "We track each type of page internally; we put a 1-by-1 spotlight [tracking] pixel internally," he says. The count began at the site's December 1997 launch.

Log file analysis is also quite useful alongside outside rating services, says Jeff Julian, president and publisher of IDG.net, a Computerworld.com sister site. It lets him see what people do after they arrive at a site. Server logs usually record each visitor's domain or IP address, browser type and files requested. Web site staff can then use commercial log analysis software or home-brewed code to sift through the raw data and pull together the statistics they're seeking.

Sites that don't require user registration use various techniques to estimate how many unique visitors—different individuals—are arriving each month. Some check to see whether there's an existing cookie; if not, the first-time visitor gets a cookie with a unique user ID. Then, if the user returns, the site knows he was there before.
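The cookie check described above might look roughly like the following sketch. The cookie name, the class and the use of a random ID are all assumptions for illustration; the sites mentioned in this article may do it differently.

```python
import uuid

# Assumption: the name of the cookie that holds the visitor ID.
COOKIE_NAME = "visitor_id"


class UniqueVisitorCounter:
    """Counts distinct visitor IDs seen during one reporting period."""

    def __init__(self):
        self.seen_ids = set()

    def handle_request(self, cookies):
        """Return (visitor_id, is_new) for a request's cookie dictionary.

        The caller is expected to send visitor_id back to the browser as a
        cookie whenever the request arrived without one.
        """
        visitor_id = cookies.get(COOKIE_NAME)
        if visitor_id is None:
            # No existing cookie: treat as a first-time visitor.
            visitor_id = uuid.uuid4().hex
        is_new = visitor_id not in self.seen_ids
        self.seen_ids.add(visitor_id)
        return visitor_id, is_new

    def unique_visitors(self):
        return len(self.seen_ids)


counter = UniqueVisitorCounter()
first_id, first_is_new = counter.handle_request({})  # no cookie yet
_, repeat_is_new = counter.handle_request({COOKIE_NAME: first_id})  # return visit
print(counter.unique_visitors(), "unique visitor(s)")
```

A count like this is only an estimate: someone who deletes cookies or uses a second browser is counted again, and several people sharing one browser are counted once.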

The New York-based Interactive Advertising Bureau recently took a first crack at developing online audience measurement guidelines, issued in January. In them, the group defines visits and page impressions and presents proposals to deal with page caching and to filter out "nonhuman activity."

Ivie says that ultimately, he would like to see both internal Web sites and outside measurement agencies submit to external auditing, just as newspapers do for circulation claims. So far, he says, Atlanta-based CNN.com is the lone major consumer site that has submitted to Media Rating Council auditing.

"The major problem is there's no accountability in the Internet environment," Ivie says. "Our members struggle trying to figure out what numbers to rely on."

Reading Server Logs

This is a typical server log entry. Log analysis software is usually used to read each entry and come up with site usage statistics, but here's how to read a raw log entry.

12.345.67.89 - - - - [14/Mar/2002:03:23:37 -0500] "GET /cwi/quickstudy/0,1070,NAV47-72,00.html HTTP/1.1" 200 2751 "http://www.idg.net" "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0; T312461)"

12.345.67.89: The visitor's IP address (changed in this example to protect the user's privacy).
- - - -: The user's log-in (left blank for sites that don't require log-in).
[14/Mar/2002:03:23:37 -0500]: Date and time of the request; -0500 is the time zone's offset from Greenwich Mean Time.
"GET /cwi/quickstudy/0,1070,NAV47-72,00.html HTTP/1.1": The request itself: the method (GET), the page the user requested and the protocol version.
200: Status code of the user request. Codes starting with a 2 mean the page was successfully retrieved. Codes starting with 4 indicate a problem, such as the "404 page not found" error.
2751: Number of bytes transferred.
"http://www.idg.net": Referring Web address. This shows up if the user arrived at the requested page by clicking on a link from another page. This will be blank if the user typed in the Web address manually.
"Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0; T312461)": Information about the user's browser and operating system.
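For readers who prefer home-brewed code, here is a minimal sketch of how the sample entry could be split into its fields. The regular expression and field names are assumptions written to fit the layout shown in this sidebar, with plain straight quotation marks around the quoted fields; other servers may log a different layout.

```python
import re
from datetime import datetime

# Assumption: fields appear in the order shown in the sample entry.
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) (?P<login>.*?) \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"$'
)

SAMPLE_LINE = (
    '12.345.67.89 - - - - [14/Mar/2002:03:23:37 -0500] '
    '"GET /cwi/quickstudy/0,1070,NAV47-72,00.html HTTP/1.1" 200 2751 '
    '"http://www.idg.net" '
    '"Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0; T312461)"'
)


def parse_log_entry(line):
    """Return a dictionary of fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    entry = match.groupdict()
    # Month abbreviations such as "Mar" assume an English-language locale.
    entry["timestamp"] = datetime.strptime(
        entry["timestamp"], "%d/%b/%Y:%H:%M:%S %z"
    )
    entry["status"] = int(entry["status"])
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    # The request holds the method, the path and the protocol version.
    parts = entry["request"].split()
    entry["path"] = parts[1] if len(parts) == 3 else None
    return entry


entry = parse_log_entry(SAMPLE_LINE)
for field in ("ip", "timestamp", "path", "status", "size", "referrer", "agent"):
    print(field, "=", entry[field])
```

Returning None for lines that do not match lets a larger script count and skip malformed entries instead of stopping.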





