Sidebar: Also Known As ...

Over the past decade, the terminology used to describe Web harvesting has changed several times. In 1996, researcher Oren Etzioni published a paper titled "The World Wide Web: Quagmire or Gold Mine?" in the journal Communications of the ACM. In it, Etzioni defined Web mining as the use of data mining techniques to automatically discover and extract information from Web documents and services.

In the late 1990s, Richard Hackathorn coined the term Web farming to describe a discipline combining aspects of data warehousing, Web data mining and knowledge-base creation.

Around the turn of the millennium, Web harvesting began to replace Web mining as the fashionable buzzphrase, although it can mean different things to different people. Web harvesting can be synonymous with Web mining, Web farming and Web scraping, but it can have other meanings as well. One widespread usage of the term refers specifically to the searching of Web pages for e-mail addresses for resale and use in commercial solicitations (i.e., spam).

The Web site of the Medical University of South Carolina defines Web harvesting as "the process of downloading RSS feeds and consolidating them for display." (Read our QuickStudy on RSS at QuickLink 46266.)

Another related term is Web scraping, an obvious derivation from the 1980s catchphrase "screen scraping," in which PC- or minicomputer-based applications accessed mainframe systems by emulating 3270 or VT100 terminals. Such applications were quick and cheap but not always reliable. Similarly, Web scraping applications process a Web page's HTML to extract meaningful data, often from live data feeds or by manipulating specific applications. Web scrapers are also cheap and useful but of questionable reliability.
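
To make the idea of processing a page's HTML concrete, here is a minimal sketch in Python using only the standard library's html.parser module. The page fragment, the "symbol" and "price" class names, and the QuoteScraper and scrape_quotes names are all hypothetical, invented for illustration; a real scraper would first download the page over HTTP.

```python
from html.parser import HTMLParser

# Hypothetical page fragment standing in for a downloaded Web page.
SAMPLE_HTML = """
<html><body>
<table id="quotes">
<tr><td class="symbol">ABC</td><td class="price">12.50</td></tr>
<tr><td class="symbol">XYZ</td><td class="price">7.25</td></tr>
</table>
</body></html>
"""


class QuoteScraper(HTMLParser):
    """Collects the text of <td> cells whose class is 'symbol' or 'price'."""

    def __init__(self):
        super().__init__()
        self.current_class = None  # class of the <td> being read, if any
        self.row = {}              # cells gathered for the current <tr>
        self.rows = []             # completed rows

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self.current_class = dict(attrs).get("class")

    def handle_data(self, data):
        # Only keep text that sits inside a cell we care about.
        if self.current_class in ("symbol", "price"):
            self.row[self.current_class] = data.strip()

    def handle_endtag(self, tag):
        if tag == "td":
            self.current_class = None
        elif tag == "tr" and self.row:
            self.rows.append(self.row)
            self.row = {}


def scrape_quotes(html):
    """Return (symbol, price) pairs found in the given HTML text."""
    parser = QuoteScraper()
    parser.feed(html)
    parser.close()
    return [
        (r["symbol"], float(r["price"]))
        for r in parser.rows
        if "symbol" in r and "price" in r
    ]


if __name__ == "__main__":
    print(scrape_quotes(SAMPLE_HTML))
```

The sketch also hints at why scrapers are of questionable reliability: it depends entirely on the page's markup, so if the site renames a class or restructures the table, the function quietly returns nothing.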

Copyright © 2004 IDG Communications, Inc.
