Ads by TechWords

See your link here
Subscribe to our e-mail newsletters
For more info on a specific newsletter, click the title. Details will be displayed in a new window.
Application/Web Development
Web Site Management
Computerworld Daily News (First Look and Wrap-Up)
Computerworld Blogs Newsletter
The Weekly Top 10
More E-Mail Newsletters 
 

QuickStudy: Deep Web

December 19, 2005 12:00 PM ET

Computerworld - Definition: The deep Web, also called the invisible Web, refers to the mass of information that can be accessed via the World Wide Web but can't be indexed by traditional search engines -- often because it's locked up in databases and served up as dynamic pages in response to specific queries or searches.

Most writers these days do a significant part of their research using the World Wide Web, with the help of powerful search engines such as Google and Yahoo. There is so much information available that one could be forgiven for thinking that "everything" is accessible this way, but nothing could ber further from the truth. For example, as of August 2005, Google claimed to have indexed 8.2 billion Web pages and 2.1 billion images. That sounds impressive, but it's just the tip of the iceberg. Behold the deep Web.

According to Mike Bergman, chief technology officer at BrightPlanet Corp. in Sioux Falls, S.D., more than 500 times as much information as traditional search engines "know about" is available in the deep Web. This massive store of information is locked up inside databases from which Web pages are generated in response to specific queries. Although these dynamic pages have a unique URL address with which they can be retrieved again, they are not persistent or stored as static pages, nor are there links to them from other pages.

The deep Web also includes sites that require registration or otherwise restrict access to their pages, prohibiting search engines from browsing them and creating cached copies.

Let's recap how conventional search engines create their databases. Programs called spiders or Web crawlers start by reading pages from a starting list of Web sites. These spiders first read each page on a site, index all their content and add the words they find to the search engine's growing database. When a spider finds a hyperlink to another page, it adds that new link to the list of pages to be indexed. In time, the program reaches all linked pages, presuming that the search engine doesn't run out of time or storage space. These linked pages, reachable from other Web pages or sites, constitute what most of us use and refer to as the Internet or the Web. In fact, we have only scratched the surface, which is why this realm of information is often called the surface Web.

Why don't our search engines find the deeper information? For starters, let's consider a typical data store that an individual or enterprise has collected, containing books, texts, articles, images, laboratory results and various other kinds of data in diverse formats. Typically we access such databased information by means of a query or search -- we type in the subject or keyword we're looking for, the database retrieves the appropriate content, and we are shown a page of results to our query.



Additional Resources

Xerox
By using solid ink technology only from Xerox, you could save up to 65% by printing color for the cost of black and white. Enter for a chance to WIN a PhaserTM 8860 network color printer!
Microsoft
Save time and mitigate security risk. Deploy it now.
Sybase
In this white paper, IDC analyzes the role of next-generation mobile enterprise platforms as organizations seek a more strategic deployment of mobile solutions.

Learn the important issues you must consider before starting your next mobility initiative. Get your mobility white paper from IDC now, compliments of Sybase.

What People Are Saying

White Papers & Webcasts

2009 Gartner Magic Quadrant Report
Truly understand your options for WAN Optimization Controllers...  

Strategic ECM Webinar
Learn what new strategic business benefits can be realized through ECM!...

Tech Horizons: ASG's metaCMDB, The Technology That Rocks
Improved business productivity often requires more efficient IT and more efficient IT cannot be achieved without a better understanding of the way business...  

Managing And Protecting Your Ever Increasing Mobile Assets
Learn best practices for desktop and application virtualization, computer security, and computer life-cycle management....

The Vector Approach to Data Center Power Planning
This white paper describes an approach that considers the major milestones and thresholds in data center power requirements-and how planners should adjust their...  

5 Architecture Issues that Impact BES performance
This Live webinar will identify critical log file errors, performance counters, and configurations to pay close attention to when optimizing BES server performance....

Yankee Group Mobile WAN Optimization Report
Mobile work continues to evolve. Learn how to keep up with the demands of your organization's mobile workforce....  

Usability Is Everything
Learn what sets Workday's HR and Payroll solutions apart from the competition....

Mitigating Litigation Risk with Email Management Tools
Does your company have an email retention policy that protects it when litigation occurs? IDC discusses effective email retention policies and the role...  

The Value of Real SaaS at Workday
Cost savings, speed to value, and innovation brought to the enterprise by Workday's software-as-a-service solutions for HR and Payroll....