QuickStudy: Deep Web

By Russell Kay
December 19, 2005 12:00 PM ET

Computerworld - Definition: The deep Web, also called the invisible Web, refers to the mass of information that can be accessed via the World Wide Web but can't be indexed by traditional search engines -- often because it's locked up in databases and served up as dynamic pages in response to specific queries or searches.

Most writers these days do a significant part of their research using the World Wide Web, with the help of powerful search engines such as Google and Yahoo. There is so much information available that one could be forgiven for thinking that "everything" is accessible this way, but nothing could be further from the truth. For example, as of August 2005, Google claimed to have indexed 8.2 billion Web pages and 2.1 billion images. That sounds impressive, but it's just the tip of the iceberg. Behold the deep Web.

According to Mike Bergman, chief technology officer at BrightPlanet Corp. in Sioux Falls, S.D., the deep Web holds more than 500 times as much information as traditional search engines "know about." This massive store of information is locked up inside databases from which Web pages are generated in response to specific queries. Although each of these dynamic pages has a unique URL with which it can be retrieved again, the pages are not persistent or stored as static pages, nor are there links to them from other pages.

The deep Web also includes sites that require registration or otherwise restrict access to their pages, prohibiting search engines from browsing them and creating cached copies.

Let's recap how conventional search engines create their databases. Programs called spiders or Web crawlers start by reading pages from a starting list of Web sites. A spider reads each page on a site, indexes all of its content and adds the words it finds to the search engine's growing database. When a spider finds a hyperlink to another page, it adds that new link to the list of pages to be indexed. In time, the program reaches all linked pages, presuming that the search engine doesn't run out of time or storage space. These linked pages, reachable from other Web pages or sites, constitute what most of us use and refer to as the Internet or the Web. In fact, we have only scratched the surface, which is why this realm of information is often called the surface Web.
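The crawling process described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular search engine works: the "Web" here is a hypothetical in-memory dictionary (TOY_WEB), the URLs are invented, and the crawl function and its index structure are names made up for this example.

```python
from collections import deque

# Hypothetical in-memory "Web": URL -> (page text, list of outgoing links).
# All URLs and page contents are invented for illustration.
TOY_WEB = {
    "a.example/": ("welcome to the home page", ["a.example/about", "b.example/"]),
    "a.example/about": ("about our company", ["a.example/"]),
    "b.example/": ("partner site news", []),
    # No page links to this URL, so a link-following spider never reaches it.
    "c.example/search?q=iceberg": ("dynamic results for iceberg", []),
}

def crawl(web, seeds):
    """Breadth-first crawl: index each reachable page and queue its links."""
    index = {}            # word -> set of URLs whose text contains that word
    queue = deque(seeds)  # pages waiting to be read
    seen = set(seeds)     # pages already queued, so none is read twice
    while queue:
        url = queue.popleft()
        if url not in web:
            continue  # dead link: nothing to read
        text, links = web[url]
        for word in text.split():
            index.setdefault(word, set()).add(url)
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return index

index = crawl(TOY_WEB, ["a.example/"])
```

Starting from the single seed a.example/, the spider reaches the "about" page and the partner site by following links, but the word "iceberg" never enters the index because nothing links to the query-generated page. That unreachable page stands in for the deep Web.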

Why don't our search engines find the deeper information? For starters, let's consider a typical data store that an individual or enterprise has collected, containing books, texts, articles, images, laboratory results and various other kinds of data in diverse formats. Typically we access information held in such a database by means of a query or search -- we type in the subject or keyword we're looking for, the database retrieves the appropriate content, and we are shown a page of results for our query.
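To make the query-and-response pattern concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name, the sample records and the functions build_store and results_page are all assumptions invented for this example; a real data store would be far larger and more varied.

```python
import sqlite3

def build_store():
    """Create a small in-memory database standing in for a private data store."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE documents (title TEXT, body TEXT)")
    conn.executemany(
        "INSERT INTO documents VALUES (?, ?)",
        [
            ("Lab results 2005", "spectrometry readings for sample 12"),
            ("Iceberg survey", "field notes on iceberg drift"),
            ("Annual report", "summary of laboratory spending"),
        ],
    )
    return conn

def results_page(conn, keyword):
    """Build a results page on demand for one keyword query."""
    rows = conn.execute(
        "SELECT title FROM documents WHERE body LIKE ? ORDER BY title",
        ("%" + keyword + "%",),  # parameterized to keep the query safe
    ).fetchall()
    lines = [f"Results for '{keyword}':"] + [f"- {title}" for (title,) in rows]
    return "\n".join(lines)

conn = build_store()
page = results_page(conn, "iceberg")
```

The page text exists only after someone supplies a keyword. Nothing is stored as a static page, so a spider that merely follows links has nothing to read.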


