Computerworld - Definition: The deep Web, also called the invisible Web, refers to the mass of information that can be accessed via the World Wide Web but can't be indexed by traditional search engines -- often because it's locked up in databases and served up as dynamic pages in response to specific queries or searches.
Most writers these days do a significant part of their research using the World Wide Web, with the help of powerful search engines such as Google and Yahoo. There is so much information available that one could be forgiven for thinking that "everything" is accessible this way, but nothing could ber further from the truth. For example, as of August 2005, Google claimed to have indexed 8.2 billion Web pages and 2.1 billion images. That sounds impressive, but it's just the tip of the iceberg. Behold the deep Web.
According to Mike Bergman, chief technology officer at BrightPlanet Corp. in Sioux Falls, S.D., more than 500 times as much information as traditional search engines "know about" is available in the deep Web. This massive store of information is locked up inside databases from which Web pages are generated in response to specific queries. Although these dynamic pages have a unique URL address with which they can be retrieved again, they are not persistent or stored as static pages, nor are there links to them from other pages.
The deep Web also includes sites that require registration or otherwise restrict access to their pages, prohibiting search engines from browsing them and creating cached copies.
Let's recap how conventional search engines create their databases. Programs called spiders or Web crawlers start by reading pages from a starting list of Web sites. These spiders first read each page on a site, index all their content and add the words they find to the search engine's growing database. When a spider finds a hyperlink to another page, it adds that new link to the list of pages to be indexed. In time, the program reaches all linked pages, presuming that the search engine doesn't run out of time or storage space. These linked pages, reachable from other Web pages or sites, constitute what most of us use and refer to as the Internet or the Web. In fact, we have only scratched the surface, which is why this realm of information is often called the surface Web.
Why don't our search engines find the deeper information? For starters, let's consider a typical data store that an individual or enterprise has collected, containing books, texts, articles, images, laboratory results and various other kinds of data in diverse formats. Typically we access such databased information by means of a query or search -- we type in the subject or keyword we're looking for, the database retrieves the appropriate content, and we are shown a page of results to our query.


- Excel 2010 Cheat Sheet
- Register for this Computerworld Insider Cheat Sheet and gain access to hundreds of premium content articles, guides, product reviews and more.
- Finding the right cloud solutions for your organization
- HP is driving the evolution of what we call the Instant-On Enterprise. It is an enterprise that embeds technology into everything it does...
- Converged Infrastructure for Dummies
- As you know, everything is mobile, connected, interactive, and immediate. This is exactly why organizations need a highly agile IT infrastructure in order...
- Seven Priorities for Integrated Network Management - How HP Intelligent Management Center Delivers an Enterprise-class Solution
- This white paper describes the major requirements for network management solutions to help the organizations become more profitable, efficient and reliable.
Intel and the... - Building Cloud-Optimized Data Center Networks white paper
- Enterprises are turning to the Cloud to improve business agility, reduce expenses and accelerate business innovation. Cloud computing redefines the way IT assets...
- Gartner on the Network Infrastructure Market
- The network infrastructure market has evolved rapidly, from one in which most organizations adhered to a single-vendor architecture to a more business-driven network... All Networking White Papers
- The Higher-Bandwidth, Lower-Cost Connection of Choice: 10GBASE-T LAN on Motherboard
- Learn how Expedient, a cloud provider, is using 10 Gigabit Ethernet to boost its services and rein in costs.
- Distributed Database Security with Real-time Monitoring
- View this demo and learn how IBM InfoSphere Guardium database activity monitoring can help protect your sensitive data in distributed DBMS environments with...
- InfoSphere Warehouse Packs Demo
- These flash modules make warehousing more tangible and relevant to business users through detailed explanations of the InfoSphere Warehouse Packs.
- Delivery Management -- Extending Lifecycle Management
- Date: Wednesday, June 20, 2012, 1:00 PM EDT
Siloed organizations continue doing the wrong things and doing things wrong, leading to increased costs,... - Leverage automation today to reduce IT complexity
- Date: Tuesday, June 5, 2012, 2:00 PM EDT
Whether your B2B complexity is caused by multiple technologies due to M&A, business or application specific...
All Networking Webcasts