New search engine takes 'DeepDyve' into the Dark Web

Engine offers eye into area of Web that traditional search engines can't penetrate

By Heather Havenstein
November 11, 2008 12:00 PM ET

Computerworld - DeepDyve Inc. today announced that it has launched a free search engine that can be used to access databases, scholarly journals, unstructured information and other data sources in the so-called "Deep Web" or "Dark Web," where traditional search technologies don't work.

The DeepDyve search engine makes it easier to find life sciences, patent and Wikipedia data in the Dark Web. The new engine indexes 500 million pages, said DeepDyve, which was known as Infovell before changing its name today.

The company said it will soon start indexing physical sciences content in the areas of information technology, clean technology and energy. That will help it meet its goal of expanding its index to more than 1 billion pages by the end of the year.

Because much of the content on the Deep Web is made up of technical publications, databases, scholarly publications and unstructured data, it has been difficult for traditional search engines to access it. To tackle this problem, DeepDyve is partnering with publishers and other providers of those sources of information to gain access to content overlooked by other engines, the company added.

Google Inc. announced earlier this month that it is ratcheting up its focus on the Dark Web by adding the ability to search PDF documents. In April, Google had announced that it was trying to find a way for its search engine to index the content behind HTML forms, including those with drop-down boxes and select menus, which is typically part of the Dark Web.

"According to IDC, more than 42 million consumers spend 25 hours per month online researching business and personal information, and they are frustrated with the results they get back and the tools they have to use," said William Park, CEO of DeepDyve, in a statement. "DeepDyve gives information-savvy consumers unparalleled access to quality information found only in the Deep Web, with features and functionality that make it easy to find, filter and organize their results."

The company said that its technology is designed to allow users to type in a few words or copy an entire article into a query box to find all related articles located in the Deep Web.

Chris Sherman, a blogger at Search Engine Land, said that DeepDyve's approach to scouring the Deep Web is innovative. He credited the company's chief scientists, who are veteran genomics researchers. To crack the genetic codes contained in DNA sequences, he noted, researchers must understand the hidden patterns in massive amounts of data.

"DeepDyve takes a similar approach to understanding information on the Web," Sherman added. "Going far beyond basic keyword-based search, DeepDyve indexes every word in a document, but also computes the factorial combination of words and phrases in the document and uses some industrial-strength statistical techniques to assess the 'informational impact' of these combinations. In essence, this approach looks at the meaning of an entire document and uses that to compute relevance, rather than factors like snippets of text or anchor text in links pointing to documents."
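DeepDyve has not published the details of the statistical techniques Sherman describes, so the following is only an illustrative sketch of the general idea of scoring word combinations rather than single keywords. It assumes pointwise mutual information (PMI) as a stand-in for "informational impact" and limits the combinations to word pairs; the function names `tokenize`, `word_pairs`, `pair_scores` and `rank` are hypothetical and are not part of any DeepDyve product.

```python
import math
from collections import Counter
from itertools import combinations


def tokenize(text):
    """Lowercase the text and strip surrounding punctuation from each word."""
    words = (w.strip(".,;:!?\"'()").lower() for w in text.split())
    return [w for w in words if w]


def word_pairs(text):
    """Return the set of unordered word pairs that occur in a text."""
    return set(combinations(sorted(set(tokenize(text))), 2))


def pair_scores(documents):
    """Score every word pair by PMI, computed from document frequencies.

    PMI is an assumption made for this sketch, not DeepDyve's actual measure.
    """
    n_docs = len(documents)
    word_df = Counter()  # number of documents containing each word
    pair_df = Counter()  # number of documents containing each word pair
    for doc in documents:
        word_df.update(set(tokenize(doc)))
        pair_df.update(word_pairs(doc))
    scores = {}
    for (a, b), df in pair_df.items():
        p_ab = df / n_docs
        p_a = word_df[a] / n_docs
        p_b = word_df[b] / n_docs
        scores[(a, b)] = math.log(p_ab / (p_a * p_b))
    return scores


def rank(query, documents):
    """Rank documents by the summed positive PMI of pairs shared with the query."""
    scores = pair_scores(documents)
    query_pairs = word_pairs(query)
    results = []
    for index, doc in enumerate(documents):
        shared = query_pairs & word_pairs(doc)
        total = sum(max(scores.get(pair, 0.0), 0.0) for pair in shared)
        results.append((index, total))
    # Highest score first; ties keep their original document order.
    return sorted(results, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    corpus = [
        "gene sequence analysis",
        "gene sequence patent",
        "web search engine",
        "web search patent",
    ]
    # The query can be a few words or the full text of an article.
    for index, score in rank("gene sequence", corpus):
        print(index, round(score, 3))
```

Because the query is reduced to the same set of word pairs as each document, the sketch accepts either a short phrase or a whole pasted article. A production system would need larger combinations, smoothing for rare words and a far more scalable index.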

In Sherman's view, DeepDyve isn't a threat to Google now, nor is it likely to be in the future, but its tool is good for people who want to do research in the areas that DeepDyve indexes.

"DeepDyve also offers a genuinely different 'second opinion' of the Web if you're wanting to look beyond the top results returned by Google and the other major search engines," Sherman noted. "With its limited initial offering, DeepDyve has just scratched the surface of what's available on the invisible Web, albeit in a very useful way. However, truly cracking the invisible Web problem still seems like a distant dream."
