Narrowing the search

Companies are looking for search products tailored to their needs

To most end users and even many IT managers, the term enterprise search implies little more than a keyword text box on the company Web site, as opposed to one on the public Internet. Five years ago, that was basically correct. Today, however, search is a lot more than keywords.

"There's a whole spectrum of technologies that fall under the big umbrella called 'search,' " says Hadley Reynolds, a research analyst at Delphi Group. "They range from simple keyword search to taxonomy classification and categorization to text analytics."

For instance, researchers at the Stanford Linear Accelerator Center (SLAC) in Menlo Park, Calif., recently needed a search tool to help them index and navigate an internal newsgroup with 600-plus posts per day. They needed a tool that was customizable and capable of handling the large volume of posted messages. After evaluating a number of commercial and open-source search products, SLAC chose the open-source Swish-e tool for both its speed and its low cost.

"It doesn't do all that Google does ... but it turned out to be the fastest index engine," says Douglas Smith, an experimental support professional at SLAC. Smith notes that internal search requires different capabilities than public Internet search, where users don't know anything about the content they're searching.

"For indexing libraries, catalogs, help texts, source code repositories, newsgroups, etc., where the source is known, you want to rank things by the content rather than by comparison of the source links. And that's what Swish-e offers," Smith explains.

SLAC's use of Swish-e is a basic application of search technology in the enterprise. Higher-end search tools, however, offer a more diverse range of features and functions.

Enterprise search applications all start with the ability to search unstructured content, such as PDF files, Word documents, Web pages and other information not contained in a relational database. They include a search engine and are able to rank results by relevance. And most also provide a way to customize both the results ranking and the indexing process, enabling organizations to place greater weight on characteristics of importance to them, such as the source or type of content.
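
As a rough illustration of that kind of customization, the Python sketch below ranks documents by how often the query terms appear and then scales each score by a weight for the content type. The weights, field names and function names are hypothetical and are not taken from any product mentioned in this article; real engines use far richer relevance models.

```python
from collections import Counter

# Hypothetical boost factors: an organization might favor some content types.
TYPE_BOOST = {"pdf": 1.5, "html": 1.0, "doc": 1.2}

def score(query, doc, boosts=TYPE_BOOST):
    """Count query-term occurrences in the text, then scale by content type."""
    words = Counter(doc["text"].lower().split())
    base = sum(words[term] for term in query.lower().split())
    return base * boosts.get(doc["type"], 1.0)

def rank(query, docs, boosts=TYPE_BOOST):
    """Return the documents that match the query, highest score first."""
    scored = [(score(query, d, boosts), d) for d in docs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for s, d in scored if s > 0]
```

Raising the weight for one content type moves those documents up the list without touching the index itself, which is the trade-off such tuning controls typically expose.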

SLAC also uses Verity Inc.'s Ultraseek search engine to index and provide access to SLAC's 500,000 pages of research, administrative information and other HTML and PDF content. The ability to customize the rules for indexing content sped up the process, says Web information manager Ruth McDunn. "It used to take a month to update the collection. Now we can update every day or, at most, every week," she says.

Ultraseek, like many enterprise search products today, allows users to customize the indexing criteria to make some factors more important than others. For instance, McDunn says she can configure it to skip "black holes" of content that might slow down the indexing process. "We have some systems that churn out tens of thousands of pages a day that don't really need to be indexed, but the spider can get stuck doing those same pages over and over." The Ultraseek indexing "spider" also recognizes patterns and is able to skip redundant or irrelevant content.
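
The general idea behind such rules can be sketched in a few lines of Python. This is not Ultraseek's configuration syntax or API; the URL patterns and function name are invented for illustration, and the sketch assumes the crawler hands over each page's URL and text before indexing.

```python
import hashlib
import re

# Hypothetical exclusion patterns for auto-generated "black hole" pages.
SKIP_PATTERNS = [re.compile(r"/logs/"), re.compile(r"[?&]page=\d+")]

def should_index(url, body, seen_hashes, skip_patterns=SKIP_PATTERNS):
    """Return True if the page is worth indexing; record its content hash."""
    if any(p.search(url) for p in skip_patterns):
        return False  # URL matches an excluded pattern
    digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
    if digest in seen_hashes:
        return False  # identical content was already indexed
    seen_hashes.add(digest)
    return True
```

A crawler would keep one `seen_hashes` set per indexing run and call `should_index` for each fetched page, so that excluded paths and exact duplicates never reach the index.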

Beyond the Text Box

Besides customization capabilities, search tools may provide industry-specific taxonomies -- content-classification schemes -- as well as the ability to scan content and generate a taxonomy. They often provide categorized results, such as "press releases" or "product documentation." Some offer summarization capabilities for scanning content and automatically generating a summary.

Other tools offer profiling, the ability to examine searchers' behavior and point them to potentially related content, or federation, which consolidates results from other search engines into one list -- helpful for organizations with search engines embedded in existing applications. Still others provide behavior analytics to help an organization track a range of metrics on searchers' behaviors and the accuracy of their results.
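
Federation, in particular, raises a practical question: scores from different engines are not directly comparable. One simple approach, shown here as a Python sketch under the assumption that each engine returns (URL, score) pairs, is to rescale each engine's scores to a common range before merging. The function names are hypothetical, and commercial products may merge results quite differently.

```python
def normalize(results):
    """Rescale one engine's scores to the 0..1 range (min-max)."""
    scores = [score for _, score in results]
    low, high = min(scores), max(scores)
    if high == low:
        return [(url, 1.0) for url, _ in results]
    return [(url, (score - low) / (high - low)) for url, score in results]

def federate(*result_lists):
    """Merge result lists from several engines into one ranked list of URLs."""
    best = {}
    for results in result_lists:
        if not results:
            continue  # an engine returned nothing for this query
        for url, score in normalize(results):
            # Keep the best normalized score when engines return the same URL.
            best[url] = max(score, best.get(url, 0.0))
    # Highest score first; ties are broken alphabetically by URL.
    return sorted(best, key=lambda url: (-best[url], url))
```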

Enterprise search is one of those areas where each implementation is truly unique, says Susan Feldman, an analyst at IDC. "There are lots of ways you can search within an organization," she says.

Given the enormous range in the types of content organizations have and in the ways in which they need to slice and parse all of that content, it's not surprising that the enterprise search market itself has become so diversified. Some of the more recent capabilities that vendors have added to their search software enable organizations to use search for a variety of sophisticated applications.

Behavioral analytics. Increasingly, vendors are adding analysis tools to help organizations discover what, how and why people are searching, and then to refine the search algorithms to produce more accurate results. At People's Bank in Bridgeport, Conn., Web site managers used a search analysis tool to figure out why only 40% of customer searches produced accurate results.

Ross Jenkins, the bank's senior information architect, used Mondosoft AS's BehaviorTracking tool to analyze why people weren't finding what they wanted. "It tells you the types of keywords customers use, possible synonyms, whether they're searching and stopping or continuing to search, and a range of other indicators," he says.

Jenkins also uses the metrics provided by BehaviorTracking to find out which searches are most likely to lead to a customer "conversion," such as opening an account or applying for a mortgage. For those high-value searches, he creates special "landing pages" that are prominently displayed in the list of results.
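
The kind of arithmetic behind such metrics is straightforward. The Python sketch below is not BehaviorTracking's implementation; it assumes a hypothetical search log in which each entry records the query, whether the searcher clicked a result and whether the visit ended in a conversion, and it computes per-query rates from that log.

```python
from collections import defaultdict

def summarize(log):
    """Aggregate a search log into per-query counts and rates.

    Each entry is a (query, clicked, converted) tuple; this layout is an
    assumption made for illustration.
    """
    stats = defaultdict(lambda: {"searches": 0, "clicks": 0, "conversions": 0})
    for query, clicked, converted in log:
        entry = stats[query.strip().lower()]  # treat case variants as one query
        entry["searches"] += 1
        entry["clicks"] += int(clicked)
        entry["conversions"] += int(converted)
    for entry in stats.values():
        entry["click_rate"] = entry["clicks"] / entry["searches"]
        entry["conversion_rate"] = entry["conversions"] / entry["searches"]
    return dict(stats)

def top_converting(log, n=3):
    """Return the n queries with the highest conversion rate."""
    stats = summarize(log)
    return sorted(stats, key=lambda q: (-stats[q]["conversion_rate"], q))[:n]
```

Queries that rank high on conversion rate are the natural candidates for dedicated landing pages, while queries with many searches and a low click rate point to content that searchers cannot find.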

"We've optimized the search-engine technology to such an extent that when people search, they find -- and when they find, they convert," says Jenkins.

Aggregation of multimedia content. Search isn't just for data and text, of course. As Tim Hardy, chief technology officer at World Book Inc., publisher of The World Book Encyclopedia, can tell you, there's a multitude of formats out there that a searcher may need to access. That was the situation confronting the IT staff when they sought a search tool for World Book's Web encyclopedia. The encyclopedia provides access not only to World Book's 25,400 articles and 248,000 definitions, but also to 9,300 audio clips, 1,480 maps, 128 photographs and 115 video clips.

Today, Chicago-based World Book uses Endeca Technologies Inc.'s ProFind and XML metadata to create a unified index of materials. "We have an XML database for each of the different content types that provides the indexable data. Now we're able to place all the differing content types into one integrated index," says Hardy.

Structured and unstructured data. Many organizations need a search environment that can access both unstructured data and the information in relational databases. Such is the case at ThomasNet.com, a business-to-business Web site launched by New York-based Thomas Publishing Co. in August for industrial manufacturing companies.

ThomasNet enables purchasing managers and others to search for vendors and products that meet their needs. ThomasNet uses Fast Search & Transfer ASA's Enterprise Search Platform and AdVisor products to combine unstructured content from vendor Web sites with structured data from its own database of 650,000 company listings. AdVisor returns product information from Web sites and matches it against ThomasNet's taxonomy, then combines that content with information from the database.

"We've indexed a collection of industrial Web sites with metadata and overlaid it with our taxonomy of 64,000 categories," says Monica Lavin, ThomasNet's executive director for Web initiatives.

Text analytics. Also called text mining, this approach to enterprise search is rapidly growing in importance. Aimed at ferreting out concepts in unstructured content, text analytics tools parse content into nouns, verbs and adjectives and analyze them to determine the topic and its context, thus enabling more precise searches. Sold by vendors such as ClearForest Corp. in Waltham, Mass., and Attensity Corp. in Palo Alto, Calif., text analytics products are gaining popularity.

For instance, Electronic Data Systems Corp. in Plano, Texas, uses text analytics to find relevant information in its supply chain and procurement systems, for purposes such as comparing the refund and discount clauses in various vendors' contracts.

"The rate of growth of unstructured data is exponential, while our ability to manually read documents is static," says Kas Kasravi, a fellow at EDS, which has designed text analytics solutions for customers using the ClearForest product. "And that means that we don't have the information to make the best decisions ... and that's where text mining comes in."

Hildreth is a freelance writer in Waltham, Mass. She can be reached at Sue.Hildreth@comcast.net.

Copyright © 2005 IDG Communications, Inc.
