Drowning in Unstructured Data

Robert L. Mitchell
 

March 21, 2005 (Computerworld) The year was 1989. A rather disorganized co-worker of mine had begun running a personal information manager and Lotus Magellan, a newfangled "disk navigation system" that combined fast search with a file viewer window. In his case, the programs didn't always help. His excitement at showing off how quickly he could find some arcane bit of information often faded into a plaintive, "Wait, uh, uh, it's in here somewhere ..."
His plea became an inside joke around the office, a mantra to be recited around the coffee machine. The best approach, I thought, was to organize or add structure to documents as they came into the system. If you didn't spend time upfront to organize your data, what could you expect but chaos? Garbage in equals garbage out.
I'm not laughing anymore. Sixteen years later, the trickle of data on that original multimegabyte desktop hard drive has become a multigigabyte torrent, with much of that content linked to other documents on the company's LAN, Web site and e-mail server, and the World Wide Web. Today, there is simply too much information to parse; the orderly processes I used to conscientiously tag, arrange and otherwise transform incoming data simply take too long. I am drowning in a sea of unstructured information.
Ironically, Magellan turns out to have been the harbinger of today's desktop search tools, which have come to my rescue. Programs such as Copernic and X1 Desktop Search (the latter, a descendant of Magellan, is the one I prefer) combine a full-text index of documents, e-mail messages and other content with a file preview pane, enabling the user to almost instantly locate and display desired information. Support for document type filters and Boolean notation allows fine-grained searches. Further, users can usually act on the file within the context of the application that created it. For example, within X1, an e-mail message in the search results window can be forwarded by clicking a button.
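For readers curious about what sits underneath, the core of such a tool is an inverted index: a map from each word to the documents that contain it. Boolean searches then become set operations on those lists, and a type filter is one more set restriction. The Python below is a minimal sketch under that assumption only; the `TinyIndex` class, its method names and the sample documents are hypothetical and are not how Copernic or X1 are actually built.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase a string and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TinyIndex:
    """Hypothetical toy full-text index (not any vendor's design)."""

    def __init__(self):
        self.postings = defaultdict(set)  # word -> set of doc IDs containing it
        self.doc_types = {}               # doc ID -> type label, e.g. "email"

    def add(self, doc_id, doc_type, text):
        """Record the document's type and add it to the posting list of every word."""
        self.doc_types[doc_id] = doc_type
        for word in tokenize(text):
            self.postings[word].add(doc_id)

    def search(self, all_of=(), none_of=(), doc_type=None):
        """Boolean AND over all_of, NOT over none_of, optional type filter."""
        candidates = set(self.doc_types)  # start from every indexed document
        for word in all_of:
            candidates &= self.postings.get(word.lower(), set())
        for word in none_of:
            candidates -= self.postings.get(word.lower(), set())
        if doc_type is not None:
            candidates = {d for d in candidates if self.doc_types[d] == doc_type}
        return sorted(candidates)


idx = TinyIndex()
idx.add("msg1", "email", "Budget review meeting moved to Friday")
idx.add("memo.doc", "document", "Draft budget for the Q2 review")
idx.add("msg2", "email", "Lunch on Friday?")

print(idx.search(all_of=["budget", "review"]))          # ['memo.doc', 'msg1']
print(idx.search(all_of=["budget"], doc_type="email"))  # ['msg1']
print(idx.search(all_of=["friday"], none_of=["budget"]))  # ['msg2']
```

Because the index is built ahead of time, a query touches only the posting lists for its words rather than rescanning every file, which is why results appear almost instantly.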
Desktop search tools are creeping onto corporate desktops, both because many are free and because the productivity benefits are potentially large for users with significant amounts of locally stored content. For IT organizations that want to support desktop search, however, the issues are a bit more complicated than simply adding a preferred desktop search engine to the standard system image.
For example, users can point desktop search tools at shared volumes on the network, including public folders, creating unexpected disk I/O and network traffic loads. Also, most products aren't smart enough to deal with shared storage when laptops are disconnected. Content on network shares may be dropped from the index while users are on the road, only to be reindexed when they return.
Security policies also need to be set to determine who can index and view which files. And as a security vulnerability in Google Desktop Search made clear last year, the products are still evolving.
Ultimately, however, users don't need a desktop search tool. What they need -- and what IT should deliver -- is an integrated system that allows searches of local, enterprise and Web-based content from within a single, seamless user interface. Right now, that's still a tall order.
First-generation enterprise search tools from desktop search vendors include a second, network-based search engine that sits on the corporate LAN and indexes shared folders and intranet content. A user's ability to view or search selected content is governed by policies and permissions the administrator has set using LDAP or Active Directory.
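The permission check described above can be pictured as a filter applied to search hits before the user sees them. The Python below is a minimal sketch that assumes each indexed document carries a list of groups allowed to see it; the `USER_GROUPS` dictionary is a hypothetical stand-in for a real LDAP or Active Directory lookup, and all names and paths are invented for illustration.

```python
# Hypothetical stand-in for a directory query; a real system would ask LDAP or Active Directory.
USER_GROUPS = {
    "alice": {"finance", "all-staff"},
    "bob": {"engineering", "all-staff"},
}

# Hypothetical access lists captured at index time: document -> groups allowed to view it.
DOC_ACL = {
    "finance/q2-budget.xls": {"finance"},
    "eng/design-notes.doc": {"engineering"},
    "intranet/holiday-schedule.html": {"all-staff"},
}


def visible_results(user, hits):
    """Keep only hits whose allowed groups overlap the user's groups."""
    groups = USER_GROUPS.get(user, set())  # unknown users belong to no groups
    # Documents missing from DOC_ACL get an empty set, so they are hidden (fail closed).
    return [doc for doc in hits if DOC_ACL.get(doc, set()) & groups]


hits = list(DOC_ACL)  # pretend the search engine matched every indexed document
print(visible_results("alice", hits))
print(visible_results("bob", hits))
print(visible_results("mallory", hits))  # []
```

The design choice worth noting is the default: a document with no recorded permissions, or a user the directory doesn't know, yields no results rather than everything.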
Coveo Solutions Inc. offers an enterprise search complement to its Copernic desktop search tool. However, users still must use a different interface for each resource. X1 Technologies Inc. is readying a similar tool for release this spring that it says will include a unified user interface. X1, which has partnered with Yahoo to give away a consumer version of X1 Desktop Search, could be among the first to deliver access to the search trinity of desktop, enterprise and Web content from within a single graphical user interface.
Desktop search vendors are also moving quickly beyond e-mail to support content management software. Coveo is rolling out a version of its product for Microsoft's SharePoint; X1 has similar plans. Meanwhile, established enterprise search vendors such as Autonomy Corp. have launched their own products for the desktop market. If you use enterprise search already, your vendor is probably the first place to look for desktop search.
But do get started. Although the products aren't perfect, the productivity benefits of desktop search are too compelling for users to ignore. If you don't start establishing a corporate IT standard for desktop search soon, you may find that your users have done it for you.
Robert L. Mitchell is Computerworld's senior features editor. Contact him at robert_mitchell@computerworld.com.