Twitter, Facebook, the Library of Congress -- all of these institutions have mind-numbing amounts of structured and unstructured data that must be indexed and searched quickly. In Twitter's case, that's about 300 million new pieces of information to index every day.
So it's not surprising that such institutions would venture into the seemingly untamed world of open-source search applications, not just for the cost savings, but also for the ability to customize and modify applications quickly. Plus, open source has an active community that can help solve related problems.
But what about other enterprise users? Some 80% of the information in the typical enterprise is now unstructured, including texts, emails, blogs and videos, and that percentage is rising, according to Gartner. All of this data potentially holds value, and today every website is expected to query and produce relevant results as fast as the best Internet search engines. "People need search technology [in] virtually everything they do today. Everybody thinks search [capability] is going to be embedded in everything," says Whit Andrews, an analyst at Gartner.
Right now, most organizations have very constrained search capabilities, which are usually based on SQL queries or specific forms or reports. "That paradigm is soon going to break because the amount of data is just too big, and it's happening much too quickly in a 24/7 environment," Andrews adds.
Enterprises of all types are starting to explore open-source search applications to get a glimpse into their collections of structured and unstructured data. One such product is Lucene Solr, an open-source search platform from the Apache Software Foundation for which Lucid Imagination, a San Mateo, Calif.-based software company, provides commercial support.
Interest in open-source search applications began to take off three years ago. "That's when we saw creation of Lucid Imagination, which formed as a commercial support resource" for open-source software, says Greg Olson, senior director at Olliance Group, an open-source consulting firm and a unit of Black Duck Software. "That's a good indicator of mainstream demand for services or a solution around a raw technology like Lucene."
Make no mistake -- Lucene is for heavy hitters of search, Andrews says. "Lucene matters for people who need a very sophisticated search offering or product. Its typical [user] is a vendor that needs enormous scale in its application of technologies. It's a great place to use Lucene -- you need to be able to search a bazillion things. Where you don't see Lucene used is when an intranet needs a search by next Thursday."
A few other players offer lighter-weight search tools based on the same Lucene open-source technology. For instance, online retailer Zappos.com uses Lucene Solr to handle 63 million customer inquiries each month, but internally the company deploys Elasticsearch, an open-source search engine that is also built on Lucene, for "non-website-critical systems or non-performance-bound types of services," says Aye Thu, search team lead.
Many other search application vendors have recently been acquired by software giants, raising questions about their future direction. Microsoft acquired search software vendor Fast Search & Transfer in 2008 and made its technology available through SharePoint. In August 2011, HP acquired U.K.-based Autonomy, and two months later Oracle announced plans to acquire Endeca, which provides unstructured data management, Web commerce and business intelligence solutions.
While none of these software giants has yet indicated that it will stop supporting its newly acquired search engine, "any time your tech provider is bought, it makes you nervous -- [especially] if you're another technology provider," Andrews says.
For now, that leaves Lucene Solr as the leading independent enterprise search platform. Lucid reports that 200,000 to 300,000 copies of Lucene Solr are downloaded every month.
EMC is using Lucene Solr to build a text analytics add-on for its relational database offering. "If you look at the enterprise search industry, most of the old-school players have either been acquired or gone by the wayside," says George Chitouras, senior director of research and development at EMC. "From my perspective, the technology with the most momentum behind it and the one maturing most quickly is the Lucene Solr technology."
While EMC hasn't moved open-source search capabilities inside its own enterprise yet, Chitouras says he sees myriad uses for the technology in almost any industry. "Any large company has use for information retrieval, whether it's doing call-center processing, customer relationship management, even innovation management," he says.
In mid-2011, Lucid Imagination released LucidWorks for the enterprise, a subscription-based, enterprise-ready package with support from experts in open-source search. Today, 100 enterprise customers use the product. Lucid also released a cloud-based, search-as-a-service version in February.
Lucid CEO Paul Doscher sees three types of needs driving organizations to use enterprise open-source search. First, "people want to use effective search to power their websites, but they don't want to be bothered with the infrastructure, management and maintenance of it," he says. LucidWorks connects to their websites, crawls the data and creates the response in the search box, "in a much higher capability than what they have right now," Doscher adds.
Second, large enterprises are turning to open-source search when they want a sandbox for developing prototype applications but lack the developer expertise, infrastructure or hardware to build one themselves.
Third, companies may embrace the open-source option if they're trying to extend the value of the data that they currently have. The search-as-a-service application is likely to appeal to these users, says Doscher. Similar to what Salesforce.com provides, Lucid's cloud application allows users to crawl information in their SaaS applications and then search it more effectively or integrate it with other information inside the enterprise or out on the Web. "You can use it as an application development platform to develop richer and more effective information applications," Doscher says.
Lucid's chief scientist, Grant Ingersoll, also sees some hybrid uses for open-source search. "You provision your own application internally in your data centers, but then you spill over excess capacity to the cloud-supported [version]," he says.
To stay ahead of competitors, Lucid Imagination plans to move into the business intelligence and data warehousing spaces and enable integration with big-data technologies, Doscher says. "If you put traditional data warehouse or business intelligence-type applications on top of Hadoop, in some instances, it's almost like trying to take this manhole cover of opportunity and shove it through a garden hose," he says. Applying open-source search technologies to these areas will alleviate the pressure built up from too much data and inadequate indexing and search capabilities.
The volume of information stored by enterprises going forward "is going to be scary," Doscher says. Open-source search technology will address this deluge.
"I believe that what Google has done for the Internet, technologies like ours will do for the enterprise by helping to consumerize information inside the enterprise," Doscher says. "Eventually, you will be able to have natural-language queries inside the enterprise that touch all the different databases, applications and ERPs that the enterprise runs. This will allow people to get instantaneous, real-time information that's consolidated and contextually relevant around the subject that they're interested in."
Collett is a Computerworld contributing writer.