Massive data volumes making Hadoop hot

Complex data analytics requirements are driving interest in open source Hadoop technology, say users and analysts


According to Befus, Hadoop's architecture makes it ideal for running batch processing applications involving 'big data.'

Hadoop can be used for more real-time business intelligence applications as well.

Increasingly, companies like OpenLogic have begun using another open source technology called HBase on top of Hadoop to enable fast querying of the data in HDFS. HBase is a column-oriented Hadoop data store that enables real-time access and querying of the data in Hadoop.

OpenLogic offers enterprises a service for verifying that open source code is properly attributed and is in full compliance with open source licenses.

To deliver the service, OpenLogic maintains a comprehensive database of hundreds of thousands of open source packages. The company stores metadata, version numbers and revision histories on a Hadoop cluster. The data is accessed via HBase.
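
As a rough illustration of what "column-oriented" means here, the sketch below models a table keyed by row, where each column holds timestamped versions of a value. It is a toy in-memory model written for this article, not the HBase API and not OpenLogic's actual schema; the class name, the package name and the column names such as "meta:license" and "rev:version" are all hypothetical.

```python
from collections import defaultdict


class ColumnStore:
    """Toy in-memory model of a column-oriented table (not the HBase API)."""

    def __init__(self):
        # row key -> "family:qualifier" -> list of (timestamp, value), newest first
        self._rows = defaultdict(lambda: defaultdict(list))

    def put(self, row_key, column, value, timestamp):
        """Store a new version of a cell, keeping older versions."""
        cells = self._rows[row_key][column]
        cells.append((timestamp, value))
        cells.sort(key=lambda cell: cell[0], reverse=True)

    def get(self, row_key, column):
        """Return the newest value for a cell, or None if it is absent."""
        cells = self._rows.get(row_key, {}).get(column)
        return cells[0][1] if cells else None

    def history(self, row_key, column):
        """Return every stored value for a cell, newest first."""
        cells = self._rows.get(row_key, {}).get(column, [])
        return [value for _, value in cells]


# Hypothetical package record: one row per package, one column per attribute.
store = ColumnStore()
store.put("example-package", "meta:license", "Apache-2.0", timestamp=1)
store.put("example-package", "rev:version", "1.0", timestamp=1)
store.put("example-package", "rev:version", "1.1", timestamp=2)

print(store.get("example-package", "rev:version"))
print(store.history("example-package", "rev:version"))
```

Looking up a single row key and column is what makes this kind of layout suited to fast, targeted reads, while keeping older versions of a cell is one plausible way to hold a revision history.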

Rod Cope, CTO of OpenLogic, said the company gets the best of both worlds with Hadoop. "A lot of the data we have won't fit into a RDBMS like MySQL and Oracle. So the best option out there is Hadoop," he said.

By running HBase on top of Hadoop, OpenLogic has also been able to provide real-time data access in much the same way as conventional database technologies do, he said.

There are some caveats associated with the use of Hadoop, users note.

"The biggest challenge is that this is still young technology with a lot of moving parts," Cope said. "You have to configure and install and integrate a number of components and get them working just so, and that's a non-trivial process."

The relative lack of Hadoop expertise among IT professionals has been another big problem, Befus said.

"It's hard to find anybody with any experience with Hadoop," he said. The fact that Hadoop is not yet a mature technology also means that companies need top-notch operations staff to handle potential glitches.

Both OpenLogic and Tynt are using Cloudera's Hadoop support tools.

Cloudera offers technical support, implementation help, bug fixes, patches and other handholding services for Hadoop. It also offers a Cloudera distribution of the open source technology that integrates core Apache Hadoop and nine related open source tools into one package.

Jaikumar Vijayan covers data security and privacy issues, financial services security and e-voting for Computerworld. Follow Jaikumar on Twitter at @jaivijayan, or subscribe to Jaikumar's RSS feed. His e-mail address is jvijayan@computerworld.com.
