Companies' growing need to manage surging volumes of structured and unstructured data continues to propel enterprise use of the open-source Apache Hadoop software.
But instead of replacing existing technologies, Hadoop appears to be working alongside conventional relational database management systems (RDBMS), according to a Ventana Research report released late last month.
Hadoop is designed to help companies manage and process petabytes of data. The technology's appeal lies in its ability to break up very large data sets into smaller data blocks that are then distributed across a cluster of commodity hardware for faster processing.
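That split-and-distribute model can be sketched in miniature. The following is an illustrative Python sketch, not actual Hadoop code: it breaks an input into fixed-size blocks, processes each block independently in a worker pool (standing in for cluster nodes), then merges the partial results, mirroring the map-and-reduce pattern Hadoop applies across commodity hardware. The block size and word-count workload are assumptions chosen for illustration.

```python
# Minimal sketch of Hadoop's split-and-distribute idea (not real Hadoop):
# divide input into blocks, process blocks in parallel, merge the results.
from collections import Counter
from multiprocessing import Pool

BLOCK_SIZE = 4  # lines per block; real HDFS blocks are tens of megabytes


def count_words(block):
    """Process one block independently -- analogous to the 'map' step."""
    counts = Counter()
    for line in block:
        counts.update(line.split())
    return counts


def parallel_word_count(lines):
    # Split the data set into smaller blocks.
    blocks = [lines[i:i + BLOCK_SIZE] for i in range(0, len(lines), BLOCK_SIZE)]
    # Distribute the blocks to workers, as a cluster would to its nodes.
    with Pool() as pool:
        partials = pool.map(count_words, blocks)
    # Merge the partial results -- analogous to the 'reduce' step.
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


if __name__ == "__main__":
    log_lines = ["error timeout", "ok", "error retry", "ok ok"] * 3
    print(parallel_word_count(log_lines))
```

Because each block is processed with no knowledge of the others, adding workers (or, in Hadoop's case, cluster nodes) speeds up the job roughly in proportion to the data's division into blocks.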
Early adopters, including Facebook, Amazon, eBay and Yahoo, use Hadoop to analyze petabytes of unstructured data that conventional RDBMS setups couldn't handle easily. Ventana's report, based on a survey of more than 160 companies, shows that a growing number of businesses have begun putting Hadoop to use for similar purposes.
The survey found that most of those companies are using Hadoop to collect and analyze huge volumes of unstructured and machine-generated information, such as log and event data, search-engine results and content from social media sites, said David Menninger, author of the Ventana report.
"In two-thirds of the cases, we found that people are using Hadoop for advanced analytics and for types of analysis that they were not doing before," he said.
The technology is much less likely to be used for analyzing conventional structured data such as transaction data, customer information and call records, where traditional RDBMS tools still appear to have an edge, Menninger said.
Despite Hadoop's early promise, the study said, enterprises that use it still face challenges such as security, clustering and a shortage of people with Hadoop skills.
This version of this story was originally published in Computerworld's print edition. It was adapted from an article that appeared earlier on Computerworld.com.