Enterprises should ask a lot of questions when a vendor touts its business intelligence (BI) products as being fully integrated with Hadoop.
The hype around Hadoop has pushed many BI vendors to proclaim their support for the big data technology without explaining exactly what that means, Forrester analyst Boris Evelson warned in a blog post today. As a result, IT managers should press their BI vendors for clarification before accepting claims about Hadoop integration, Evelson said.
"Hadoop is not a single entity, it's a conglomeration of multiple projects, each addressing a certain niche within the Hadoop ecosystem such as data access, data integration, DBMS, system management, reporting, analytics, data exploration and much, much more," he wrote.
Companies need to be aware of such distinctions and know what questions to ask when evaluating Hadoop/BI integration claims, Evelson said.
"If a company is using Hadoop and Big Data for the right reasons and they want to use a BI tool to do the analysis, the level of integration is important," Evelson added in comments to Computerworld via email.
Over the past two years, a growing number of companies have begun using open source and commercial versions of the Hadoop Distributed File System to store and organize huge amounts of unstructured data. In addition to transactional data from CRM, ERP and general ledger systems, companies have begun gathering large volumes of new data from the web, from social media and micro-blogging sites such as Twitter, and from machine sensors.
A lot of this new unstructured data has ended up in Hadoop systems, where it can be more easily organized and prepped for analysis.
The enterprise interest in Hadoop has spawned an entire ecosystem of vendors offering tools for accessing, extracting, searching, analyzing, visualizing and reporting on data in Hadoop big data systems. Many vendors also offer products for integrating Hadoop environments with incumbent relational database management systems.
When considering a BI tool for Hadoop environments, companies first need to know whether the tool works with both the community version of Hadoop and the commercial versions sold by vendors such as Cloudera and Hortonworks, Evelson said.
They also need to find out which specific components of Hadoop the BI tool integrates with. Hadoop's myriad components include technologies such as Hive, HBase, Pig and Sqoop, Evelson noted.
Also key is whether the BI product uses SQL or a SQL-like query language to interact with the Hadoop data, whether it can access NoSQL database management systems such as HBase and Cassandra, and whether it can explore HDFS data without a data model.
"You really need to peel a few layers of the onion before you can confirm that your BI vendor REALLY integrates with Hadoop," Evelson wrote.
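For IT managers who want to track vendor responses, the questions Evelson raises could be captured in a simple checklist. The sketch below is purely illustrative: the keys, the wording of each question and the `unanswered` helper are assumptions made for this example, not a Forrester tool or anything Evelson published.

```python
# Hypothetical checklist: the questions paraphrase the points raised in the
# article; the keys and structure are illustrative assumptions only.
QUESTIONS = {
    "distributions": "Does it work with community Hadoop and with commercial "
                     "versions such as Cloudera and Hortonworks?",
    "components": "Which Hadoop components (e.g. Hive, HBase, Pig, Sqoop) "
                  "does it integrate with?",
    "query_language": "Does it use SQL or a SQL-like query language to "
                      "interact with Hadoop data?",
    "nosql_access": "Can it access NoSQL systems such as HBase and Cassandra?",
    "model_free_exploration": "Can it explore HDFS data without a data model?",
}


def unanswered(vendor_answers: dict[str, str]) -> list[str]:
    """Return the checklist questions the vendor has not yet answered.

    An answer counts as missing if the key is absent or the text is blank.
    """
    return [
        question
        for key, question in QUESTIONS.items()
        if not vendor_answers.get(key, "").strip()
    ]


if __name__ == "__main__":
    # Example: a vendor that has only addressed two of the five questions.
    answers = {"distributions": "Cloudera only", "components": "Hive"}
    for question in unanswered(answers):
        print("Still to ask:", question)
```

Anything the function returns is a layer of the onion still to be peeled before the integration claim can be taken at face value.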
Jaikumar Vijayan covers data security and privacy issues, financial services security and e-voting for Computerworld. Follow Jaikumar on Twitter at @jaivijayan or subscribe to Jaikumar's RSS feed. His e-mail address is firstname.lastname@example.org.