Moving beyond Hadoop for big data needs
Hadoop isn't enough anymore for enterprises that need new and faster ways to extract business value from massive datasets
Computerworld - Hadoop and MapReduce have long been mainstays of the big data movement, but some companies now need new and faster ways to extract business value from massive -- and constantly growing -- datasets.
While many large organizations are still turning to the open source Hadoop big data framework, Google, whose technology inspired it, and others have already moved on to newer technologies.
The Apache Hadoop platform is an open source implementation of the Google File System and Google MapReduce technologies, which the search engine giant developed to manage and process huge volumes of data on commodity hardware.
Those technologies have been a core part of the processing used by Google to crawl and index the Web.
Hundreds of enterprises have adopted Hadoop over the past three or so years to manage fast-growing volumes of structured, semi-structured and unstructured data.
The open source technology has proved to be a cheaper option than traditional enterprise data warehousing technologies for uses such as log and event data analysis, security event management, social media analytics and other applications involving petabyte-scale data sets.
Analysts note that some enterprises have started looking beyond Hadoop not because of flaws in the technology itself, but because of the purposes for which it was designed.
Hadoop is built for handling batch-processing jobs where data is collected and processed in batches. Data in a Hadoop environment is broken up and stored in a cluster of highly distributed commodity servers or nodes.
In order to get a report from the data, users have to first write a job, submit it and wait for it to get distributed to all of the nodes and get processed.
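The batch flow described above can be sketched in a few lines of Python. This is a minimal, single-machine illustration of the map, shuffle and reduce steps, not Hadoop's actual API; the function names, the three-node split and the sample log lines are all assumptions made for the example.

```python
from collections import defaultdict


def split_into_blocks(records, num_nodes):
    """Spread records round-robin across hypothetical nodes."""
    blocks = [[] for _ in range(num_nodes)]
    for i, record in enumerate(records):
        blocks[i % num_nodes].append(record)
    return blocks


def map_phase(block):
    """Each node emits (status_code, 1) for every log line it holds."""
    return [(line.split()[-1], 1) for line in block]


def shuffle(mapped_outputs):
    """Group the emitted values by key across all nodes."""
    grouped = defaultdict(list)
    for output in mapped_outputs:
        for key, value in output:
            grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the counts for each key to produce the final report."""
    return {key: sum(values) for key, values in grouped.items()}


def run_job(records, num_nodes=3):
    """Run the whole batch: nothing is returned until every step finishes."""
    blocks = split_into_blocks(records, num_nodes)
    mapped = [map_phase(block) for block in blocks]
    return reduce_phase(shuffle(mapped))


# Made-up sample data for illustration only.
log_lines = [
    "GET /index.html 200",
    "GET /missing 404",
    "POST /login 200",
    "GET /about 200",
    "GET /old 404",
]
report = run_job(log_lines)
print(report)
```

Even in this toy version, the report is available only after the entire job has run, which is the batch-oriented behavior the article describes.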
While the Hadoop platform performs well, it's not fast enough for some key applications, said Curt Monash, a database and analytics expert and principal at Monash Research. For instance, Hadoop does not fare well in running interactive, ad hoc queries against large datasets, he said.
"What Hadoop has trouble with is interactive responses," Monash said. "If you can stand latencies of a few seconds, Hadoop is fine. But Hadoop MapReduce is never going to be useful for sub-second latencies."
Companies needing such capabilities are already looking beyond Hadoop for their big data analytics needs.
Google, in fact, started using an internally developed technology called Dremel some five years ago to interactively analyze or "query" massive amounts of log data generated by its thousands of servers around the world.
Google says the Dremel technology supports "interactive analysis of very large datasets over shared clusters of commodity machines."
The technology can run queries over trillion-row data tables in seconds and scales to thousands of CPUs and petabytes of data, Google says. It also supports a SQL-like query language that makes it easy for users to interact with data and to formulate ad hoc queries.
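To give a sense of what an ad hoc, SQL-style query over log data looks like, here is a small sketch using Python's built-in sqlite3 module. It illustrates only the query style; it is not Dremel or its query language, and the table name, columns and rows are invented for the example.

```python
import sqlite3

# In-memory database standing in for a (vastly larger) log store.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE server_logs (datacenter TEXT, status INTEGER, latency_ms REAL)"
)
conn.executemany(
    "INSERT INTO server_logs VALUES (?, ?, ?)",
    [
        ("us-east", 200, 12.0),
        ("us-east", 500, 250.0),
        ("eu-west", 200, 18.0),
        ("eu-west", 200, 22.0),
    ],
)

# An ad hoc question asked directly, with no job to write and submit:
# how many successful requests did each data center serve, and how fast?
rows = conn.execute(
    "SELECT datacenter, COUNT(*), AVG(latency_ms) "
    "FROM server_logs WHERE status = 200 "
    "GROUP BY datacenter ORDER BY datacenter"
).fetchall()
print(rows)
```

The point of the contrast is that the user states the question declaratively and gets an answer back immediately, rather than writing and scheduling a batch job.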