Moving beyond Hadoop for big data needs
Hadoop isn't enough anymore for enterprises that need new and faster ways to extract business value from massive datasets
Computerworld - Hadoop and MapReduce have long been mainstays of the big data movement, but some companies now need new and faster ways to extract business value from massive -- and constantly growing -- datasets.
While many large organizations are still turning to the open source Hadoop big data framework, Google, whose technology inspired it, and others have already moved on to newer technologies.
The Apache Hadoop platform is an open source implementation of the Google File System and Google MapReduce technology, which the search engine giant developed to manage and process huge volumes of data on commodity hardware.
That technology has been a core part of the processing systems Google uses to crawl and index the Web.
Hundreds of enterprises have adopted Hadoop over the past three or so years to manage fast-growing volumes of structured, semi-structured and unstructured data.
The open source technology has proved to be a cheaper option than traditional enterprise data warehousing technologies for applications such as log and event data analysis, security event management, social media analytics and other workloads involving petabyte-scale datasets.
Analysts note that some enterprises have started looking beyond Hadoop not because of limitations in the technology, but because of the purposes for which it was designed.
Hadoop is built for handling batch-processing jobs where data is collected and processed in batches. Data in a Hadoop environment is broken up and stored in a cluster of highly distributed commodity servers or nodes.
To get a report from the data, users first have to write a job, submit it, and then wait while it is distributed to all of the nodes and processed.
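The batch workflow described above follows the MapReduce pattern: each node processes its own slice of the data, intermediate results are grouped by key, and a reduce step combines them. The sketch below is a toy, single-process simulation of that pattern in plain Python, assuming data already split into per-node chunks; it does not use Hadoop's actual Java API, and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, List, Tuple

# Hypothetical input: log lines already split into chunks, one chunk per "node".
def map_phase(chunk: Iterable[str]) -> List[Tuple[str, int]]:
    """Emit (key, 1) for every word in this node's chunk of data."""
    return [(word, 1) for line in chunk for word in line.split()]

def shuffle(mapped: Iterable[List[Tuple[str, int]]]) -> Dict[str, List[int]]:
    """Group all intermediate values by key, as the framework does between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in chain.from_iterable(mapped):
        groups[key].append(value)
    return groups

def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Combine the values for each key into a single result."""
    return {key: sum(values) for key, values in groups.items()}

def run_job(node_chunks: List[List[str]]) -> Dict[str, int]:
    # The whole job must finish before any part of the report is available,
    # which is the batch latency the article describes.
    mapped = [map_phase(chunk) for chunk in node_chunks]
    return reduce_phase(shuffle(mapped))

if __name__ == "__main__":
    nodes = [
        ["GET /index.html", "GET /about.html"],
        ["POST /login", "GET /index.html"],
    ]
    report = run_job(nodes)
    print(report["GET"], report["/index.html"])  # prints: 3 2
```

In a real cluster the map calls run on separate machines and the shuffle moves data over the network, which is where much of the delay Monash describes comes from.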
While the Hadoop platform performs well, it's not fast enough for some key applications, says Curt Monash, a database and analytics expert and principal at Monash Research. For instance, Hadoop does not fare well at running interactive, ad hoc queries against large datasets, he says.
"What Hadoop has trouble with is interactive responses," Monash said. "If you can stand latencies of a few seconds, Hadoop is fine. But Hadoop MapReduce is never going to be useful for sub-second latencies."
Companies needing such capabilities are already looking beyond Hadoop for their big data analytics needs.
Google, in fact, started using an internally developed technology called Dremel some five years ago to interactively analyze or "query" massive amounts of log data generated by its thousands of servers around the world.
Google says the Dremel technology supports "interactive analysis of very large datasets over shared clusters of commodity machines."
The technology can run queries over trillion-row data tables in seconds, scales to thousands of CPUs and petabytes of data, and supports a SQL-like query language that makes it easy for users to interact with data and formulate ad hoc queries, Google says.
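Google's published description of Dremel attributes its speed partly to column-oriented storage and to fanning a query out across many machines, then merging their partial results. The sketch below illustrates only that general idea for a simple "count rows by value" query; the shard layout, column names and functions are hypothetical, threads stand in for separate servers, and Dremel's real implementation (nested columnar format, multi-level serving trees) is far more involved.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

# Hypothetical shard layout: each shard stores its rows column by column,
# so a query can read only the column it actually needs.
Shard = Dict[str, List[str]]

def partial_count(shard: Shard, column: str) -> Counter:
    """Leaf step: count the values of one column within a single shard."""
    return Counter(shard[column])

def count_by(shards: List[Shard], column: str) -> Dict[str, int]:
    """Roughly: SELECT column, COUNT(*) FROM logs GROUP BY column."""
    total: Counter = Counter()
    with ThreadPoolExecutor() as pool:
        # Scatter: every shard computes its partial result in parallel.
        for partial in pool.map(lambda s: partial_count(s, column), shards):
            total.update(partial)  # Gather: merge partial counts at the root.
    return dict(total)

if __name__ == "__main__":
    shards = [
        {"status": ["200", "404", "200"], "url": ["/a", "/b", "/a"]},
        {"status": ["500", "200"], "url": ["/c", "/a"]},
    ]
    print(sorted(count_by(shards, "status").items()))
    # prints: [('200', 3), ('404', 1), ('500', 1)]
```

Because each shard returns only a small summary instead of its raw rows, the merge step stays cheap even as the number of shards grows, which is what lets this style of query return interactively rather than as a batch job.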