Moving beyond Hadoop for big data needs
Hadoop isn't enough anymore for enterprises that need new and faster ways to extract business value from massive datasets
Computerworld - Hadoop and MapReduce have long been mainstays of the big data movement, but some companies now need new and faster ways to extract business value from massive -- and constantly growing -- datasets.
While many large organizations are still turning to the open source Hadoop big data framework, Google, the company whose internal technologies inspired it, and others have already moved on to newer tools.
The Apache Hadoop platform is an open source implementation of the Google File System and Google MapReduce technologies, which the search engine giant developed to manage and process huge volumes of data on commodity hardware.
It's been a core part of the processing technology used by Google to crawl and index the Web.
Hundreds of enterprises have adopted Hadoop over the past three or so years to manage fast-growing volumes of structured, semi-structured and unstructured data.
The open source technology has proved to be a cheaper option than traditional enterprise data warehousing for applications such as log and event data analysis, security event management and social media analytics, and for other workloads involving petabyte-scale data sets.
Analysts note that some enterprises have started looking beyond Hadoop not because of limitations in the technology, but because of the purposes for which it was designed.
Hadoop is built for handling batch-processing jobs where data is collected and processed in batches. Data in a Hadoop environment is broken up and stored in a cluster of highly distributed commodity servers or nodes.
To get a report from the data, users first have to write a job, submit it and wait for it to be distributed across the nodes and processed.
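That batch workflow looks roughly like the sketch below, which uses the standard Hadoop MapReduce Java API to count event types in a log dataset. The EventCount class, the input data and the HDFS paths are illustrative assumptions, not details from the article.

```java
// Minimal sketch of a Hadoop MapReduce batch job (illustrative only).
// The job is written, submitted to the cluster, and the caller blocks
// until every node has finished processing its share of the data.
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class EventCount {

  // Map phase: runs in parallel on each node, emitting (token, 1) pairs.
  public static class EventMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    @Override
    protected void map(LongWritable key, Text line, Context context)
        throws IOException, InterruptedException {
      for (String token : line.toString().split("\\s+")) {
        context.write(new Text(token), ONE);
      }
    }
  }

  // Reduce phase: sums the counts for each key after the shuffle.
  public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> counts, Context context)
        throws IOException, InterruptedException {
      int total = 0;
      for (IntWritable c : counts) {
        total += c.get();
      }
      context.write(key, new IntWritable(total));
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "event count");
    job.setJarByClass(EventCount.class);
    job.setMapperClass(EventMapper.class);
    job.setReducerClass(SumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path("/logs/raw"));       // hypothetical input path
    FileOutputFormat.setOutputPath(job, new Path("/logs/counts"));  // hypothetical output path
    // Submit and wait: results are available only after the whole batch completes.
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```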
While the Hadoop platform performs well, it's not fast enough for some key applications, said Curt Monash, a database and analytics expert and principal at Monash Research. For instance, Hadoop does not fare well at running interactive, ad hoc queries against large datasets, he said.
"Hadoop has trouble with is interactive responses," Monash said. "If you can stand latencies of a few seconds, Hadoop is fine. But Hadoop MapReduce is never going to be useful for sub-second latencies."
Companies needing such capabilities are already looking beyond Hadoop for their big data analytics needs.
Google, in fact, started using an internally developed technology called Dremel some five years ago to interactively analyze or "query" massive amounts of log data generated by its thousands of servers around the world.
Google says the Dremel technology supports "interactive analysis of very large datasets over shared clusters of commodity machines."
The technology can run queries over trillion-row tables in seconds and scales to thousands of CPUs and petabytes of data, and it supports a SQL-like query language that makes it easy for users to interact with the data and formulate ad hoc queries, Google says.
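Google has since made Dremel-style querying available externally through its BigQuery service. As a rough sketch of what such an ad hoc, SQL-like query could look like, the example below uses the google-cloud-bigquery Java client; the project, dataset, table and field names are hypothetical, and the client library itself postdates the article.

```java
// Illustrative sketch of an interactive, ad hoc query in the Dremel style,
// submitted through Google's BigQuery Java client. All table and field
// names are hypothetical.
import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.FieldValueList;
import com.google.cloud.bigquery.QueryJobConfiguration;
import com.google.cloud.bigquery.TableResult;

public class AdHocLogQuery {
  public static void main(String[] args) throws Exception {
    // Client for BigQuery, the hosted service Google built on Dremel.
    BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

    // An ad hoc, SQL-like query over a (hypothetical) request-log table:
    // count hits per status code for a single day.
    String sql =
        "SELECT status, COUNT(*) AS hits "
            + "FROM `my_project.server_logs.requests` "   // hypothetical table
            + "WHERE DATE(event_time) = '2013-03-01' "
            + "GROUP BY status ORDER BY hits DESC";

    QueryJobConfiguration config = QueryJobConfiguration.newBuilder(sql).build();

    // The query runs interactively and returns results in seconds,
    // rather than as a submitted batch job.
    TableResult result = bigquery.query(config);
    for (FieldValueList row : result.iterateAll()) {
      System.out.println(row.get("status").getStringValue() + "\t"
          + row.get("hits").getLongValue());
    }
  }
}
```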