Moving beyond Hadoop for big data needs
Hadoop isn't enough anymore for enterprises that need new and faster ways to extract business value from massive datasets
Computerworld - Hadoop and MapReduce have long been mainstays of the big data movement, but some companies now need new and faster ways to extract business value from massive -- and constantly growing -- datasets.
While many large organizations are still turning to the open source Hadoop big data framework, its creator, Google, and others have already moved on to newer technologies.
The Apache Hadoop platform is an open source version of the Google File System and Google MapReduce technology. It was developed by the search engine giant to manage and process huge volumes of data on commodity hardware.
It's been a core part of the processing technology used by Google to crawl and index the Web.
Hundreds of enterprises have adopted Hadoop over the past three or so years to manage fast-growing volumes of structured, semi-structured and unstructured data.
The open source technology has proved to be a cheaper option than traditional enterprise data warehousing technologies for applications such as log and event data analysis, security event management, social media analytics and other applications involving petabyte-scale data sets.
Analysts note that some enterprises have started looking beyond Hadoop not because of limitations in the technology, but for the purposes it was designed.
Hadoop is built for handling batch-processing jobs where data is collected and processed in batches. Data in a Hadoop environment is broken up and stored in a cluster of highly distributed commodity servers or nodes.
In order to get a report from the data, users have to first write a job, submit it and wait for it to get distributed to all of the nodes and get processed.
While the Hadoop platform performs well, it's not fast enough for some key applications, says Curt Monash, a database and analytics expert and principal at Monash Research. For instance, Hadoop does not fare well in running interactive, ad hoc queries against large datasets, he said.
"Hadoop has trouble with is interactive responses," Monash said. "If you can stand latencies of a few seconds, Hadoop is fine. But Hadoop MapReduce is never going to be useful for sub-second latencies."
Companies needing such capabilities are already looking beyond Hadoop for their big data analytics needs.
Google, in fact, started using an internally developed technology called Dremel some five years ago to interactively analyze or "query" massive amounts of log data generated by its thousands of servers around the world.
Google says the Dremel technology supports "interactive analysis of very large datasets over shared clusters of commodity machines."
The technology can run queries over trillion-row data tables in seconds and scales to thousands of CPUs and petabytes of data, and supports a SQL-query like language makes it easy for users to interact with data and to formulate ad hoc queries, Google says.
BI and analytics
- Big data key to bringing hyperlocal weather forecasts to Georgia farmers
- Brewer taps Bud Lab at University of Illinois
- Splunk woos Hadoop users
- RSA brings big data analytics to security threat management
- Moving beyond Hadoop for big data needs
- Q&A: What's needed to get a big data job?
- SAS extends analytics support for unstructured data
- Time has come for chief analytics officers
- Big data brings big academic opportunities
- Finding the business value in big data is a big problem
Best Practices in Enterprise Data Governance
This paper explores the challenges organizations have today in implementing a data
governance program via an actual business case. It highlights SAS technology that
- 10 Mistakes to Avoid When Launching Your DG Program From failing to define data governance, to premature launch, or expecting too much from a sponsor, this white paper explains ten common mistakes...
- A Non-Geek's Big Data Playbook A visual playbook for the non-geek yet technically savvy business professional who is still trying to understand how big data impacts the enterprise...
- Top 3 Myths about Big Data Security : Debunking common misconceptions about big data security Big data represents massive business possibilities and competitive advantage for organizations that are able to harness and use that information. But how are...
- Charting Your Analytical Future - "Making predictive analytics part of your business processes" Webinar This session will show how predictive analytics can be used throughout the organization by anyone looking for answers and how organizations can make...
- Improved Data-centric Application Development and Hadoop Operations with BMC and Hortonworks Join this webinar to hear from BMC and Hortonworks how their combined solutions help customers unlock the value of Big Data by implementing... All Big Data White Papers | Webcasts
Our new bimonthly Internet of Things newsletter helps you keep pace with the rapidly evolving technologies, trends and developments related to the IoT. Subscribe now and stay up to date!