Hadoop gets native R tools for big data analysis
Revolution Analytics releases plug-in for running R analytics on Hadoop data sets
IDG News Service - Sensing a growing interest in big data-style analysis, software provider Revolution Analytics has updated its flagship package of R statistical functions so it can be run with the Hadoop data processing platform.
Revolution R Enterprise 7 (RRE 7), to be made available on Monday, can also run R within Teradata databases.
The R language provides a way to run common statistical tests -- such as linear and nonlinear modelling, time-series analysis, classification, and clustering -- on a set of data, often portraying the results in graphical form.
R is becoming increasingly popular for sophisticated data analysis that goes beyond what can be offered by more standard business intelligence (BI) packages. Revolution Analytics has estimated that over 2 million people use R worldwide.
RRE 7 includes a library of R algorithms that can be run in parallel across multiple nodes, mirroring the way Hadoop distributes large data sets across a cluster. RRE 7 can be added to the Cloudera CDH3 and CDH4 Hadoop distributions as well as Hortonworks Data Platform 1.3.
The new R library includes the most commonly used statistical and predictive analytics algorithms for tasks such as data processing, data sampling, descriptive statistics, statistical tests, data visualization, simulation, machine learning and predictive models.
By analyzing the data within the node in which it resides, rather than moving it somewhere else to be analyzed, R-based data analysis can be done more quickly, according to Revolution Analytics. This approach also allows an entire set of data to be analyzed, rather than a subset or summary of the data, which is the approach typically taken with enterprise data warehouses (EDWs).
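The idea of analyzing data where it resides can be illustrated with a minimal Python sketch. It is not Revolution Analytics' code and uses no Hadoop API: the names `Summary`, `summarize`, `merge` and `node_blocks` are hypothetical, and the "nodes" are simulated as plain lists in a single process. Each block is reduced to a small summary in place, and only those summaries are combined to produce statistics for the full data set.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass
class Summary:
    """Small per-block summary: count, sum, and sum of squares."""
    n: int
    total: float
    total_sq: float


def summarize(block):
    """Would run on the node holding the block; only the summary leaves it."""
    return Summary(len(block), sum(block), sum(x * x for x in block))


def merge(a, b):
    """Combine two summaries; the order of merging does not matter."""
    return Summary(a.n + b.n, a.total + b.total, a.total_sq + b.total_sq)


def mean_and_variance(s):
    """Mean and sample variance of the full data set from the merged summary."""
    mean = s.total / s.n
    variance = (s.total_sq - s.n * mean * mean) / (s.n - 1)
    return mean, variance


# Hypothetical data: three blocks standing in for data stored on three nodes.
node_blocks = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]

combined = reduce(merge, (summarize(b) for b in node_blocks))
mean, variance = mean_and_variance(combined)
print(mean, variance)
```

Every record contributes to the result, yet only three numbers per block cross the network. The sum-of-squares formula is kept for brevity; it can lose precision on very large or poorly scaled data, where a more careful merging scheme would be preferable.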
Revolution Analytics hopes the incorporation of R within Hadoop and the Teradata databases will also broaden the use of the language to line-of-business managers. The company has designed a new workflow interface that does not require knowledge of how to implement specific R algorithms. This eliminates the hassle of coding R with Java, or some other language, in order to have it run on the Hadoop platform.
In addition to supporting these new platforms, RRE 7 features a number of new algorithms and processes. One is a collection of models for setting up Decision Forests, a machine learning technique for predicting future outcomes. A new batch of Stepwise Regression functions can help automate the process of selecting the most important variables to be used in a predictive model. A new Decision Tree visualization can provide a graphical way to depict complex relationships and correlations within a set of data.
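How stepwise regression can automate variable selection is easiest to see in a small example. The Python sketch below shows generic forward selection, one common variant of the technique; the article does not describe how RRE 7 implements it, so the function names (`fit_rss`, `forward_stepwise`), the `min_gain` stopping rule and the sample data are all assumptions made for illustration. At each step it adds the variable that most reduces the residual sum of squares, and stops when the improvement becomes too small.

```python
def fit_rss(rows, y, cols):
    """Least-squares fit of y on an intercept plus the chosen columns.

    Returns the residual sum of squares. Assumes the chosen columns are
    not collinear (no safeguard against a singular system).
    """
    X = [[1.0] + [row[c] for c in cols] for row in rows]
    k = len(X[0])
    # Augmented normal equations: (X'X | X'y).
    M = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * yi for x, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for i in range(k):
        p = max(range(i, k), key=lambda r: abs(M[r][i]))
        M[i], M[p] = M[p], M[i]
        for r in range(k):
            if r != i:
                f = M[r][i] / M[i][i]
                M[r] = [a - f * b for a, b in zip(M[r], M[i])]
    beta = [M[i][k] / M[i][i] for i in range(k)]
    return sum((yi - sum(b * xi for b, xi in zip(beta, x))) ** 2
               for x, yi in zip(X, y))


def forward_stepwise(rows, y, min_gain=0.05):
    """Add one variable at a time while it cuts the RSS by more than min_gain."""
    chosen = []
    remaining = list(range(len(rows[0])))
    best_rss = fit_rss(rows, y, chosen)
    while remaining:
        rss, col = min((fit_rss(rows, y, chosen + [c]), c) for c in remaining)
        if best_rss - rss <= min_gain * best_rss:
            break
        chosen.append(col)
        remaining.remove(col)
        best_rss = rss
    return chosen


# Hypothetical data: y depends on columns 0 and 2; column 1 is unrelated.
rows = [[1, 5, 2], [2, 3, 1], [3, 8, 4], [4, 1, 3],
        [5, 9, 6], [6, 2, 5], [7, 7, 8], [8, 4, 7]]
noise = [0.3, -0.2, -0.1, 0.2, -0.3, 0.1, 0.2, -0.2]
y = [2 * r[0] + 3 * r[2] + e for r, e in zip(rows, noise)]

selected = forward_stepwise(rows, y)
print(selected)
```

On this toy data the two variables that actually drive `y` are picked first. With so few observations a simple relative-gain threshold can still admit a noise variable afterward, which is why production tools typically rely on criteria such as AIC or significance tests instead.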