Best data science platforms

As businesses aim to operationalise their data faster than ever, platforms that allow data scientists to build and test out algorithms are increasingly important.

However, enterprises should remember that all data science platforms are relatively immature and may suffer teething problems.

"Data science is not plug and play," Matt Jones, lead analytics strategist at Tessella Analytics, told Computerworld UK. "Platforms are fine, but they need to be trained by someone who understands the data and the context it exists in. If you're outsourcing data science to a tech vendor, be absolutely sure they understand your business and your data."

Keeping that in mind, here are some of the best data science platforms, from established vendors to completely open source, being used by enterprises today.

H2O.ai

H2O.ai is an open source machine learning platform that helps enterprises apply fast and scalable predictive analytics to business problems.

The platform has a growing reputation and was named a Leader in Gartner's 2018 Magic Quadrant for Data Science and Machine Learning Platforms after being chosen as a Visionary in the prior edition.

Gartner praised the platform for its technical capabilities in deep learning, machine learning automation, hybrid cloud support and open source integration, and for its strong support for customers, which include eBay, Capital One and Comcast.

The code-centric toolchain provides great flexibility and scalability, but doesn't make for the most user-friendly product.

Mathworks' Matlab for Artificial Intelligence

With Mathworks' range of products, your company can carry out a range of data science activities. Matlab provides a desktop environment developed for iterative analysis and design processes, paired with a programming language that expresses matrix and array mathematics.

The tools are all professionally built and rigorously tested, and are available through interactive apps. Matlab can be used for big data and machine learning.

Simulink, another product from the brand, allows you to design and simulate your system before moving on to hardware. You can test out and implement a range of designs, all without having to write C, C++, or HDL code.

Alteryx

'Hello, total analytics badassery.' This is what Alteryx says it can help your company achieve. It's a self-service data analytics platform that allows companies to prep, blend, enrich and analyse data as well as deploy predictive analytics.

It's designed to be used by data scientists and analysts to draw ever faster insights from data. However, it has been developed so that users of varying levels of expertise can craft statistical and prescriptive models, as well as integrate third-party data sets.

It also makes your data more discoverable and searchable, as well as providing the heavy duty security that is a must.

The Alteryx Data Preparation Tool is currently rated 4.5 on Gartner reviews.

Microsoft Azure machine learning

Microsoft provides data scientists with a fully managed cloud service for building and deploying predictive analytics into live environments with its Azure Machine Learning platform. The platform comes with built-in packages to support custom code in your preferred language, be it Python or R, and a plethora of documentation for data scientists to get started.

The Azure platform allows data scientists to deploy models into production quickly as a web service and then share them on the Azure marketplace to gain exposure. Customers include Carnival Cruises, JLL and Fujitsu.

Domino Data Lab

California-based startup Domino Data Lab's platform is another 'workbench' solution, allowing data science teams to do modelling on their preferred data sources, using whatever tools and programming languages they are comfortable with and to collaborate and deploy models straight from Domino as APIs.

It then acts as a hub for all data science activity, elastically provisioning compute in the cloud and deploying in a consistent, secure manner so that IT can take a back seat. Data science teams at insurers Zurich and Allstate are both customers of Domino.

Cloudera Data Science Workbench

Analytics vendor Cloudera launched its "Data Science Workbench" in March 2017 following the acquisition of Sense.io a year earlier. The workbench is intended to be a platform where data science teams can work with their data using popular languages and frameworks like R, Python and Spark in a secured-by-default, collaborative environment.

The idea is to let teams model and deploy machine learning and advanced analytics within the enterprise far faster than if they had to worry about anything other than the actual data science.

SAS Viya

Analytics and BI vendor SAS provides data science and machine learning capabilities through its Viya platform.

This is an example of an analytics vendor providing customers with a platform where they can take their advanced analytics work out of self-contained clusters and into an environment where it can be deployed in a secure, consistent way.

"We try to enable people to use what they want to use, but not reinvent the wheel every time," Peter Pugh-Jones, head of technology at SAS UK and Ireland, told Computerworld UK.

Dataiku

The French startup Dataiku provides a host of guided data science and machine learning processes on its platform DSS. The platform has a level of abstraction so that anyone using it can either code in Python, Pig, R, Hive etc. or use drag and drop functionality to wrangle and model data.

The platform allows teams of data scientists, data analysts and engineers to prototype, build and deliver data solutions into the business from a single place. Previous customers include L'Oreal, Trainline and AXA insurance.

In its more recent releases Dataiku has added point-and-click capabilities (called 'visual recipes') for data preparation, the ability to monitor model performance during training, and support for Python 3 with a new code editor.

IBM Data Science Experience

IBM offers a range of data science tools and is preparing to release an IBM Watson-guided machine learning platform.

The current iteration comes with built-in learning, so that data scientists can improve the more they engage with the platform, as well as collaboration features and notebook tools for working with popular programming languages, such as Jupyter Notebooks for Python and RStudio for R. The enterprise version of the platform retails at $9,200 per instance per month and provides managed Spark clusters and flexible storage.

RapidMiner

Open source data science platform RapidMiner helps the likes of BMW, Samsung, Domino's and Barclays launch data science projects.

Tools on the RapidMiner platform include Studio, for visual data science workflows, Server for operationalising models, and Radoop for workflows using Hadoop data.

For larger customers or projects there are enterprise versions of the platform which range from $2,500 to $10,000 a year depending on the rows of data.

Knime

The open source and free Knime Analytics Platform looks to give data scientists a blank canvas to work on projects using various data sources and the tools they are comfortable with in a scalable environment.

The open platform comes with thousands of native nodes and modules, extensive documentation and pre-packaged advanced algorithms to get started quickly. Data scientists can toggle quickly between single-computer, streaming or big data execution on top of or alongside existing infrastructure, and the platform makes sure that everything is backwards compatible and easily portable for flexibility.

Splunk Machine Learning Toolkit

Big data specialist Splunk has moved into more integrated machine learning within its platform over the past year or so, but the vendor also provides a Machine Learning Toolkit for custom models.

The advantage of using Splunk over other workbench solutions is that you can model straight on top of machine-generated data - Splunk's area of expertise - so security and IoT use cases are a natural fit.

The Toolkit is a guided workbench for data scientists to model and deploy algorithms in the most popular programming languages. There is also a library of pre-built Python algorithms for popular use cases, and plenty of documentation and tutorials to get started straight away.

Copyright © 2019 IDG Communications, Inc.