Best machine learning tools and frameworks for data scientists and developers

As machine learning becomes more prominent, the number of tools and frameworks available to developers and data scientists has multiplied.

Google, Microsoft, IBM and AWS all offer machine learning APIs via their respective cloud platforms, making it easier for developers to build services by abstracting some of the complexity of their algorithms.

Here are some of the top machine learning tools on the market.

Google ML Kit

Google ML Kit is Google’s machine learning beta SDK for mobile developers. It allows developers to use machine learning to build features on Android and iOS, whatever their level of expertise.

The ML Kit includes five base APIs that are ready to use across common mobile use cases. These include text recognition, face detection, barcode scanning, image labelling and landmark recognition – all of which are available both online and offline.

Developers are also able to deploy their own TensorFlow Lite models in case the provided APIs do not suit their use case. These can be uploaded directly via the Firebase console, where the SDK platform is based.

OpenNN

OpenNN is a C++ programming library for developers with some experience with machine learning who are looking to implement neural networks.

The site offers a wide range of materials, including tutorials, even though the library targets a more experienced audience.

There is a tool for advanced analytics available called Neural Designer, which helps simplify data entries by creating visual content such as graphs and tables.

Apache Mahout

Apache Mahout is a distributed linear algebra framework and mathematically expressive Scala DSL that has been developed to allow statisticians or data scientists to implement their own algorithms.

It's one of the projects from the Apache Software Foundation that allows free implementation of scalable machine learning algorithms that primarily cater to areas such as clustering, collaborative filtering and classification.
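
As a rough illustration of what collaborative filtering means, the sketch below ranks items a user has not yet rated by how similar they are to items that user has rated. It is plain Python and does not use Mahout's API; the ratings data and all function names are invented for the example.

```python
from math import sqrt

# Hypothetical ratings: user -> {item: rating}. Invented data for illustration.
RATINGS = {
    "ann": {"book": 5, "film": 4, "game": 1},
    "bob": {"book": 4, "film": 5},
    "cat": {"book": 1, "game": 5},
}

def item_vector(item, ratings):
    """Return {user: rating} for every user who rated the item."""
    return {u: r[item] for u, r in ratings.items() if item in r}

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[u] * b[u] for u in shared)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)

def recommend(user, ratings):
    """Rank items the user has not rated by a similarity-weighted score."""
    seen = ratings[user]
    items = {i for r in ratings.values() for i in r}
    scores = {}
    for candidate in items - set(seen):
        cand_vec = item_vector(candidate, ratings)
        scores[candidate] = sum(
            cosine(cand_vec, item_vector(i, ratings)) * rating
            for i, rating in seen.items()
        )
    return sorted(scores, key=scores.get, reverse=True)

suggestions = recommend("bob", RATINGS)
```

A distributed framework would compute the same kind of similarities across far larger rating matrices; this version only shows the idea on a handful of rows.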

Many of the implementations rely on the Apache Hadoop platform, so it's worth becoming well versed in how this operates.

HPE Haven OnDemand

HPE Haven OnDemand provides high-level machine learning APIs for enterprise app developers. It has more than 70 APIs available, covering face detection, image classification, speech and object recognition, text analysis and more.

It is hosted on Microsoft Azure and has a number of API client libraries for developers to apply machine learning to apps easily.

Apache PredictionIO

Apache PredictionIO is an open source machine learning server, built on top of an open source stack for developers and data scientists to create predictive engines for all machine learning tasks.

It can be installed as a full machine learning stack, together with Apache Spark, MLlib, HBase, Spray, and Elasticsearch in order to simplify and accelerate machine learning infrastructure management.

A unique feature of PredictionIO is its ability to respond to dynamic queries in real-time once deployed as a web service, whilst also unifying data from multiple platforms in batch or real-time to gather comprehensive predictive analytics.

PredictionIO also provides a template system for creating machine learning engines. The templates reduce the heavy lifting traditionally needed to set up a system that serves specific kinds of predictions.

Accord.NET

Accord.NET is a framework for scientific computing in .NET. It is combined with audio and image processing libraries which encompass a range of scientific computing applications such as machine learning, statistical data processing and pattern recognition.

Additionally, it can be described as a complete framework for building production-grade computer vision, computer audition, signal processing and statistics applications.

Following the merger with the AForge.NET project in 2015, the framework has since offered a unified API for learning and training machine learning models.

Accord.NET can be used on Microsoft Windows, Xamarin, Unity3D, Windows Store applications, Linux and Mobile.

Amazon Machine Learning

Amazon Machine Learning offers a managed service for developers and data scientists building machine learning models and generating predictions.

It enables the development of robust, scalable smart applications that can be used without the need for an extensive background in machine learning algorithms and techniques.

The service provides three operations for the model-building process: data analysis, model training and evaluation.

Its features also include APIs for batch and real-time predictions to enable users to easily build smart applications.

Amazon SageMaker

Many data scientists and developers will already run their training models on Amazon Web Services' (AWS) commoditised cloud computing platform.

At AWS re:Invent in November 2017 the vendor launched SageMaker, a fully managed machine learning platform which intends to take away some of the heavy lifting previously involved with running models on AWS.

SageMaker is essentially a platform for authoring, training and deploying machine learning algorithms to business applications without having to provision infrastructure or manage and tune training models.

Under the covers this means hosted Jupyter notebook integrated development environments (IDEs) for data exploration, cleaning, and preprocessing.

Then there is a distributed model building, training, and validation service where users can pick an AWS algorithm off the shelf, import a popular framework like TensorFlow or write and deploy their own algorithm with Docker containers, directly within SageMaker.

For training, you simply specify a location in S3 and the instance you want to use and in one click SageMaker spins up an isolated cluster and software defined network with autoscaling and data pipelines to start training. When you are done it tears down the cluster.

HTTPS endpoints are used for model hosting, which can scale to support traffic and allow you to A/B test multiple models simultaneously. The algorithms can be deployed straight into production on EC2 instances with one click, after which they are deployed with autoscaling across availability zones.
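
The A/B testing idea can be sketched as weighted routing of requests across model variants. This is a hedged, plain-Python illustration and not SageMaker's API; the variant names, traffic weights and stand-in models are all hypothetical.

```python
import random

# Hypothetical variants: name -> (traffic weight, stand-in model function).
VARIANTS = {
    "model_a": (0.9, lambda x: x * 2),
    "model_b": (0.1, lambda x: x * 2 + 1),
}

def route(rng, variants):
    """Pick a variant name at random, in proportion to its traffic weight."""
    names = list(variants)
    weights = [variants[n][0] for n in names]
    return rng.choices(names, weights=weights, k=1)[0]

def predict(x, rng, variants=VARIANTS):
    """Send one request to a randomly chosen variant; return (name, output)."""
    name = route(rng, variants)
    return name, variants[name][1](x)

rng = random.Random(0)  # seeded so the run is repeatable
counts = {name: 0 for name in VARIANTS}
for _ in range(10_000):
    chosen, _output = predict(3, rng)
    counts[chosen] += 1
```

After the loop, `counts` should show roughly nine requests to `model_a` for every one sent to `model_b`, which is the split a team might use to trial a new model on a small share of traffic.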

Tuning models is traditionally a trial-and-error exercise, but SageMaker comes with what AWS calls 'hyperparameter optimisation (HPO)'. When a box is checked, SageMaker spins up multiple copies of the training model and uses machine learning to look at each change in parallel and tune parameters accordingly.
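
To make the tuning idea concrete, here is a minimal sketch that evaluates several parameter settings in parallel and keeps the best one. It uses plain random search rather than the machine-learning-guided search AWS describes, it does not call any SageMaker API, and the objective function and search space are toy assumptions.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def validation_loss(params):
    """Toy stand-in for training a model and returning its validation loss.

    The invented objective is lowest at learning_rate=0.1, depth=6.
    """
    return (params["learning_rate"] - 0.1) ** 2 + 0.01 * (params["depth"] - 6) ** 2

def sample_params(rng):
    """Draw one random candidate from a hypothetical search space."""
    return {
        "learning_rate": rng.uniform(0.001, 0.5),
        "depth": rng.randint(2, 10),
    }

def random_search(n_trials, seed=0, workers=4):
    """Evaluate n_trials random candidates in parallel; return (loss, params) of the best."""
    rng = random.Random(seed)
    candidates = [sample_params(rng) for _ in range(n_trials)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        losses = list(pool.map(validation_loss, candidates))
    best = min(range(n_trials), key=losses.__getitem__)
    return losses[best], candidates[best]

best_loss, best_params = random_search(n_trials=50)
```

In a real setting each call to `validation_loss` would be a full training run, which is why running the trials side by side matters.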

Amazon API services

AWS is also building up a stable of machine learning APIs to be consumed off the shelf.

The first three, launched in 2016, were Lex, which is the underlying technology for its Alexa AI voice assistant; Polly for text-to-voice services, and Rekognition for adding image analysis and facial recognition to apps.

Then in 2017 Amazon launched Transcribe for converting speech to text; Amazon Translate for translating text between languages; Amazon Comprehend for understanding natural language; and Amazon Rekognition Video, a computer vision service for analysing videos in batches and in real-time.

Amazon's Deep Scalable Sparse Tensor Network Engine (DSSTNE)

The open source deep learning library, pronounced 'destiny', allows data scientists to train and deploy deep neural networks using GPUs. It can be seen as a response to Google's open sourcing of TensorFlow.

DSSTNE was built by the retail giant's engineers to power its recommendations engine that makes product suggestions to the hundreds of millions of customers on its websites each day.

Amazon said at the time: "We are releasing DSSTNE as open source software so that the promise of deep learning can extend beyond speech and language understanding and object recognition to other areas such as search and recommendations.

"We hope that researchers around the world can collaborate to improve it. But more importantly, we hope that it spurs innovation in many more areas."

Azure Machine Learning workbench

Microsoft announced a revamp of its Microsoft Azure machine learning tools during its 'Ignite' conference in September 2017.

Microsoft announced three major machine learning tools, one of which is the Azure Machine Learning workbench, described as a cross-platform client for data and experiment management.

The workbench will support modelling in Python, Scala and PySpark.

Azure Machine Learning Model Management

At its Ignite conference in September 2017, Microsoft also announced the release of its Azure Machine Learning Model Management tool.

This aims to help developers 'manage and deploy machine learning workflows and models' while offering these modelling capabilities:

- Model versioning
- Model checking
- Deploying models to production
- Creating Docker containers with the models and testing them locally
- Automated model retraining
- Capturing model telemetry for actionable insights

Google TensorFlow

Google open sourced its TensorFlow software library through an Apache licence in 2015. The machine learning software library is the next generation of DistBelief, which was internally developed by the Google Brain team at the search giant for a multitude of tasks such as image search and improving its speech recognition algorithms.

TensorFlow can produce C++ or Python graphs that can be processed on CPUs or GPUs. These flow graphs depict the movement of data running through a system. By open sourcing the TensorFlow library of machine learning code, Google is facilitating the simpler construction, training and deployment of complex deep neural nets.
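
The notion of a flow graph can be shown with a few lines of plain Python: nodes are operations, and values move along the edges from inputs to outputs. This sketch only illustrates the concept and is not TensorFlow's API; the `Node` class and helper functions are hypothetical names.

```python
class Node:
    """One operation in a dataflow graph; its inputs are other nodes."""

    def __init__(self, op, *inputs):
        self.op = op          # function applied to the input values
        self.inputs = inputs  # upstream nodes whose outputs feed this one

def constant(value):
    """A source node that always emits the same value."""
    return Node(lambda: value)

def evaluate(node, cache=None):
    """Compute a node's value by first evaluating everything upstream of it."""
    cache = {} if cache is None else cache
    if id(node) not in cache:
        args = [evaluate(n, cache) for n in node.inputs]
        cache[id(node)] = node.op(*args)
    return cache[id(node)]

# Build the graph for y = (a * b) + a, then run it.
a = constant(3.0)
b = constant(4.0)
product = Node(lambda x, y: x * y, a, b)
y = Node(lambda x, y: x + y, product, a)
result = evaluate(y)
```

Separating the description of the computation from its execution is what lets a graph-based library decide where each operation runs, for example on a CPU or a GPU.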

Google APIs

Google also has a host of machine learning APIs on its Cloud Platform. This includes its popular Prediction API, which allows users to tap the search giant’s algorithms to analyse data and predict future outcomes. Google has added further APIs to allow users to build their own machine learning-based services, including Speech, Translate and Vision.

In March 2017, Google launched a new machine learning API for automatically recognising objects in videos and making them searchable. Called Cloud Video Intelligence, the API helps developers extract certain objects from videos automatically. It essentially allows developers to tag images in videos with searchable terms, for example 'tree' or 'house'. Currently, developers can sign up for its beta version.

Microsoft Distributed Machine Learning Toolkit (DMLT)

Microsoft's machine learning toolkit - which is available on GitHub - aims to ease crowded machine learning clusters, making it easier to run multiple (and differing) machine learning applications at the same time.

"Bigger models tend to generate better accuracies in various applications," Microsoft said. "However, it remains a challenge for common machine learning researchers and practitioners to learn big models."

Microsoft Computational Network Toolkit (CNTK)

Also from Microsoft, the Computational Network Toolkit enables users to create neural networks depicted in directed graphs. While primarily made for speech recognition technology, since April 2015 it has become a more general machine learning toolkit supporting image, text and RNN training (recurrent neural network - a type of neural network).

IBM Watson Analytics

The Watson Analytics cloud service was unveiled in 2014 as part of IBM’s plans to turn Watson from a part-time game show contestant into a bona fide enterprise software proposition.

It aims to help organisations that have little or no experience of predictive analytics put their business data to good use.

IBM had already launched its Watson Developer Cloud - in 2013 - offering access to APIs via its Bluemix platform as a service cloud, allowing developers to create their own applications based on Watson’s smarts.

BigML

It is not only big IT firms that are moving into artificial intelligence in the cloud. BigML is one of a number of startups in the market aiming to open artificial intelligence to a wider audience.

Founded in Oregon in 2011, BigML offers a simple user interface, allowing users to upload data sets to start making predictions.

Apache Spark MLlib and Singa

Apache Spark MLlib is the machine learning library for Spark, an in-memory data processing framework. It offers a large and growing set of useful algorithms and utilities covering classification, regression, clustering, collaborative filtering and more.

Singa is an open source framework within the Apache incubator, providing a programming tool for deep-learning networks across numerous machines.

Veles

Veles is Samsung's distributed deep learning platform, which is written in C++ and uses Python for coordination between nodes. Veles offers an API enabling immediate use of trained models and can be used for data analysis.

Alibaba’s Aliyun

In August 2015, Chinese ecommerce giant Alibaba announced that its cloud computing business, Aliyun, would offer a machine learning service to help enterprise customers streamline analytics software development.

The service is based on Aliyun’s Open Data Processing Service (ODPS) technology, which is capable of processing 100 petabytes of data in six hours.

The DT PAI platform offers a drag and drop interface to simplify the process for developers.

"What used to take days can be completed in minutes," said Xiao Wei, senior product expert with Alibaba's cloud business, as the service was announced.

Caffe

Caffe is a deep learning C++ framework initially created for machine vision (imaging-based automatic inspection). It is developed by the Berkeley Vision and Learning Center (BVLC) as well as community developers.

The framework is already used as part of "academic research projects, startup prototypes, and even large-scale industrial applications in vision, speech, and multimedia".

Yahoo recently open sourced CaffeOnSpark, combining deep learning functionality with the Spark data processing engine.

Google and Pinterest have also used Caffe in their operations.

Neon

Neon is Nervana's open source, Python-based machine learning library.

The deep learning startup, founded in 2014, has also launched a cloud service based on Neon, which it claims is ten times faster than competing services. This means that businesses can build, train and deploy deep-learning technologies much more quickly.

Wise.io

Wise.io also aims to democratise the use of artificial intelligence with 'machine learning as a service' that is ready for enterprise use. Founded in 2012, the Californian startup's algorithms were initially developed to help astronomers discover and map new stars, before being put to use by businesses.

Its customers include Volkswagen and Citrix.

Copyright © 2018 IDG Communications, Inc.