Review: The best frameworks for machine learning and deep learning

TensorFlow, Spark MLlib, Scikit-learn, MXNet, Microsoft Cognitive Toolkit, and Caffe do the math


Over the past year I've reviewed half a dozen open source machine learning and/or deep learning frameworks: Caffe, Microsoft Cognitive Toolkit (aka CNTK 2), MXNet, Scikit-learn, Spark MLlib, and TensorFlow. If I had cast my net even wider, I might well have covered a few other popular frameworks, including Theano (a 10-year-old Python deep learning and machine learning framework), Keras (a deep learning front end for Theano and TensorFlow), and DeepLearning4j (deep learning software for Java and Scala on Hadoop and Spark). If you’re interested in working with machine learning and neural networks, you’ve never had a richer array of options.  

There's a difference between a machine learning framework and a deep learning framework. Essentially, a machine learning framework covers a variety of learning methods for classification, regression, clustering, anomaly detection, and data preparation, and it may or may not include neural network methods. A deep learning or deep neural network (DNN) framework covers a variety of neural network topologies with many hidden layers. These layers comprise a multistep process of pattern recognition. The more layers in the network, the more complex the features that can be extracted for clustering and classification.

Caffe, CNTK, DeepLearning4j, Keras, MXNet, and TensorFlow are deep learning frameworks. Scikit-learn and Spark MLlib are machine learning frameworks. Theano straddles both categories.

In general, deep neural network computations run an order of magnitude faster on a GPU (specifically an Nvidia CUDA general-purpose GPU, for most frameworks) than on a CPU. Simpler machine learning methods, by contrast, typically don't need the speedup of a GPU.

While you can train DNNs on one or more CPUs, the training tends to be slow, and by slow I'm not talking about seconds or minutes. The more neurons and layers that need to be trained, and the more data available for training, the longer it takes. When the Google Brain team trained its language translation models for the new version of Google Translate in 2016, they ran their training sessions for a week at a time, on multiple GPUs. Without GPUs, each model training experiment would have taken months.

Each of these packages has at least one distinguishing characteristic. Caffe's strength is convolutional DNNs for image recognition. Cognitive Toolkit has a separate evaluation library for deploying prediction models that works on ASP.Net websites. MXNet has excellent scalability for training on multi-GPU and multimachine configurations. Scikit-learn has a wide selection of robust machine learning methods and is easy to learn and use. Spark MLlib integrates with Hadoop and has excellent scalability for machine learning. TensorFlow has a unique diagnostic facility for its network graphs, TensorBoard.

On the other hand, the training speed of all the deep learning packages on GPUs is nearly identical. That's because the training inner loops spend most of their time in the Nvidia cuDNN library. Still, each package takes a somewhat different approach to describing neural networks, with two major camps: those that use a graph description file, and those that create their descriptions by executing code.

With that in mind, let's dive into each one.


Caffe

The Caffe deep learning project, originally a strong framework for image classification, seems to be stalling, based on its persistent bugs, as well as the fact that it has been stuck at version 1.0 RC3 for more than a year and the founders have left the project. It still has good convolutional networks for image recognition and good support for Nvidia CUDA GPUs, as well as a straightforward network description format. On the other hand, its models often need substantial amounts of GPU memory (more than 1GB) to run, its documentation is spotty and problematic, support is hard to obtain, and installation is iffy, especially for its Python notebook support.

Caffe has command-line, Python, and Matlab interfaces, and it relies on prototxt files (plain-text protocol buffer definitions) to define its models and solvers. Caffe defines a network layer by layer in its own model schema. The network defines the entire model bottom to top from input data to loss. As data and derivatives flow through the network in the forward and backward passes, Caffe stores, communicates, and manipulates the information as blobs (binary large objects) that internally are N-dimensional arrays stored in a C-contiguous fashion (meaning the rows of the array are stored in contiguous blocks of memory, as in the C language). Blobs are to Caffe as tensors are to TensorFlow.
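To make the C-contiguous layout concrete, here is a minimal sketch in plain Python. It is not Caffe code: the function names are hypothetical, and the blob shape (a batch of 2 images, 3 channels, 4 rows, 5 columns) is an assumption chosen for illustration. It shows how a multidimensional index maps to a position in one flat, row-major buffer, with the last axis varying fastest.

```python
from typing import Sequence, Tuple


def row_major_strides(shape: Sequence[int]) -> Tuple[int, ...]:
    """Return element strides for a C-contiguous (row-major) array."""
    strides = []
    step = 1
    # Walk the axes from last to first: the last axis varies fastest.
    for dim in reversed(shape):
        strides.append(step)
        step *= dim
    return tuple(reversed(strides))


def flat_offset(index: Sequence[int], shape: Sequence[int]) -> int:
    """Map a multidimensional index to its offset in the flat buffer."""
    if len(index) != len(shape):
        raise ValueError("index and shape must have the same rank")
    for i, dim in zip(index, shape):
        if not 0 <= i < dim:
            raise IndexError("index out of bounds")
    return sum(i * s for i, s in zip(index, row_major_strides(shape)))


# Hypothetical blob: 2 images x 3 channels x 4 rows x 5 columns.
shape = (2, 3, 4, 5)
strides = row_major_strides(shape)
last = flat_offset((1, 2, 3, 4), shape)
print(strides)  # (60, 20, 5, 1)
print(last)     # 119, the final slot of the 120-element buffer
```

Moving one step along the last axis moves one element in memory, while moving to the next image in the batch skips a whole 60-element block.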

Layers perform operations on blobs and constitute the components of a Caffe model. Layers convolve filters, perform pooling, take inner products, apply nonlinearities (such as rectified-linear and sigmoid and other element-wise transformations), normalize, load data, and compute losses such as softmax and hinge.
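For readers who have not met these operations before, the following is a rough sketch of a few of them in plain Python, operating on simple lists rather than blobs. None of this is Caffe's implementation; the function names are hypothetical, and the hinge loss shown is one common multiclass formulation that may differ from the one Caffe uses.

```python
import math
from typing import List


def relu(xs: List[float]) -> List[float]:
    """Rectified-linear nonlinearity: negative values become zero."""
    return [max(0.0, x) for x in xs]


def sigmoid(xs: List[float]) -> List[float]:
    """Element-wise logistic function, written to avoid overflow."""
    out = []
    for x in xs:
        if x >= 0:
            out.append(1.0 / (1.0 + math.exp(-x)))
        else:
            e = math.exp(x)
            out.append(e / (1.0 + e))
    return out


def softmax(scores: List[float]) -> List[float]:
    """Turn raw class scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_loss(scores: List[float], label: int) -> float:
    """Negative log-probability assigned to the correct class."""
    return -math.log(softmax(scores)[label])


def hinge_loss(scores: List[float], label: int, margin: float = 1.0) -> float:
    """Multiclass hinge: penalize classes that come within `margin` of the true class."""
    return sum(
        max(0.0, margin + s - scores[label])
        for j, s in enumerate(scores)
        if j != label
    )


scores = [2.0, 0.5, 1.5]  # hypothetical raw scores for three classes
print(relu([-1.0, 2.0]))
print(softmax(scores))
print(softmax_loss(scores, 0), hinge_loss(scores, 0))
```

In a real framework each of these would be a layer with a forward pass like the above and a matching backward pass that computes derivatives.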

Caffe has proven its effectiveness in image classification, but its moment seems to have passed. Unless an existing Caffe model fits your needs or could be fine-tuned to your purposes, I recommend using TensorFlow, MXNet, or CNTK instead.


A precomputed Caffe Jupyter notebook displayed in NBViewer. This notebook explains doing “surgery” on Caffe networks using a cute kitten.

Microsoft Cognitive Toolkit

Microsoft Cognitive Toolkit is a fast and easy-to-use deep learning package, but it is limited in scope compared to TensorFlow. It has a good variety of models and algorithms, excellent support for Python and Jupyter notebooks, an interesting declarative BrainScript neural network configuration language, and automated deployment for Windows and Ubuntu Linux.

On the downside, when I reviewed Beta 1 the documentation had not yet been fully updated to CNTK 2, and the package had no MacOS support. While there have been many improvements to CNTK 2 since Beta 1, including a new memory compression mode to reduce memory usage on GPUs and new Nuget installation packages, MacOS support is still absent.

The Python API added for Beta 1 helps to bring the Cognitive Toolkit to mainstream, Python-writing deep learning researchers. The API contains abstractions for model definition and compute, learning algorithms, data reading, and distributed training. As a supplement to the Python API, CNTK 2 has new Python examples and tutorials, along with support for Google's protocol buffers serialization. The tutorials are implemented as Jupyter notebooks.

CNTK 2 components can handle multidimensional dense or sparse data from Python, C++, or BrainScript. The Cognitive Toolkit includes a wide variety of neural network types: FFN (feedforward), CNN (convolutional), RNN/LSTM (recurrent/long short-term memory), batch normalization, and sequence-to-sequence with attention, for starters. It supports reinforcement learning, generative adversarial networks, supervised and unsupervised learning, automatic hyperparameter tuning, and the ability to add new, user-defined core components on the GPU from Python. It can parallelize training across multiple GPUs and machines without sacrificing accuracy, and (Microsoft claims) it can fit even the largest models into GPU memory.

The CNTK 2 APIs support defining networks, learners, readers, training, and evaluation from Python, C++, and BrainScript. They also support evaluation with C#. The Python API interoperates with NumPy and includes a high-level layers library that enables concise definition of advanced neural networks, including recurrences. The toolkit supports representation of recurrent models in symbolic form as cycles in the neural network instead of requiring static unrolling of the recurrence steps.

You can train CNTK 2 models on Azure networks and GPUs. The GPU-equipped N-series family of Azure Virtual Machines, which was in limited rollout when I reviewed Beta 1, is now generally available and fully manageable from the Azure console.


Several CNTK 2/Microsoft Cognitive Toolkit tutorials are supplied as Jupyter notebooks. The figure shows the visualizations plotted for the training of the Logistic Regression tutorial.


MXNet

MXNet, a portable, scalable deep learning library that is Amazon's DNN framework of choice, combines symbolic declaration of neural network geometries with imperative programming of tensor operations. MXNet scales to multiple GPUs across multiple hosts with a near-linear scaling efficiency of 85 percent and boasts excellent development speed, programmability, and portability. It supports Python, R, Scala, Julia, and C++ to various degrees, and it allows you to mix symbolic and imperative programming flavors.
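To put the 85 percent figure in perspective, scaling efficiency is usually taken to mean the measured speedup divided by the number of GPUs. The sketch below works through that arithmetic. The throughput numbers and the GPU count are hypothetical values picked to illustrate the formula, not Amazon's measurements.

```python
def scaling_efficiency(single_gpu_rate: float, multi_gpu_rate: float, num_gpus: int) -> float:
    """Measured speedup divided by the ideal (linear) speedup."""
    speedup = multi_gpu_rate / single_gpu_rate
    return speedup / num_gpus


def expected_speedup(efficiency: float, num_gpus: int) -> float:
    """Speedup you would see at a given efficiency and GPU count."""
    return efficiency * num_gpus


# Hypothetical throughputs in images per second, for illustration only.
single = 100.0
sixteen = 1360.0

eff = scaling_efficiency(single, sixteen, num_gpus=16)
print(f"efficiency: {eff:.2f}")                                   # efficiency: 0.85
print(f"speedup on 16 GPUs: {expected_speedup(0.85, 16):.1f}x")   # 13.6x
```

In other words, at 85 percent efficiency, 16 GPUs do the work of roughly 13.6 perfectly scaled ones.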

At the time I reviewed MXNet the documentation felt unfinished, and I found few examples for languages other than Python. Both situations have improved since my review.

The MXNet platform is built on a dynamic dependency scheduler that automatically parallelizes both symbolic and imperative operations on the fly, although you have to tell MXNet which GPU and CPU cores to use. A graph optimization layer on top of the scheduler makes symbolic execution fast and memory efficient.

MXNet currently supports building and training models in Python, R, Scala, Julia, and C++; trained MXNet models can also be used for prediction in Matlab and JavaScript. No matter what language you choose for building your model, MXNet calls an optimized C++ back-end engine.

The MXNet authors consider their API a superset of what's offered in Torch, Theano, Chainer, and Caffe, albeit with more portability and support for GPU clusters. In many respects MXNet is similar to TensorFlow, but with the added ability to embed imperative tensor operations.
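The distinction between symbolic and imperative styles can be illustrated with a toy example in plain Python. This is not MXNet's API: the Symbol class below is a hypothetical stand-in that merely records operations in a graph and evaluates them later, whereas the imperative version computes each step immediately.

```python
import operator


class Symbol:
    """A node in a deferred expression graph (toy stand-in for a symbolic API)."""

    def __init__(self, name=None, op=None, inputs=()):
        self.name = name      # set only for placeholder variables
        self.op = op          # binary function for operation nodes
        self.inputs = inputs  # child Symbols or plain numbers

    def __add__(self, other):
        return Symbol(op=operator.add, inputs=(self, other))

    def __mul__(self, other):
        return Symbol(op=operator.mul, inputs=(self, other))

    def eval(self, **bindings):
        """Walk the graph and compute a value once inputs are bound."""
        if self.op is None:
            return bindings[self.name]
        args = [a.eval(**bindings) if isinstance(a, Symbol) else a for a in self.inputs]
        return self.op(*args)


# Symbolic style: declare the whole computation first, run it later.
x = Symbol(name="x")
graph = x * 2.0 + 1.0               # builds graph nodes, computes nothing yet
symbolic_result = graph.eval(x=3.0)

# Imperative style: every statement computes a value right away.
value = 3.0
imperative_result = value * 2.0 + 1.0

print(symbolic_result, imperative_result)  # 7.0 7.0
```

Because the symbolic version sees the entire graph before running it, a framework has the chance to optimize and schedule it as a whole. The imperative version is easier to step through and debug.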

In addition to the practically obligatory MNIST digit classification, the MXNet tutorials for computer vision cover image classification and segmentation using convolutional neural networks (CNN), object detection using Faster R-CNN, neural art, and large-scale image classification using a deep CNN and the ImageNet data set. There are additional tutorials for natural language processing, speech recognition, adversarial networks, and both supervised and unsupervised machine learning.


Amazon tested an Inception v3 algorithm implemented in MXNet on P2.16xlarge instances and found a scaling efficiency of 85 percent.


Scikit-learn

The Scikit-learn Python framework has a wide selection of robust machine learning algorithms, but no deep learning. If you’re a Python fan, Scikit-learn may well be your best option among the plain machine learning libraries.

Scikit-learn is a robust and well-proven machine learning library for Python with a wide assortment of well-established algorithms and integrated graphics. It is relatively easy to install, learn, and use, and it has good examples and tutorials.

On the downside, Scikit-learn does not cover deep learning or reinforcement learning, lacks graphical models and sequence prediction, and can't really be used from languages other than Python. It supports neither PyPy (the Python implementation built around a just-in-time compiler) nor GPUs. That said, except for its minor foray into neural networks, it doesn't really have speed problems. It uses Cython (the Python-to-C compiler) for functions that need to be fast, such as inner loops.

Scikit-learn has a good selection of algorithms for classification, regression, clustering, dimensionality reduction, model selection, and preprocessing. It has good documentation and examples for all of these, but lacks any kind of guided workflow for accomplishing these tasks.

Scikit-learn earns top marks for ease of development, mostly because the algorithms all work as advertised and documented, the APIs are consistent and well designed, and there are few "impedance mismatches" between data structures. It's a pleasure to work with a library where the features have been thoroughly fleshed out and the bugs thoroughly flushed out.


This example uses Scikit-learn’s small handwritten digit data set to demonstrate semi-supervised learning using a Label Spreading model. Only 30 of the 1,797 total samples were labeled. 
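A rough sketch of that kind of experiment follows. It is not the code behind the figure: the choice of which 30 samples keep their labels (the first three of each digit class) and the k-nearest-neighbors kernel settings are my own assumptions. It relies only on Scikit-learn's `load_digits` data set and its `LabelSpreading` estimator, which treats a label of -1 as "unlabeled."

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.semi_supervised import LabelSpreading

digits = load_digits()
X, y = digits.data, digits.target  # 1,797 samples of 8x8 digit images

# Assumption: keep the true label for the first 3 samples of each digit
# class (30 in total) and mark every other sample as unlabeled with -1.
labeled_idx = np.concatenate([np.flatnonzero(y == d)[:3] for d in range(10)])
y_partial = np.full_like(y, -1)
y_partial[labeled_idx] = y[labeled_idx]

# Assumption: a k-nearest-neighbors graph; the kernel and its settings
# are illustrative choices, not tuned values.
model = LabelSpreading(kernel="knn", n_neighbors=7)
model.fit(X, y_partial)

# transduction_ holds the label inferred for every training sample.
unlabeled = y_partial == -1
accuracy = np.mean(model.transduction_[unlabeled] == y[unlabeled])
print(f"labeled: {labeled_idx.size}, unlabeled: {unlabeled.sum()}")
print(f"accuracy on unlabeled samples: {accuracy:.3f}")
```

The call pattern is the same `fit` convention used by the rest of the library, which is a large part of why Scikit-learn is so easy to pick up.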

On the other hand, the library does not cover deep learning or reinforcement learning, which leaves out the current hard but important problems, such as accurate image classification and reliable real-time language parsing and translation. Clearly, if you’re interested in deep learning, you should look elsewhere.

Nevertheless, there are many problems, ranging from building a prediction function linking different observations to classifying observations to learning the structure of an unlabeled data set, that lend themselves to plain old machine learning without needing dozens of layers of neurons. For those areas Scikit-learn is very good indeed.

InfoWorld Scorecard

| Framework | Models and algorithms (25%) | Ease of development (25%) | Documentation (20%) | Performance (20%) | Ease of deployment (10%) | Overall score (100%) |
| --- | --- | --- | --- | --- | --- | --- |
| Caffe 1.0 RC3 | 8 | 8 | 7 | 9 | 8 | 8.0 |
| Microsoft Cognitive Toolkit v2.0 Beta 1 | 8 | 9 | 8 | 10 | 9 | 8.8 |
| MXNet v0.7 | 8 | 8 | 7 | 10 | 8 | 8.2 |
| Scikit-learn 0.18.1 | 9 | 9 | 9 | 8 | 9 | 8.8 |
| Spark MLlib 2.01 | 9 | 8 | 8 | 9 | 8 | 8.5 |
| TensorFlow r0.10 | 9 | 8 | 9 | 10 | 8 | 8.9 |
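
The overall scores in the scorecard are consistent with a weighted average of the five category scores, using the percentages shown for each category. The sketch below reproduces that arithmetic. The rounding rule (half up, to one decimal place) is my assumption; it happens to match every published figure.

```python
# Category weights in percent, in scorecard order: models and algorithms,
# ease of development, documentation, performance, ease of deployment.
WEIGHTS = (25, 25, 20, 20, 10)

SCORES = {
    "Caffe 1.0 RC3": (8, 8, 7, 9, 8),
    "Microsoft Cognitive Toolkit v2.0 Beta 1": (8, 9, 8, 10, 9),
    "MXNet v0.7": (8, 8, 7, 10, 8),
    "Scikit-learn 0.18.1": (9, 9, 9, 8, 9),
    "Spark MLlib 2.01": (9, 8, 8, 9, 8),
    "TensorFlow r0.10": (9, 8, 9, 10, 8),
}


def overall_score(scores, weights=WEIGHTS):
    """Weighted average, rounded half up to one decimal place."""
    # Integer arithmetic avoids floating-point surprises on ties like 8.75.
    hundredths = sum(s * w for s, w in zip(scores, weights))
    return (hundredths + 5) // 10 / 10


for name, scores in SCORES.items():
    print(f"{name}: {overall_score(scores)}")
```

For example, the Cognitive Toolkit's raw weighted average is 8.75, which rounds to the published 8.8.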