12 Hadoop case studies in the enterprise

Back in 2015, analyst house Forrester predicted that enterprise adoption of Hadoop would be "mandatory", meaning any business that wants to derive value from its data should, at the very least, be looking at the technology.

So, what is Hadoop? The Apache Software Foundation, which stewards the open-source project, describes Hadoop both as "a distributed computing platform" and, more fully, as "a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models".

According to the foundation: "Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures."
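
To make those "simple programming models" concrete, here is a minimal sketch of the canonical word-count job written for Hadoop Streaming, which lets the map and reduce steps be ordinary scripts that read stdin and write stdout. Python is used purely for brevity, and the jar location and HDFS paths in the comments are illustrative, not taken from any of the deployments below.

```python
#!/usr/bin/env python3
# wordcount_streaming.py - canonical word count for Hadoop Streaming.
# A sketch run might look like (jar path and HDFS paths are illustrative):
#   hadoop jar hadoop-streaming.jar \
#     -files wordcount_streaming.py \
#     -input /data/text -output /data/wordcounts \
#     -mapper "python3 wordcount_streaming.py map" \
#     -reducer "python3 wordcount_streaming.py reduce"
import sys

def mapper():
    # Emit one "<word>\t1" line per word; Hadoop shuffles and sorts by key.
    for line in sys.stdin:
        for word in line.split():
            print(f"{word.lower()}\t1")

def reducer():
    # Input arrives sorted by key, so all counts for a word are contiguous.
    current, count = None, 0
    for line in sys.stdin:
        word, n = line.rsplit("\t", 1)
        if word != current:
            if current is not None:
                print(f"{current}\t{count}")
            current, count = word, 0
        count += int(n)
    if current is not None:
        print(f"{current}\t{count}")

if __name__ == "__main__":
    mapper() if sys.argv[1:] == ["map"] else reducer()
```

The fault tolerance the foundation describes lives in the framework rather than in this script: if a node fails mid-job, Hadoop re-schedules the affected map or reduce tasks on another machine.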

The advantages - speed, reliability, lower costs - are appealing to the enterprise, and businesses are starting to deploy the technology at various scales.

Here is a selection of case studies from businesses deploying Hadoop at enterprise scale, from telcos and big banks to airlines and retailers.

BT

BT uses a Cloudera enterprise data hub powered by Apache Hadoop to cut down on engineer call-outs.

By analysing the characteristics of its network, BT can identify whether slow internet speeds are caused by a network fault or a customer issue, and then evaluate whether an engineer call-out is likely to fix the problem.
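
BT has not published the logic behind this triage, but the decision described above can be sketched as a simple rule over per-line measurements; every field name and threshold below is hypothetical and purely illustrative.

```python
# Hypothetical sketch of the triage described above: decide whether a slow
# line looks like a network fault or a customer-side issue, and whether an
# engineer visit is likely to help. Fields and thresholds are invented.

def triage(line_test: dict) -> str:
    """Classify one broadband line test record."""
    # If neighbouring lines on the same cabinet are also degraded, the
    # problem is probably in the network, not the customer's premises.
    if line_test["cabinet_degraded_lines"] > 3:
        return "network fault - route to network operations"

    # A sync speed far below the line's estimated capability, combined with
    # frequent retrains, suggests a physical fault an engineer could fix.
    if (line_test["sync_speed_mbps"] < 0.5 * line_test["estimated_capability_mbps"]
            and line_test["retrains_per_day"] > 10):
        return "customer line fault - dispatch engineer"

    # Otherwise the line itself looks healthy (e.g. in-home Wi-Fi congestion),
    # so an engineer call-out is unlikely to help.
    return "customer-side issue - advise customer, no call-out"

# Example record (invented values):
print(triage({
    "cabinet_degraded_lines": 1,
    "sync_speed_mbps": 12.0,
    "estimated_capability_mbps": 70.0,
    "retrains_per_day": 25,
}))
```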

The Cloudera hub provides a unified view of customer data stored in a Hadoop environment. BT earned a return on investment of between 200 and 250 percent within one year of the deployment.

BT has also used it to create new services such as "View My Engineer", an SMS and email alerting system that lets customers track the location of engineers. The company now wants to use predictive analytics to improve vehicle maintenance.

Royal Bank of Scotland

The Royal Bank of Scotland (RBS) has been working with Silicon Valley company Trifacta to get its Hadoop data lake in order, so it can gain insight from the chat conversations its customers are having with the bank online.

RBS collects approximately 250,000 chat logs, plus associated metadata, every month and stores this unstructured data in Hadoop. Before turning to Trifacta, however, it was a huge but largely untapped source of information about the bank's user base.
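
Trifacta itself is a visual data-preparation tool rather than a coding environment, but the kind of wrangling involved, turning raw chat exports into analysable per-session records, can be sketched in a few lines of Python; the log format and field names here are invented for illustration.

```python
import json

# Hypothetical raw chat export: one JSON object per line, as it might land
# in the Hadoop data lake. The schema is invented for illustration.
RAW_LINES = [
    '{"session": "a1", "ts": "2016-03-01T10:02:11", "speaker": "customer", "text": "How do I reset my card PIN?"}',
    '{"session": "a1", "ts": "2016-03-01T10:02:40", "speaker": "agent", "text": "I can help with that."}',
]

def wrangle(lines):
    """Flatten raw chat lines into per-session records suitable for analysis."""
    sessions = {}
    for line in lines:
        msg = json.loads(line)
        rec = sessions.setdefault(msg["session"], {"messages": 0, "customer_text": []})
        rec["messages"] += 1
        if msg["speaker"] == "customer":
            rec["customer_text"].append(msg["text"])
    return sessions

for session_id, rec in wrangle(RAW_LINES).items():
    print(session_id, rec["messages"], " | ".join(rec["customer_text"]))
```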

CERN

The Large Hadron Collider in Switzerland is one of the largest and most powerful machines in the world. It is equipped with around 150 million sensors, producing a petabyte of data every second, and the data being delivered is growing all the time.

CERN researcher Manuel Martin Marquez said: “This data has been scaling in terms of amount and complexity, and the role we have is to serve these scalable requirements, so we run a Hadoop cluster.”

“From a simplistic manner we run particles through machines and make them collide, and then we store and analyse that data.”

“By using Hadoop we limit the cost in hardware and complexity in maintenance.”

Royal Mail

British postal service company Royal Mail has used Hadoop to get the "building blocks in place" for its big data strategy.

Director of the Technology Data Group at Royal Mail, Thomas Lee-Warren, told Computerworld UK that its Hadoop investment is the foundation of a drive to gain more value from internal data. "We have a lot of data,” Lee-Warren explained. “We are about to go up to running in the region of a hundred terabytes, across nine nodes.”

The business uses Hortonworks' Hadoop analytics tools to transform the way it manages data across the organisation, freeing the analytics team to deliver insights on proprietary information held in its data warehouse.

British Airways

British Airways deployed its first instance of Hadoop in April 2015, as a data archive for legal cases that were primarily stored, at a high cost, on its enterprise data warehouse (EDW) platform.

Since deploying Hortonworks Data Platform (HDP) 2.2, the airline's data exploitation manager Alan Spanos said, his department has recouped its investment within a year and freed up 75 percent more space for new projects, which translates into cost reductions for the airline’s finance team.

Spanos added: “In business intelligence, if you don’t adopt this technology to do at least part of your job role, you will not exist in a few years' time. You can only go so far with traditional technology. It still has a place within your architecture, but quite frankly, this is where you need to be.”

Western Union

Global payments provider Western Union implemented a Hadoop-based data analytics platform from Cloudera in 2014 to provide a more personalised experience for its customers.

Using Cloudera Enterprise, Western Union is able to more efficiently store and process real-time analytics on what the vendor describes as “one of the world’s largest enterprise data sets”.

Cloudera’s Apache Hadoop implementation helps Western Union centralise its global customer data in an enterprise data hub, and supports pattern recognition and predictive modelling. The big data analytics platform is aimed at creating a more personalised experience across multiple products and service delivery channels for Western Union customers.

King.com

European gaming giant King.com, creator of Candy Crush, deployed Cloudera’s Distribution including Apache Hadoop (CDH) in 2012. The aim was to run analytics on every ‘event’, or action, its millions of users take during gameplay.
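
King.com has not published its event schema, but the per-event analytics described above ultimately comes down to aggregating a stream of gameplay events, for example computing completion rates per level; the records and field names in this sketch are invented.

```python
from collections import defaultdict

# Hypothetical gameplay events of the kind a Hadoop job might aggregate:
# every attempt at a level is recorded with its outcome. Schema is invented.
EVENTS = [
    {"player": "p1", "level": 12, "event": "level_attempt", "won": False},
    {"player": "p1", "level": 12, "event": "level_attempt", "won": True},
    {"player": "p2", "level": 12, "event": "level_attempt", "won": False},
]

def completion_rate_by_level(events):
    """Share of attempts that succeed, per level - a typical tuning signal."""
    attempts, wins = defaultdict(int), defaultdict(int)
    for e in events:
        if e["event"] == "level_attempt":
            attempts[e["level"]] += 1
            wins[e["level"]] += e["won"]
    return {level: wins[level] / attempts[level] for level in attempts}

print(completion_rate_by_level(EVENTS))  # e.g. {12: 0.333...}
```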

The company’s director of data warehousing, Mats-Olov Eriksson, told Computerworld UK that using analytics is vital to its success online.

“Analytics is one of the things that made King.com the thing that it is today,” Eriksson explained. “In the universe that we operate in, gaming online, it is absolutely essential to know as much as possible about the players and optimise everything.”

“Everybody wants a business case for Hadoop, but for me it is simply about the difference between knowing what happens in a game and not knowing."

Yahoo

Yahoo started using Hunk, data analytics company Splunk’s tool for Hadoop, in 2015. By analysing its IT operations in real time, the firm saved millions in hardware costs within a year.

Yahoo analyses 150 terabytes of machine data with Splunk Enterprise every day. This information is used to optimise IT operations, application delivery and security, as well as for business analytics to better understand customers and personalise search results.

Hundreds of employees now use Hunk, alongside Splunk Enterprise, to analyse and visualise 600 petabytes of data and to monitor the company's infrastructure cost-effectively.

Expedia

Expedia planned to double its Hadoop investment back in 2015 and was an early adopter of the Hortonworks-backed Apache Falcon project to crunch large volumes of data.

Expedia previously used a DB2 database alongside various instances of Microsoft SQL Server, which became increasingly expensive to scale as data volumes grew, driven by organic growth and the acquisition of several travel companies, including Trivago and Hotels.com.

Since moving to Hadoop, the firm has seen costs drop and is able to both store and process data using the cluster.

Woodhead, data platform technical lead for Hotels.com, revealed that “hundreds” of employees across different departments and offices, including one in London, use the two-petabyte cluster to analyse web traffic, bookings and travel reviews.

Hotels.com

Hotels.com uses Hadoop for large-scale data storage and offline analytics, meaning crunching large amounts of data without expecting an answer within a millisecond. Cassandra, on the other hand, is used in the online transactional world, “where you need an answer below ten milliseconds”.

Cassandra can also store data, but it is targeted at online workloads because of its speed. The business moved away from traditional relational databases such as Microsoft SQL Server three years ago to become “active/active”.
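
To illustrate the split, the online path might look like the single-row lookup below, written with the DataStax Python driver, while the offline path is a scheduled batch job over the Hadoop cluster; the contact points, keyspace and table are assumptions, not Hotels.com's actual schema.

```python
# Online path: a single-row lookup that must return in a few milliseconds.
# Uses the DataStax Python driver (pip install cassandra-driver). Contact
# points, keyspace and table are illustrative, not Hotels.com's schema.
from cassandra.cluster import Cluster

cluster = Cluster(["10.0.0.1", "10.0.0.2"])   # hypothetical Cassandra nodes
session = cluster.connect("bookings")          # hypothetical keyspace
row = session.execute(
    "SELECT price, availability FROM hotel_rates WHERE hotel_id = %s AND night = %s",
    ("h-123", "2017-06-01"),
).one()
print(row)

# The offline path, by contrast, would be a batch job over the Hadoop
# cluster (for example a scheduled MapReduce or Spark job aggregating the
# same rate data for reporting), where latency in minutes is acceptable.
cluster.shutdown()
```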

Chief technology officer at Hotels.com, Thierry Bedos, said: "We started solving a real issue for the business - which was customer service and personalising what we offer them online - whereas some firms use big data as an innovation project and say 'we need to play with big data, let's think of some cool use cases we think will add value'”.

Marks and Spencer

Retail giant M&S adopted Cloudera Enterprise Data Hub Edition in 2015 to analyse data from multiple sources and better understand customer behaviour.

Jagpal Jheeta, head of business information and customer insight at M&S, said: “Smart and efficient data usage is a key focus at M&S, as it ultimately fuels better customer insight, engagement and loyalty. We needed a scalable, robust and future-proof strategic partner. Cloudera is aiding us in leveraging analytics to better serve the business now and in the future.”

Tesla

Tesla is using a Hadoop cluster to collect the increasing amount of data being generated by its connected cars.

CIO Jay Vijayan said: “We are working on a big data platform... The car is connected, but it does not really talk to the network every minute because we want to keep it as smart and efficient as possible. It alerts us if the car is not functioning properly so service teams can take action.”

Copyright © 2017 IDG Communications, Inc.