Expedia to double its Apache Hadoop cluster investment this year

Expedia plans to "double the size" of its Hadoop cluster in 2015 to help solve its big data challenges, data lead Adrian Woodhead revealed at the UK Hadoop Users Group event yesterday.

The announcement reflects the level of investment firms are willing to put into their big data strategies, particularly among online-only businesses.

Expedia previously used a DB2 database alongside various instances of Microsoft SQL Server. That setup became increasingly expensive to scale as data volumes grew, both through the organic growth of the business and through its acquisition of several travel companies, including Trivago and Hotels.com. Since moving to Hadoop, the firm has seen costs drop and can now both store and process data on the cluster, Woodhead added.

Woodhead, who is data platform technical lead for Hotels.com, said that "hundreds" of employees across different departments and offices, including one in London, use the two-petabyte cluster to work with data on web traffic, bookings and travel reviews.

Apache Falcon

During the event, held at Expedia's offices in North London, data warehouse engineer James Grant also explained how the firm is using Apache Falcon, a data management and processing tool that graduated from the Apache Incubator just days ago.

Grant's team, which has used Falcon since November, can now schedule data-processing jobs more effectively so that results are ready for the business on time.

Grant said that since the team adopted Falcon "everything is more contained", making it easier to merge data from different sources, plan workflows and hit targets. His team previously relied on Apache Oozie.

One way Expedia uses Falcon is to quantify the return on its marketing spend. Data from web bookings, marketing departments and marketing spend logs is merged to analyse whether the outlay has led to more bookings.

Falcon has also become crucial for search engine marketing (SEM) analysis. In SEM, firms bid on keywords so that their listings appear at the top of search engine results pages. Often the first listing on Google is tied to a keyword that has been "bought", and the company is invoiced whenever that listing is clicked. Expedia's data warehousing team merges data on these clicks with details of who clicked and whether they went on to book a holiday.

This means that when bidding against competitors in keyword auctions, Expedia knows which words are the most valuable, which is crucial for an online business.

Expedia did not say how much the expanded cluster would cost. However, in 2013 the company told ComputerworldUK that it spent over $500 million (£309 million) a year on its in-house technology.

Electric carmaker Tesla also plans to use Hadoop to collect more data from its connected cars, its CIO revealed at the Gartner Symposium in Barcelona last year.

Image credit: Flickr/Cintia Regina

Copyright © 2015 IDG Communications, Inc.
