Inside Sainsbury's wildly ambitious data democratisation plans


The UK supermarket giant Sainsbury's is embarking on an enormous cloud data migration effort in order to make analytics more affordable, agile and accessible to employees and to bring customers more seamless and personalised shopping experiences.

Speaking at the Snowflake Summit in London last week, Helen Hunter, chief data and analytics officer for Sainsbury’s, referenced Homer, Aristotle, the Book of Job and the German naturalist Johann Andreas Naumann, as she outlined a boiled-down version of what is a wildly ambitious data migration plan across the retail group over the next four years.

The vision

The first step for the retailer has been a centralisation of IT across its brands, which includes not just Sainsbury's but also Argos, Tu, Habitat and Sainsbury's Bank.

Under new group CIO Phil Jordan – previously of Telefonica – Sainsbury's clearly stated intent is to "know our customers better than anyone else, so as to better anticipate and meet their needs". For Hunter, this meant consolidating, cleaning and reorganising all of its data assets into a cloud-based environment.

"We have for a number of years had IT departments working for our individual brands. We now realise we need to go further, operating together as a single entity if we are really going to rethink how we accelerate our processes, pool our data, and enhance the customer experience," Hunt explained.

Hunt's ambitious goal is to make "access to data as ubiquitous and easy as breathing oxygen" for her colleagues across the Sainsbury's group. "When we think about insight," she said, "it means empowering our colleagues to be able to satiate their curiosity about the way our business works, about how our customers think and feel and behave, with no barriers to accessing and satiating that curiosity."

First, the business identified three groups of end users: data scientists and machine learning engineers; professional analysts; and 'citizen analysts', who want to know more about their customers but lack the technical skills to run the analysis themselves.

The resulting architecture is internally branded as ASPIRE – a strategic platform for insight and reporting – and a top-level architectural diagram is shown below.

[Figure: top-level architectural diagram of ASPIRE. © Sainsbury's Tech]

Hunter was speaking at a Snowflake event because the purely cloud-based data warehouse technology is a core component of ASPIRE. As shown above, Snowflake is the key middle layer between the ingest systems and the front-end analytics dashboards across the business.

"Snowflake is helping us conquer the hard yards of what it means to rebuild a data ecosystem in the cloud and that gets right at the heart of our business strategy of knowing our customers better than anyone else," Hunter said.

Sainsbury's is currently in the process of ensuring that all of its transactional systems publish data to the ASPIRE ecosystem. As Hunter emphasised, this is a lot of data.

"In Nectar we have the UK's largest loyalty scheme with over 19 million members," she said. "We have the second-largest general merchandise and clothing business in the UK, we have a bank, we have hundreds of stores, thousands of colleagues, thousands of SKUs [stock keeping units], millions of customers, billions of transactions.

"That is one of the most exciting things about analytics at Sainsbury's, this huge data set that you get to experiment with and be curious about. I believe that we probably have one of the preeminent datasets in UK."

The issue, as it is for many organisations like Sainsbury's, has been democratising that data across the business.

"So my job, if you want to know what a chief data and analytics officer does, is to start organising and capturing all of this data, and then turning it into meaningful information that enables our colleagues to think differently about the operations of our business, ultimately, with the ambition of differentiating our offer and making it compelling for our customers," she explained.

The migration

Hunter says the organisation will start migrating its legacy data assets to ASPIRE in the spring. As shown below, this includes the migration of three large enterprise data stores this year: OI (supply chain analytics for the food business), Pro 4 (analytics for the Nectar loyalty scheme), and EDW, the main data warehouse for the food business, which Hunter described as "the heart and lungs of the organisation".

[Figure: Sainsbury's legacy warehouse and database estate. © Sainsbury's Tech]

"This is a pictorial representation of what our legacy warehouse and database estate looks like," she said. "We have something like 20 chunky ones across our group of businesses and brands. This year, we are cracking into three of them."

The approach Sainsbury's will take involves running a series of experiments before the actual migrations take place, but "by philosophy, in simple terms, what we are doing is we are cleaning out the attic before, rather than after we move house".

What does that mean? Put another way: "Whilst we've been laying down all these capabilities that need to very elegantly interface together, as you saw in that horribly oversimplified architectural diagram, we have also been testing, improving and learning."

For example, early on with this project Hunter and her team created a dashboard in Snowflake that would stream interaction data from its various digital properties to the digital trading teams in close to real time.

"The business customer got value, could see why we were investing in the Snowflake product and the creation of this ecosystem. At the same time, we were learning a lot about how we needed to fit all the pieces of our data ecosystem jigsaw together," Hunter explained.

These experiments also threw up a range of inefficiencies that could be purged during migration. In the EDW, Hunter's team found that the majority of the code for key extract, transform, load (ETL) jobs was obsolete, as were 40 percent of the tables. In addition, 500 of the reports built on the EDW had not been run in the previous six months.

"That was incredibly useful. We have been on a programme of winnowing all of that out, because clearly we do not want to migrate obsolescence," she added.

In short, Hunter and her team are constantly refining the migration process as they embark on this four-year migration journey. "Every time we do this, we experiment with curiosity, we learn and refine our pattern," she said. "This is indicative of what we've been doing: a phase of designing, a phase of proving it and then [user acceptance testing] and parallel running, moving into production, cut over."

Once that data has been migrated, the flow through ASPIRE will look something like this: transactional systems publish data to the ecosystem, where it is tagged and lands in the data lake; from there it flows through a 'curation layer' in Snowflake before reaching a 'presentation layer', also in Snowflake.
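
To make that flow concrete, here is a rough sketch of the stages in plain Python. It is an illustration only: the function names, record fields and sample events are assumptions, and in ASPIRE the curation and presentation layers live in Snowflake rather than in application code like this.

```python
from collections import defaultdict

def tag(event, source):
    """Tagging step: attach the publishing system's name to the event."""
    return {**event, "source": source}

def curate(lake):
    """Curation layer: drop incomplete records and normalise types."""
    curated = []
    for rec in lake:
        if rec.get("store") is None or rec.get("amount") is None:
            continue  # incomplete record stays in the lake but is not curated
        curated.append(
            {"store": rec["store"], "amount": float(rec["amount"]), "source": rec["source"]}
        )
    return curated

def present(curated):
    """Presentation layer: aggregate into dashboard-ready totals per store."""
    totals = defaultdict(float)
    for rec in curated:
        totals[rec["store"]] += rec["amount"]
    return dict(totals)

# Hypothetical events published by two transactional systems.
published = [
    ({"store": "Holborn", "amount": "12.50"}, "tills"),
    ({"store": "Holborn", "amount": "7.50"}, "online"),
    ({"store": None, "amount": "3.00"}, "tills"),
]

data_lake = [tag(event, source) for event, source in published]  # raw, tagged data
dashboard = present(curate(data_lake))
print(dashboard)
```

The raw `data_lake` list keeps every record, including the incomplete one, while the presentation output carries only the tidy aggregate.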

This allows the analytics function to successfully supply information to each of those three business user groups. "The scientists and machine learning engineers want to be able to access the granular raw data straight from the lake," Hunter explained. "Whereas if I am a store manager, I want my data tidy, presented in a self-service dashboard. Therefore it's important that we plug in our visualisation tools, of which we have at least three across the group, with MicroStrategy being our enterprise tool of choice, into the Snowflake presentation layer."

Then, for more ad hoc consumption, product owners are responsible for grooming a backlog of queries, which analyst 'tribes' pick up and deliver through a ticketing system.

By shifting to a cloud-based system Hunter believes that "the days of operating with and to the average have long gone" and that Sainsbury's can start to focus on "looking to the future and the necessity of predicting behaviour at the most granular level".

"We've really started to think about the capability we will need to make scientific, systemic, automated, data-driven decisions about some of the most complex problems in our organisation," she said.

The new data vision in action

For example, Sainsbury's collects a vast amount of qualitative feedback data from customers through its 'Lettuce Know' programme.

"We're speaking to customers up and down the country and asking them qualitative questions about their experience with us," Hunt said, "customers are doing this in the tens of thousands every month.

"The challenge with that is that finding colleagues with the time and capacity to interrogate that volume of free form text is nearly impossible. Yet, this is where there's real richness, the secret sauce of how we might really differentiate our proposition, to make it more compelling than anybody else is."

In order to start mining this data Sainsbury's is placing it all into a data lake. There they clean it, spell check it and tag it, before deploying statistical topic detection algorithms "to observe words or phrases that frequently occur in conjunction with other words and phrases," Hunter explained. "Then we trend that over time by store and region."
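
As a rough illustration of the idea, the sketch below counts how often pairs of words appear together in the same comment, broken down by store and month. It is a simple co-occurrence count under stated assumptions, not the statistical topic detection Sainsbury's uses; the stopword list, store name and feedback text are all invented for the example.

```python
from collections import Counter
from itertools import combinations

# Hypothetical, deliberately tiny stopword list.
STOPWORDS = {"the", "a", "was", "were", "and", "at", "is", "very"}

def tokens(text):
    """Lower-case, strip punctuation, drop stopwords; return sorted unique words."""
    words = (w.strip(".,!?").lower() for w in text.split())
    return sorted({w for w in words if w and w not in STOPWORDS})

def cooccurrence_by_store_month(feedback):
    """Count word pairs that appear in the same comment, keyed by store and month."""
    counts = Counter()
    for store, month, text in feedback:
        for pair in combinations(tokens(text), 2):
            counts[(store, month, pair)] += 1
    return counts

# Invented feedback rows: (store, month, free-text comment).
feedback = [
    ("Leeds", "2019-09", "The checkout queue was slow"),
    ("Leeds", "2019-09", "Slow queue at the checkout!"),
    ("Leeds", "2019-10", "Friendly staff and fresh bread"),
]

counts = cooccurrence_by_store_month(feedback)
print(counts[("Leeds", "2019-09", ("queue", "slow"))])
```

Comparing the count for a pair such as "queue" and "slow" from one month to the next gives the kind of trend by store that Hunter describes.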

What next?

Hunter talks about creating seamless "multi-brand, multi-channel journeys" for customers across the group. It is an ambition that Clodagh Moriarty, chief digital officer at Sainsbury's, also described to our sister publication CIO earlier this month.

Moriarty uses the example of a barbecue to explain how this could work. If a customer adds burgers from Sainsbury's to their basket, they might then be offered tongs from Habitat, a barbecue from Argos and an umbrella from Tu in case the British weather lets them down. In theory, the company's fast fulfilment options would allow it to deliver all these products to any customer that same day.

"Our customer outcomes are focused on delivering brilliant, integrated, personalised experiences, truly seamless experiences across our brands," Hunter said. "That for us means some hard yards in technology. It means reengineering our systems, our processes and our data so that we can show up brilliantly whenever, wherever and however the customer chooses to shop with us across those multiple brands."

Hunter departed with one final piece of advice for anyone staring down the barrel of a data migration of this magnitude: "This stuff is daunting, right? It is hard. It's often boring, in the eyes of our business stakeholders. So why are we bothering?

"Firstly, scaling and extensibility. The fact that we can lease, use and then give back storage and computation means that we can really effectively manage the cost of curiosity. Secondly, because we have the separation of storage and compute, it means that computation and physical ability to store data is no longer a blocker to analytical projects, which is big, big news for the analyst community. Thirdly, resilience is built into the product by design. It's a completely new way of thinking about resilience once you're in the cloud, and that has significant advantages versus on-prem.

"As we look forward to the future, we look forward to a world in which there are no barriers to our colleagues curiosity about our multi-brand, multi-channel business."

Copyright © 2019 IDG Communications, Inc.
