Tips for scaling up a data analytics project

Four innovation leaders share their adventures in scaling an analytics infrastructure to back up business-critical decisions with hard data.

The U.S. Environmental Protection Agency's new chief data scientist likens the adoption of big data analytics at the agency to the early adoption of the iPhone in 2007. Those early adopters "didn't know exactly what it was, but they wanted to use it because they perceived the value," says Robin Thottungal.

Many innovation leaders feel the same way. IDC predicts that the big data and business analytics market will grow at a 23.1% compound annual growth rate from nearly $122 billion in revenue worldwide last year to $187 billion by 2019.
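As a quick arithmetic check on those figures, the sketch below computes the compound annual growth rate implied by the two revenue endpoints alone. It assumes "last year" means 2015 (the article carries a 2016 copyright), which gives four compounding years to 2019. That assumption and the variable names are illustrative and not from IDC. Under it, the endpoints imply roughly 11% a year, so the 23.1% rate quoted above may describe a different market segment or time span than the two revenue totals.

```python
# Implied compound annual growth rate (CAGR) from two revenue endpoints.
# Assumption: "last year" is 2015, so 2015 -> 2019 spans four years.
start_revenue_bn = 122.0   # "nearly $122 billion" (assumed to be 2015)
end_revenue_bn = 187.0     # "$187 billion by 2019"
years = 2019 - 2015

# CAGR = (end / start) ** (1 / years) - 1
implied_cagr = (end_revenue_bn / start_revenue_bn) ** (1 / years) - 1

# What $122 billion would grow to at the quoted 23.1% rate over the same span.
quoted_rate = 0.231
projected_at_quoted_rate_bn = start_revenue_bn * (1 + quoted_rate) ** years

print(f"Implied CAGR: {implied_cagr:.1%}")
print(f"Revenue at 23.1% for {years} years: ${projected_at_quoted_rate_bn:.0f} billion")
```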

Most early adopters of big data and analytics tools are likely hoping to help their organizations become insight-driven enterprises. But they will face a number of challenges as they try to realize that goal, such as the difficulty of accessing the necessary data, the need for more powerful computer systems and the task of building enthusiasm among users for a technology whose value proposition has yet to be proved.

Here are some tales from the trenches, plus tips for scaling an analytics infrastructure.

Relying on data, rather than instinct

At VMware, the sales planning teams once used manual processes, spreadsheets and "gut feeling" to set goals for the company's 4,000 global sales reps and 200 sales operations staffers, says Avon Singh Puri, vice president of IT enterprise applications and platforms. The company needed a global market strategy and a sales automation tool that was flexible enough to handle local market nuances.

So the vendor of cloud and virtualization software and services set out to turbocharge its sales processes with data and an analytics-based system. Puri and his team developed a multidimensional modeling capability, bringing together data from third-party market researchers and CRM, master data management, ERP and enterprise data warehousing systems.

The new platform uses an existing enterprise data warehouse built on Pivotal Software's Greenplum system that performs aggregation and runs fast analytics on huge data volumes. Aggregated data then goes to Anaplan's business modeling and sales planning tool.

Moving the data from warehouse to modeling presented challenges — starting with system performance. "The model was taking hours and hours to run," Puri says. "So we put together a layer in between based on IBM's operational decision management solution."

Now, business users have a way — in between the data warehouse and the Anaplan tool — to manage what conditions should be applied to the data going into the modeling tool and thereby control modeling speed. Business users can also tweak conditions on their own and perform what-if analysis. "We try to enable as much self-service as possible," so IT doesn't have to handle every small change that needs to be made, Puri explains.
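The idea of a condition layer between the warehouse and the modeling tool can be illustrated with a minimal sketch. This is not VMware's implementation and does not use IBM's or Anaplan's APIs. The row fields, the `apply_conditions` helper and the sample conditions are all hypothetical, chosen only to show how user-editable conditions can shrink the data a model has to process and support a what-if run.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class SalesRow:
    """Hypothetical aggregated row coming out of the data warehouse."""
    region: str
    fiscal_year: int
    bookings: float


# A condition is any predicate a business user can switch on or off.
Condition = Callable[[SalesRow], bool]


def apply_conditions(rows: Iterable[SalesRow], conditions: list[Condition]) -> list[SalesRow]:
    """Keep only the rows that satisfy every active condition."""
    return [row for row in rows if all(cond(row) for cond in conditions)]


# Made-up sample data standing in for warehouse output.
warehouse_rows = [
    SalesRow("EMEA", 2014, 1200.0),
    SalesRow("EMEA", 2016, 900.0),
    SalesRow("APJ", 2016, 0.0),
    SalesRow("AMER", 2015, 3100.0),
]

# Conditions that limit what is passed on to the modeling tool.
conditions = [
    lambda r: r.fiscal_year >= 2015,   # only recent fiscal years
    lambda r: r.bookings > 0,          # skip rows with no bookings
]

model_input = apply_conditions(warehouse_rows, conditions)

# A what-if run: drop the bookings condition and compare the result.
what_if_input = apply_conditions(warehouse_rows, conditions[:1])

print(len(model_input), len(what_if_input))
```

Keeping the conditions as data separate from the filtering logic is what lets non-IT users adjust them without a code change, which is the self-service point Puri describes.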

Today, VMware's sales team can analyze three years' worth of data to see how groups performed against goals and determine future plans. One model in its in-memory analysis tool contains about 5.5 billion cells of data, Puri says. VMware's sales planning process has been reduced from eight weeks to four, quota accuracy has increased from 65% to 70%, and territory disputes have fallen by 30% as a result of better what-if analysis capabilities.

Puri's advice: Leverage the investments you've already made in a data warehouse. "The synergies we got with existing data and with the architecture we had with the speed and the processing — that base of building the solution was much better than if we would have started it from scratch," he says.

Populating data by crowdsourcing

In 2013, BNY Mellon envisioned an analytics system that would track the journey of a single piece of data from the minute it comes into its organization all the way through its entire life cycle, just as a package delivery company can track the delivery process — from the time a package is picked up to the time it is delivered and every step in between.

Digital Pulse, the big data and analytics component of BNY Mellon's NEXEN Digital Ecosystem, does just that. The platform captures data from all lines of business, stores it in one place, then applies visualization, predictive analytics and machine learning to analyze data. Business units use the results to improve processes and performance, and enhance the customer experience.

"Today, analytics is embedded in our daily work," says Jennifer Cole, BNY Mellon's managing director of client experience delivery. Three control rooms, for example, are equipped with huge monitors that show up-to-the-minute cash balances around the world with every clearing channel where BNY Mellon is a member. "It allows us to see the data in real time as it's created. That isn't something we could do before," Cole says.

Capturing disparate types of data from 100 markets in 35 countries was a daunting early task. But team leaders made it less daunting by essentially crowdsourcing the ingestion of the data, Cole says. A small governance team wrote instructions for selecting and preparing data and sent them to every business unit. Users followed the instructions and told the governance team what data they wanted to submit to the system. The team reviewed the requests to ensure that the proposed data was in the right format "and that it was going to add value, and then the payload was automated," she says.

"When you spread that [task] out among 50,000 people globally and 13,000 technologists, that makes it much more doable," Cole says — though the process did take about a year.

Even with early stakeholder input, adoption is still one of the biggest challenges. The BNY Mellon analytics system didn't really take off "until we actually put [visual results] into users' hands," Cole says. Today, 13,000 employees use the platform, along with 3,500 external users. More than 1.4 billion data "events" are stored each month, and 124 applications have been implemented.

Cole's advice: "Do what it takes to shorten the feedback loop. Get visually whatever you can in front of the end user or subject-matter expert so that you can then iterate."

The art of data persuasion

Innovation leaders may face an uphill battle when it comes to persuading some departments to share their data. In some cases, "people don't want to give access to data either because they get some benefit from owning it or they're afraid of being embarrassed by the data," says Tom Soderstrom, IT chief technology and innovation officer at NASA's Jet Propulsion Laboratory in Pasadena, Calif.

Soderstrom persuaded data owners to share by giving them a sneak preview of the expected result through prototypes. For example, he says, "we wanted to figure out where people would end up when one of our big projects would finish. We were looking for HR time sheets, titles and project codes over time. This data was sensitive, and people who owned the data didn't want to give it up."

So Soderstrom went to the CIO, a project proponent, who agreed to share data only about the IT staff that worked on the project. "Now we had a subset of data to analyze and build a sample report," he says. On seeing the report, he adds, the stakeholder said, "'This is exactly what I'm looking for — how can I get this for the rest of JPL?' I replied, 'By giving us access to the data.' Visual analytics was a key to getting started, and then we went on to predictive and prescriptive analytics."

Today, JPL's data holders are eager to share. For example, Soderstrom says that spacecraft groups "now have hands-on access to 30 billion data points that they can scale, [analyze] for patterns, search for messages or compare spacecraft with each other — things that were impossible before."

Soderstrom has this advice for scaling an analytics infrastructure: "First, find the business use case with a passionate developer and business user. Write a one-page proposal of what you hope to get, and then quickly do a prototype, something they can see." Human resources data has the greatest potential to show value quickly, he adds.

Today, JPL embeds data scientists in business units. "They understand the subject matter of their departments and can respond quickly," Soderstrom says, but they're also "loosely federated" and work together.

Building a community of analytics practitioners

The EPA's journey into data analytics started when the agency moved to electronic reporting from manual processes and began using sensors to capture environmental data, such as metrics relating to air, water and soil quality. Thottungal, the agency's first chief data scientist, was hired in September 2015 to create insights from all of the data. In less than eight months, he built a viable analytics platform using an approach common to lean startups — innovating quickly and learning from mistakes.

Thottungal takes a dual approach to scaling. "I have data scientists looking at all the new platforms and technologies [from Google, Facebook, Twitter and LinkedIn] and asking how we can leverage [those] to deliver on our mission," he says. "We don't know exactly what the value is, but we know that these technologies can help us do something."

He's also working with about 10 groups inside the EPA that are well-known early technology adopters and that believe analytics will provide value. "They are coming to me and are already willing to give me the data, resources and people, so they can learn," Thottungal says. "I have people from the agency training with my data scientists because they know there is a value to it."

Today, he has built a community of about 200 analytics practitioners who discuss and share their analytics activities weekly. "I want to see that group maturing into an ecosystem within themselves," he says, "so they can help each other."


Copyright © 2016 IDG Communications, Inc.
