Does Big Data Need Citizen Data Scientists?

As big data becomes central to the business, CIOs are seeing the value of “citizen developers,” since recruiting large numbers of data scientists is proving difficult. Self-service is an interesting response, but how can CIOs deliver it?


In my discussions with CIOs in the #CIOChat, it appears that big data is moving from something CIOs were comfortable experimenting with to something from which they expect concrete business returns. As a result, CIOs are starting to worry about the impact of big data on organizational and cultural design. At the same time, CIOs say they are a bit less sanguine about big data. One CIO summarized it this way: “We realize now that every big data initiative isn’t going to save the planet.” Instead, some CIOs are focusing on deriving concrete customer benefits from big data, such as efficiency, speed, and resiliency.

CIOs are also talking about the need for “citizen developers,” a term coined by a leading analyst firm (of course). According to CIO Isaac Sacolick, “I have been a strong proponent of citizen development. Self Service BI is a form of citizen development and a strong practice is a key ingredient to becoming a data-driven organization.”

I’d like to suggest that we extend this concept to the “Citizen Data Scientist.” Dr. Kirk Borne, principal data scientist, Booz Allen Hamilton, is espousing this notion as well. CIOs say they like this idea because recruiting large numbers of data scientists is proving difficult. Given this, self-service becomes an interesting response. The question is, how do CIOs deliver this?

The big data user is different anyhow

The notion of the “Citizen Data Scientist” works because big data users differ from the users of traditional business intelligence. According to the Chief Data Officer (CDO) of a major company, who spoke recently on big data, three personas made use of traditional business intelligence and data warehousing:

  • Farmers - They are predictable and they know exactly what they want.
  • Tourists - They know where to find things.
  • Operators - They live in the body of the business. They need some repeatable facts with a fast response time.

These personas demand a data warehouse that provides the same things over and over again. According to our CDO, however, two personas were left out of traditional data warehousing, and these are the real drivers of big data:

  • Explorers - They want the raw data and will figure out if there is anything of value in it.
  • Miners - They need data in depth. They need every piece of data in order to figure out how to run things better.

Big data addresses the needs of explorers and miners, which is why these two populations have been so excited. For explorers and miners, big data platforms provide an opportunity to get fit-for-purpose assets and be directly involved in the preparation process themselves. These personas do not require data in a fully curated state. In the case of miners, this is because they are looking for new patterns and signals that have yet to be hypothesized. They want to tell a story with data.

So why haven’t organizations embraced more explorers and miners? With unauthorized copies of data proliferating and regulatory requirements increasing, organizations struggle to give these new data citizens autonomy while also ensuring security and governance. In trying to decide how much freedom to give this emerging group, organizations swing between the extremes of data anarchy and data tyranny. But the choice between data anarchy and data tyranny is a false one. Surprisingly, technologies that can intelligently understand data can enable greater autonomy and greater control at the same time, and can serve as the foundation for data democracy.

How do we bring together traditional and new data users?

From my vantage point, big data represents an opportunity to create a connected enterprise: a place where explorers and miners can discover new facts with potential value for the rest of the enterprise. The goal, then, is to augment existing business intelligence practices while enabling explorers and miners to work together safely as a community.

Big data can extend the entire business intelligence investment equation. We can now put all potentially relevant data into a data lake before making a significant investment in operationalizing the data. Here, citizen data scientists, miners, and explorers can discover things that create new business value. This change enables these end users to safely manage the process of creating business intelligence together.

At the same time, instead of building analytics based only on a priori premises, citizen data scientists can let the data speak to them directly through self-service (guided, of course, by machine intelligence about the data’s potential meaning). This guidance helps the citizen data scientist determine which data is of value to the business. The opportunity is clear: quickly ingest the full data set into a data lake before making a significant investment in curating it. Once citizen data scientists and citizen developers have completed their work, they can extend it to build the trustworthy data that the entire business ecosystem needs, in either a Hadoop cluster or a traditional data warehouse.

Putting it all together

With miners and explorers, we can enable safe collaboration at the front end of the business intelligence process and use their collective efforts to lower costs while increasing the business relevance of everything they measure. This is a big opportunity, and organizations seeking a business advantage should seize it.
