It might be argued that access to more data gives people more ability to make the data say what they want, particularly where there are conflicting interests around the data, as in political campaigns. Another argument is that collecting and storing more data opens up even more opportunities for that information to be used inappropriately or in ways that may impact privacy and security.
However, if you’re looking to make sense of data by uncovering insights that may help to improve business processes or boost competitive advantage, the general rule of thumb for analytics is that “more actually is more”. Having access to more data simply gives you a more complete picture of your environment and whatever it is you want to observe, analyze or even predict. The key question is what types of additional data do you need, and how can you uncover the patterns?
In most cases, I think there are four types of additional data you can tap into. Starting with your current data sources, you can tap into more historical data, more real-time data, and more frequent or higher-resolution data (i.e. data captured at a higher sampling rate, such as every hour instead of every day). That gives you three ways to enhance what you’re currently collecting, where it makes sense to do so. The types of data I’m talking about might include time series data such as stock trades, video surveillance, weather data, sales data, feeds from intelligent sensors and so on.
The fourth option is to tap into additional data sources which may provide additional insights either individually or in combination with your current data sources. The ability to “connect the dots” is a great way to summarize this aspect and was highlighted in recent testimony by General Alexander, the head of the NSA. These additional data sources may be structured or unstructured feeds that help to uncover patterns. The typical example here is feeds from social media, which help provide input on customer sentiment but could additionally be leveraged for aspects such as uncovering new product or service opportunities, operational efficiency improvements, market pricing dynamics, and customer satisfaction.
Four options for tapping into additional data from new and existing sources
For every business scenario, your needs will vary, and you’ll want to carefully pick and choose between the various options listed above. For example, if we look at historical data in financial services, access to more of it can give analysts more information to feed into their predictive models. In healthcare, electronic health records (EHRs) are capturing massive amounts of clinical data that can be mined for information to improve preventative care and disease treatment. For the typical enterprise, this storage of historical data raises important questions about how much data to keep and how much may be useful and relevant for future needs. An additional issue, related to backward compatibility, is ensuring the historical data is stored appropriately so it can be easily accessed in the future. This issue was discussed recently in a Computerworld article covering a speech by Vint Cerf.
In terms of collecting and analyzing more real-time data, there are many scenarios in the public sector (e.g. crime prevention, intelligence, traffic flows etc.) as well as in health care, manufacturing processes and so on. Real-time data can also help improve revenues and operational efficiencies in retail, where items such as real-time sentiment analysis, sales analysis, and supply chain monitoring can better inform decision makers. A good example is P&G’s Business Sphere, which is an “integration of technology, visualization, and information that enables leaders to drill-down into data to get answers in real-time”. According to P&G, “one supply chain example leveraged supply chain sufficiency models to bring together multiple data points, analytics, and visualizations. This resulted in an inventory reduction of 25% and savings of tens of millions of dollars”.
When we think about more frequent or higher-resolution data, this could mean a finer sampling rate in a time series or perhaps a higher-resolution graphical (spatial) model. As an example, if you’re scheduling work crews on construction sites across a large city such as Chicago, having access to more granular weather forecast data may enable you to keep more crews productive in periods of stormy weather by knowing exactly where and when the weather may be good or bad.
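To make the work crew example a little more concrete, here is a minimal Python sketch of how a finer sampling rate can change a decision. Everything in it is an assumption made for illustration: the hourly rainfall figures, the `RAIN_LIMIT_MM` threshold, and the two helper functions are hypothetical and not taken from any real forecast feed.

```python
# Hypothetical hourly rainfall forecast (mm) for one construction site,
# keyed by hour of the working day (06:00 to 17:00). Illustrative values only.
hourly_rain_mm = {
    6: 0.0, 7: 0.0, 8: 0.0, 9: 0.0,
    10: 4.5, 11: 6.0, 12: 3.5,
    13: 0.0, 14: 0.0, 15: 0.0, 16: 0.0, 17: 0.0,
}

# Assumed threshold: more rain than this and the crew cannot work.
RAIN_LIMIT_MM = 1.0


def daily_view(hourly, limit):
    """Coarse view: one go/no-go decision based on the day's total rainfall."""
    return sum(hourly.values()) <= limit


def workable_hours(hourly, limit):
    """Fine view: the individual hours in which rainfall stays within the limit."""
    return [hour for hour, mm in sorted(hourly.items()) if mm <= limit]


can_work_all_day = daily_view(hourly_rain_mm, RAIN_LIMIT_MM)
hours = workable_hours(hourly_rain_mm, RAIN_LIMIT_MM)

print("Daily forecast says work is possible:", can_work_all_day)
print("Hourly forecast shows workable hours:", hours)
```

With these made-up numbers, the daily total would suggest writing off the whole day, while the hourly series shows that the rain is confined to a three-hour window around midday.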
Finally, tapping into additional data sources to connect the dots and glean new insights is the most commonly cited example around big data analytics. It’s also perhaps the hardest challenge to solve in terms of knowing what to look for. It’s relatively easy to analyze your existing data sources since you already have those feeds and know how to process and interpret them. With new feeds from multiple data sources, the challenge is in knowing where to connect the dots and what the uncovered patterns may actually be telling you. I’m a strong believer that solving these challenges takes more than data scientists alone – it takes a multi-disciplinary team of IT staff, data scientists, and industry domain experts to put in place the right technical underpinnings, including suitable visualization technologies, and then iteratively analyze and interpret the information.
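On the technical side, connecting the dots between an existing feed and a new one often starts with something as simple as lining both up on a shared key. The Python sketch below shows that first step under stated assumptions: the sales figures, the sentiment scores, the dates, and the thresholds are all hypothetical, and a real project would still need domain experts to judge whether a flagged pattern means anything.

```python
# Hypothetical existing feed: daily unit sales keyed by ISO date.
daily_sales = {
    "2024-03-01": 120,
    "2024-03-02": 118,
    "2024-03-03": 74,
    "2024-03-04": 115,
}

# Hypothetical new feed: average social media sentiment per day (-1 to 1).
daily_sentiment = {
    "2024-03-02": 0.3,
    "2024-03-03": -0.6,
    "2024-03-04": 0.2,
    "2024-03-05": 0.1,
}


def join_on_date(sales, sentiment):
    """Keep only the dates present in both feeds, in chronological order."""
    shared_dates = sorted(sales.keys() & sentiment.keys())
    return [(date, sales[date], sentiment[date]) for date in shared_dates]


def flag_days(joined, sales_floor, sentiment_floor):
    """Return dates where sales and sentiment are both below their floors."""
    return [
        date
        for date, units, score in joined
        if units < sales_floor and score < sentiment_floor
    ]


joined = join_on_date(daily_sales, daily_sentiment)
flagged = flag_days(joined, sales_floor=100, sentiment_floor=0.0)

print("Days covered by both feeds:", len(joined))
print("Days worth a closer look:", flagged)
```

The join itself is the easy part. Deciding which keys to join on, and whether a day with weak sales and negative sentiment reflects a real relationship or a coincidence, is where the multi-disciplinary team comes in.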
Hopefully, these four options for tapping into additional data from both new and existing sources may get you thinking about how you might further utilize your existing data in addition to the wealth of new data sources you now have access to.