Simply stated, the best way to incorporate Big Data into your information management strategy is not to think about Big Data at all, but to bring it on board because business requirements and imperatives happen to call for it.
There are two main triggers for including Big Data in your plans:
- As 'business as usual': Big Data needs are uncovered as part of the iterative requirements gathering and analysis process for a mature Enterprise Data Warehouse.
- In response to major new business imperatives: as a response to shifts in the competitive landscape; as a way to exploit corporate strengths and market opportunities; as a means to mitigate risks, weaknesses and threats; and to ensure that the business can, at least in principle, respond to significant challenges that require the formulation of coherent, cohesive and realizable strategies.
Of course, there are other drivers, but we'll focus on these two.
Business as usual
Data Warehousing is a continuous cycle of quality improvement, and it is unique in the way that it can respond quite efficiently, effectively and rapidly to the changing data and information requirements of the organization.
As part of the continuous improvement of Data Warehousing, we are constantly responding to the expanding data and information requirements from existing and new Data Warehouse consumers.
Together with the business, we (the business data people) seek to identify new data requirements and, following on from that, search out the most cost-effective sources of data to meet those business needs to the quality criteria required. Sometimes, if we have a solid business case for doing so, we can push data-use ideas to the business, but mainly we are responding to current and future needs.
At the same time, we should establish the value of having that additional data, and we should try to understand whether we are investing or speculating. As part of this process, we should also be getting a clearer idea of the acceptable ballpark cost of sourcing, processing, packaging and delivering the required data.
New Business Imperatives
In the past we have responded to the data requirements of strategic planning, budgeting and tactical firefighting by producing tons of printable hand-crafted reports, generating hundreds of spreadsheets and storyboarding many variations on the theme of "how best to react to a significant challenge". Fortunately, many businesses have adopted Data Warehousing, and those who have done it right already have a lot of built-in data-agility when it comes to many ad-hoc informational requirements.
Therefore, given the prevalence of Data Warehousing, it's unlikely that the crafting of a response to a strategic challenge will require the urgent incorporation of additional data from Big Data sources. In fact, what is more likely is that additional data comes from market data providers and other data brokers, rather than from social-media sites, wearables and blogs.
Indeed, if we have in the past used Big Data and Big Analytics to verify experience and corroborate hunches, it will have given us globally applicable heuristics that, in theory at least, would be incorporated as business rules to enrich data going into the Data Warehouse.
So, when does Big Data come into it?
So how do we know when we have a Big Data requirement on our hands?
At the highest level of abstraction, we will intuitively know when we have a Big Data requirement, because we will be sourcing data from datasets that:
- Are very large in relative terms (for some organizations this means terabytes of data, for others something closer to petabytes)
- Contain unstructured data, non-tabular structured data or very complex structured data
- Are possibly produced over a relatively short time-period
- Cannot be satisfactorily processed using existing mainstream ETL or textual-ETL platforms
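As a rough illustration, the four criteria above can be written down as a simple checklist. This is only a minimal sketch: the DatasetProfile fields, the big_data_signals function and the one-terabyte default threshold are hypothetical names and values chosen for the example, and what counts as "large" is, as noted, relative to each organization.

```python
from dataclasses import dataclass


@dataclass
class DatasetProfile:
    """Hypothetical description of a candidate data source."""
    name: str
    size_tb: float                          # approximate size in terabytes
    has_complex_or_unstructured_data: bool  # unstructured, non-tabular or very complex
    produced_over_short_period: bool        # generated over a relatively short time
    fits_existing_etl: bool                 # mainstream ETL / textual-ETL copes with it


# Assumption: the size threshold is organization-specific, not a standard value.
DEFAULT_LARGE_SIZE_TB = 1.0


def big_data_signals(profile, large_size_tb=DEFAULT_LARGE_SIZE_TB):
    """Return the checklist criteria that the given dataset matches."""
    signals = []
    if profile.size_tb >= large_size_tb:
        signals.append("very large")
    if profile.has_complex_or_unstructured_data:
        signals.append("unstructured or complex structure")
    if profile.produced_over_short_period:
        signals.append("produced over a short period")
    if not profile.fits_existing_etl:
        signals.append("beyond existing ETL")
    return signals


if __name__ == "__main__":
    # Illustrative values only, not real measurements.
    feed = DatasetProfile("sensor feed", 40.0, True, True, False)
    print(feed.name, "->", big_data_signals(feed))
```

The function deliberately returns the matched criteria rather than a yes/no verdict, since the decision still rests on the business question and not on the checklist alone.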
There is a plethora of technologies to support Big Data storage, data querying and counting. We may use one of the trendy open-source applications, or we may simply decide to use technology packaged with the operating system platform. The choice of technology should come down to what works, the quality requirements and cost-effectiveness. It should not be a major issue, and certainly not the main factor in a project.
Things to remember
If you don't have a question that is best answered by using Big Data then you don't need to do Big Data.
Use of Big Data, like the use of any other data, should be the result of posing business questions that cannot be satisfied adequately with what we have available.
First, we need to understand the questions and identify options for answering them; then we can prioritize the options according to our needs and resources.
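One way to picture that prioritization step is as a ranking of candidate options by estimated value against estimated cost, within the resources available. The sketch below is only an assumption about how this might be expressed: the Option class, the prioritize function, the net-value ranking rule and all figures are hypothetical, and real estimates would come out of the requirements and costing work described earlier.

```python
from dataclasses import dataclass


@dataclass
class Option:
    """Hypothetical way of answering a business question."""
    name: str
    estimated_value: float  # expected benefit, in any consistent unit
    estimated_cost: float   # sourcing, processing, packaging and delivery


def prioritize(options, budget):
    """Drop options that exceed the budget, then rank by net value (highest first)."""
    affordable = [o for o in options if o.estimated_cost <= budget]
    return sorted(
        affordable,
        key=lambda o: o.estimated_value - o.estimated_cost,
        reverse=True,
    )


if __name__ == "__main__":
    # Illustrative figures only.
    candidates = [
        Option("existing warehouse data", 50, 10),
        Option("market data provider", 80, 30),
        Option("social-media feed", 90, 120),
    ]
    for option in prioritize(candidates, budget=100):
        print(option.name)
```

Note that nothing in the ranking asks whether an option counts as Big Data; it only asks what answering the question is worth and what it costs.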
The key is data, and we must remember that technology should not be the driver. This has to be a business imperative; otherwise we are wasting time, using up finances better employed elsewhere and introducing disruption whose contribution is negative. Technology, including Big Data technology, is the enabler, not the driver.
Finally, here are three guidelines to consider:
- Don't ask 'how can I do Big Data?' but 'what data do we need?'
- You don't need to seek out Big Data. If you really need it, and it's available, and it's adequate and appropriate, then you'll be getting it soon enough.
- Avoid searching for a Big Data problem you don't have, which can only be solved by Big Data technology you don't need.
Many thanks for reading.