2013 in review: Big data, bigger expectations?

Image: hype cycle diagram (Olga Tarkovskiy, cc:by-sa)

It’s fair to say that big data has experienced more than its share of hype over the past year. According to Gartner’s 2013 Hype Cycle for Storage Technologies, big data is approaching its peak of inflated expectations, which means that it will soon be headed for the inevitable plunge into the trough of disillusionment.

But what caused these inflated expectations? Here’s my big-data year-in-review, plus four top tips for IT success in 2014.

In the early days (way back in 2010), the bold promise of big data was real-time data analysis. The story went something like this: Up-and-coming vendors would provide tools that could instantly sort through mountains of incoming data. The results would then be used for activities such as location-based consumer advertising, credit-card fraud detection, or thwarting terrorist plots.

What made this possible, and what put the “big” in big data, was the fact that these analytics platforms could parse monumental volumes of unstructured data from multiple sources. Such platforms would also produce important, actionable information in real time.

In the parlance of the industry, big data’s feat was a result of the successful convergence of the “three Vs”:

Volume: A large amount of data

Variety: A wide range of data types and sources

Velocity: The speed at which data moves from its sources into the hands of those who need it

Although other Vs have since been contemplated, such as Veracity and Value, the original three attributes promised big data could go far beyond the boundaries of traditional databases, which require data to be stored in rigid rows and columns.

However, over the past year, reality began to sink in: People came to realize what big data could and could not do. Unfortunately, performing large-scale analytics in real time proved to be more daunting than originally thought. Although Hadoop continues to be the world’s most popular big data processing platform, it was designed for batch processing and is far too slow for real-time use.

This led to the concept of an analytics ‘stack,’ which combined map/reduce processing with a NoSQL database. The goal of these additions was to create an efficient, searchable data structure where none had existed before.
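To see why batch-oriented map/reduce struggles with real-time use, consider the canonical word-count example, sketched here in plain Python (the data and function names are illustrative, not Hadoop APIs):

```python
from collections import defaultdict

def map_phase(records):
    # Map step: emit a (word, 1) pair for every word in every record.
    for record in records:
        for word in record.split():
            yield (word, 1)

def reduce_phase(pairs):
    # Shuffle/reduce step: group pairs by key and sum their values.
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return dict(counts)

# Batch processing: the entire data set must flow through both phases
# before any result is available -- which is why Hadoop-style jobs
# cannot hand back answers the instant new data arrives.
logs = ["error disk full", "error network down", "ok"]
result = reduce_phase(map_phase(logs))
```

The reduce step cannot finish until the map step has seen all the input, so latency grows with data volume; real-time stacks attack exactly this bottleneck.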

The quest for the perfect analytics stack

In 2013, building an analytics stack for big data ultimately became a science project for most IT organizations. Much technological experimentation ensued.

Likewise, a multitude of offerings became available—from both open-source projects and commercial vendors. The Berkeley Data Analytics Stack is but one example. It consists of more than a dozen individual software elements designed to increase parallelism and reduce latency in order to support true real-time decision making.

In a nutshell, big data is advancing, but at a slower speed than anyone expected. With a plethora of options and no clear reference architecture emerging, big data experimentation is sure to continue well into 2014.

Reassess what’s possible

2013 was the year that business and government realized the complexity and limitations of big data and began to make adjustments based on this realization.

Here are just a few examples:

Retail: Early on, retailers discovered that consumers objected to anyone getting too close to their personal information. This included their actual location. Technology limits aside, privacy concerns have dulled the appetite for on-the-spot advertising. Instead, retail companies are more likely to use big data analytics to develop profiles which are then used to build personalized customer loyalty programs.

Banking: Financial companies appear to have scaled back their plans for using big data to fight consumer and credit-card fraud. Instead, as indicated in a recent McKinsey report, banks are using big data to sharpen their risk assessment and improve underwriting accuracy.

Fighting terrorists: As for thwarting terrorist activity, governments aren’t so keen on divulging their methods, but evidence suggests the following: While governments are good at creating haystacks of data, the most effective intelligence gathering is still done the old-fashioned way (i.e., talking to people, getting tips, and manual activity monitoring). For example, one former naval intelligence officer has evaluated the role that “little” data will likely play in preventing terrorist activity in the future.

Back to basics?

As developments in 2013 have shown, achieving big data ‘nirvana’ will require some adjustments along the way. To avoid falling into your own trough of disillusionment, here’s some advice for the coming year:

1. Make sure your big data expectations are properly set, based on the current state of solutions in the market.

2. If embarking on a real-time analytics journey, recognize there will be many performance hurdles in front of you, in both hardware and software.

3. As part of your own big-data project, build in a prolonged period of experimentation. This should include independently testing the individual parts of the big-data stack.

4. Lastly, I recommend getting involved in community big-data discussions, like those listed at big-data.meetup.com.

This is the second of three posts reviewing 2013. In case you missed it, Part 1 was about enterprise flash storage. Next up: Cloud Computing.
