Big data goes mainstream

A new group of data mining technologies promises to forever change the way we sift through our vast stores of data, making the process faster and cheaper.

We've all heard the predictions: By 2020, the quantity of electronically stored data will reach 35 trillion gigabytes, a forty-four-fold increase from 2009. We had already reached 1.2 million petabytes, or 1.2 zettabytes, by the end of 2010, according to IDC. That's enough data to fill a stack of DVDs reaching from the Earth to the moon and back -- about 240,000 miles each way.
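For readers who want to check the units, the figures above can be put on a common scale with a few lines of Python. This is only a back-of-the-envelope sketch: it assumes decimal (SI) units, the variable names are ours rather than IDC's, and the 2009 figure is simply what the quoted forty-four-fold increase implies, not a number stated in the forecast as cited here.

```python
# Decimal (SI) storage units, assumed here for illustration.
BYTES_PER_GB = 10**9
BYTES_PER_PB = 10**15
BYTES_PER_ZB = 10**21

# Figures quoted in the article.
forecast_2020_bytes = 35 * 10**12 * BYTES_PER_GB   # 35 trillion gigabytes
total_2010_bytes = 1_200_000 * BYTES_PER_PB        # 1.2 million petabytes

# Express both in zettabytes.
forecast_2020_zb = forecast_2020_bytes / BYTES_PER_ZB
total_2010_zb = total_2010_bytes / BYTES_PER_ZB

# 2009 baseline implied by a 44-fold increase (derived, not quoted).
implied_2009_zb = forecast_2020_zb / 44

print(forecast_2020_zb)            # 35.0
print(total_2010_zb)               # 1.2
print(round(implied_2009_zb, 2))   # 0.8
```

In other words, 35 trillion gigabytes is 35 zettabytes, and the forecast implies a 2009 starting point of roughly 0.8 zettabytes.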

For alarmists, this is an ominous data storage doomsday forecast. For opportunists, it's an information gold mine whose riches will be increasingly easy to excavate as technology advances.

Enter "big data," a nascent group of data mining technologies that are making the storage, manipulation and analysis of reams of data cheaper and faster than ever. Once relegated to the supercomputing environment, big data technology is becoming available to the enterprise masses -- and it is changing the way many industries do business.

Computerworld defines big data as the mining of huge sets of structured and unstructured data for useful insights using nontraditional data-sifting tools, including but not limited to Hadoop.

Like the cloud, big data has been the subject of much hype and a lot of uncertainty. We asked analysts and big data enthusiasts to explain what it is and isn't, as well as what big data means to the future of data mining.

Setting the stage for big data

Big data for the enterprise has emerged thanks in part to the lower cost of computing power and the ability of today's systems to perform multiprocessing. Main memory costs have also dropped, and companies can process more data "in memory" than ever before. What's more, it's easier to link computers into server clusters. Those three factors combined have created big data, says Carl Olofson, a database management analyst at IDC.

"We can not only do those things well, but do them affordably," he says. "Some of the big supercomputers of the past involved heavy multiprocessing of systems that were linked together into tightly knit clusters, but at the cost of hundreds of thousands of dollars or more because they were specialized hardware. Now we can achieve those kinds of configurations with commodity hardware. That's what has helped us be able to process more data faster and more cheaply."

Not every company with vast data warehouses can say it's using big data technology. To qualify as big data, IDC says, the technology must first be affordable, and then meet at least two of the three criteria that IBM describes as the three V's: variety, volume and velocity.

Variety means data comes in structured and unstructured forms. Volume means the amount of data being gathered and analyzed is very large. And velocity refers to the speed at which the data is processed. Big data "isn't always hundreds of terabytes," Olofson says. "Depending on the use case, a few hundred gigabytes could be quite large because of the third dimension, which is speed or time. If I can perform an analytic process against 300GB in a second, and it used to take an hour, that greatly changes what I can do with the results, so it adds value. Big data is the affordable application of at least two out of three of those."
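The test Olofson describes, affordable plus at least two of the three V's, can be written down as a simple check. The sketch below is illustrative only: the `Workload` class, its field names and the `speedup` helper are hypothetical, and deciding whether a given workload really is "affordable" or "high velocity" remains a judgment call that the code does not make for you.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical description of a data workload (names are ours)."""
    affordable: bool  # runs at a cost the business can bear
    variety: bool     # mixes structured and unstructured data
    volume: bool      # the amount of data is very large
    velocity: bool    # processing speed changes what the results are good for


def qualifies_as_big_data(w: Workload) -> bool:
    """Affordable, and at least two of the three V's."""
    v_count = sum([w.variety, w.volume, w.velocity])
    return w.affordable and v_count >= 2


def speedup(old_seconds: float, new_seconds: float) -> float:
    """How many times faster the new run is than the old one."""
    return old_seconds / new_seconds


# Olofson's example: modest volume (300GB), but varied data processed fast.
example = Workload(affordable=True, variety=True, volume=False, velocity=True)
print(qualifies_as_big_data(example))  # True

# An analytic process that took an hour and now takes a second.
print(speedup(3600, 1))  # 3600.0
```

The second call puts a number on the 300GB example: going from an hour to a second is a 3,600-fold speedup, which is why a data set of only a few hundred gigabytes can still count.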
