Big data storage doesn't have to break the bank

The era of big data requires new storage strategies. And with faster and smarter technology, these approaches don't have to break the bank.

"We have a column in the database called 'State' on every single person's record." But in a database of 300 million registered voters, "it only appears in our database 50 times," he says. "In [row-based open-source relational database management systems like] Postgres and MySQL, it appeared 300 million times. So if you replicate that level of compression on everything from street names to the last name Smith, that plus other compression algorithms buys you tremendous savings in terms of storage space. So your choice of database technology really does affect how much storage you need."

On the storage side, deduplication, compression and virtualization continue to help companies reduce the size of files and the amount of data that is stored for later analysis. And data tiering is a well-established option for bringing the most critical data to analytics tools quickly.
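
Deduplication, for example, boils down to splitting data into blocks, fingerprinting each block and keeping only one copy of any block already seen. The snippet below is a simplified sketch of that idea; fixed-size blocks and an in-memory store are assumptions, where production systems typically use variable-size chunking and persistent indexes:

    import hashlib

    BLOCK_SIZE = 4096  # fixed-size chunking, for simplicity

    def deduplicate(data: bytes, store: dict) -> list:
        """Return the list of block hashes needed to rebuild data,
        storing each unique block only once."""
        recipe = []
        for offset in range(0, len(data), BLOCK_SIZE):
            block = data[offset:offset + BLOCK_SIZE]
            digest = hashlib.sha256(block).hexdigest()
            if digest not in store:      # first time we've seen this block
                store[digest] = block
            recipe.append(digest)
        return recipe

    store = {}
    recipe = deduplicate(b"sensor reading 42;" * 50_000, store)
    print(f"{len(recipe)} blocks referenced, {len(store)} unique blocks stored")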

Solid-state drives (SSDs) are another popular storage medium for data that must be readily available. Essentially a flash-based technology that has become the top layer in data tiering, SSDs keep data available for very fast response, Csaplar says. "SSDs hold the data very close to processors to enable the servers to have the I/O to analyze the data quickly," he says. Once considered too expensive for many companies, SSDs have come down in price to the point where "even midsize companies can afford layers of SSDs between their disks and their processors," says Csaplar.
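
Conceptually, tiering with SSDs is just a placement policy: blocks that are read often enough get promoted to the flash tier, and cold blocks stay on (or return to) disk. The Python sketch below is a toy illustration of such a policy; the threshold and names are assumptions, not any storage vendor's logic:

    from collections import Counter

    PROMOTE_THRESHOLD = 100  # reads per window before a block earns SSD placement (illustrative)

    class TieringPolicy:
        """Toy policy: frequently read blocks live on SSD, the rest on disk."""

        def __init__(self):
            self.reads = Counter()
            self.ssd, self.disk = set(), set()

        def record_read(self, block_id: str):
            self.reads[block_id] += 1

        def rebalance(self):
            for block_id, count in self.reads.items():
                if count >= PROMOTE_THRESHOLD:
                    self.disk.discard(block_id)
                    self.ssd.add(block_id)    # promote hot block to flash
                else:
                    self.ssd.discard(block_id)
                    self.disk.add(block_id)   # keep or demote cold block to disk
            self.reads.clear()                # start a new measurement window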

Clouds Rising

Cloud-based storage is playing an increasingly important role in big data storage strategies. In industries where companies have operations around the world, such as oil and gas, data generated by sensors is being sent directly to the cloud and stored there -- and in many cases, analytics are being performed there as well.

"If you're gathering data from 10 or more sources, you're more than likely not backlogging it into a data center" because that isn't cost-effective with so much data, says IDC's Nadkarni.

GE, for instance, has been analyzing data from sensors on its machines for years, using "machine-to-machine" big data to plan aircraft maintenance. Campisi says the data collected over just a few hours from the blade of a power plant gas turbine can dwarf the amount of data a social media site collects all day.

Companies are using the cloud to gather data and analyze it on the spot, eliminating the need to bring it into the data center. "Companies like Amazon give you a compute layer to analyze that data in the cloud. When you're done analyzing it, you can always move it from, say, the S3-type layer to a Glacier-type layer," Nadkarni adds.
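
In AWS terms, that move from an S3-type layer to a Glacier-type layer can be automated with a bucket lifecycle rule. The sketch below uses the AWS SDK for Python (boto3); the bucket name, prefix and 30-day threshold are placeholders, not recommendations:

    import boto3

    s3 = boto3.client("s3")

    # Archive objects under a given prefix to Glacier once they are 30 days old.
    s3.put_bucket_lifecycle_configuration(
        Bucket="example-analytics-bucket",          # placeholder bucket name
        LifecycleConfiguration={
            "Rules": [
                {
                    "ID": "archive-to-glacier",
                    "Filter": {"Prefix": "sensor-data/"},
                    "Status": "Enabled",
                    "Transitions": [{"Days": 30, "StorageClass": "GLACIER"}],
                }
            ]
        },
    )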

Glacier is an extremely low-cost storage option that Amazon Web Services announced earlier this year. It's designed for keeping data "on ice" for decades. Other companies are introducing similar cloud-based archiving services, says Csaplar, noting that these offerings are professionally managed at a very reasonable price and could, for example, serve as the ultimate resting place for old tapes.

With prices as low as pennies per gigabyte, such offerings are hard to pass up. "As long as your data is scrubbed and doesn't have any sensitive information, you can dump it into this kind of archive and reduce your data center footprint," says Nadkarni.
