A zettabyte by 2010: Corporate data grows fiftyfold in three years

Worldwide storage systems are outpaced by data growth

Over the past three years, Fortune 1,000 companies have on average seen their data grow from 190TB to 1 petabyte (1 million gigabytes), and data at America's 9,000 midsize companies has grown from 2TB to 100TB during the same period, according to a user study due out at the end of this month.

The 12-week study, performed by research firm TheInfoPro Inc. in New York, asked 400 data storage professionals how much storage capacity they have, how the capacity is allocated and the capacity's effective utilization rate, or how much of its capacity is actually housing data.

In all, the world has seen the amount of data grow from 5 exabytes (5 billion gigabytes) in 2003 to 161 exabytes in 2006, according to another study by IDC that came out this month. The world's storage systems can no longer store all of the data being created. This year, the amount of information created and replicated (255 exabytes) will surpass, for the first time, the storage capacity available (246 exabytes), IDC said.

By 2010, the world's data will have grown roughly sixfold from its 2006 level, to 988 exabytes, while only about 600 exabytes of capacity will be available on storage systems, according to IDC's study, which was sponsored by data storage vendor EMC Corp.

So how much data is that? The Library of Congress, with 130 million items on about 530 miles of bookshelves -- including 29 million books, 2.7 million recordings, 12 million photographs, 4.8 million maps and 58 million manuscripts -- can be stored on 10 terabytes (10,000 gigabytes). So, the Library of Congress could be stored about 16 million times over in the world's current data glut of 161 exabytes (see "From bits to yottabytes. How much data is that?").
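
As a rough check on the scale involved, the unit arithmetic can be sketched in a few lines of Python. This is an illustration only, assuming decimal (SI) units in which each step up is a factor of 1,000; the names `UNITS`, `convert` and `copies_that_fit` are hypothetical and do not come from either study.

```python
# Decimal (SI) storage units, expressed in bytes.
# Assumption: 1 TB = 1,000 GB, 1 PB = 1,000 TB, and so on.
UNITS = {
    "GB": 10**9,
    "TB": 10**12,
    "PB": 10**15,
    "EB": 10**18,
    "ZB": 10**21,
}


def convert(value, from_unit, to_unit):
    """Convert a quantity from one decimal storage unit to another."""
    return value * UNITS[from_unit] / UNITS[to_unit]


def copies_that_fit(total, total_unit, item, item_unit):
    """Return how many items of a given size fit into a total amount of data."""
    return (total * UNITS[total_unit]) / (item * UNITS[item_unit])


if __name__ == "__main__":
    # The parenthetical conversions quoted in the article.
    print("1 PB in GB:", convert(1, "PB", "GB"))
    print("5 EB in GB:", convert(5, "EB", "GB"))
    print("10 TB in GB:", convert(10, "TB", "GB"))
    # The 2010 forecast expressed in zettabytes.
    print("988 EB in ZB:", convert(988, "EB", "ZB"))
    # How many 10TB Library of Congress copies fit in 161 EB.
    print("10TB copies in 161 EB:", copies_that_fit(161, "EB", 10, "TB"))
```

Running the script prints each conversion, which makes it easy to confirm the figures by hand or to substitute binary units if a vendor reports capacity that way.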

The data explosion means the role of IT managers will expand considerably as VoIP phones are joined to corporate networks, building automation and security moves to IP networks, surveillance goes digital, and RFID and sensor networks proliferate, IDC's study states.

IDC predicts that by 2010, while nearly 70% of the digital universe will be created by individuals, businesses of all sizes, agencies, governments and associations will be responsible for the security, privacy, reliability and compliance of at least 85% of that same digital universe. In 2006, just the e-mail traffic from one person to another (excluding spam) accounted for 6 exabytes (or 3%) of the world's data.

Notably, TheInfoPro's Wave-9 Survey of companies showed that about 70% of corporate data consists of duplicates. TheInfoPro did not survey small companies or small home offices, the ranks of which represent 700,000 companies with revenue of $200 million or less. But Robert Stevenson, managing director at TheInfoPro, estimated that those small companies have between 500GB and 1TB of data today, and that they are experiencing the same exponential data growth as larger companies.

"That's why there's so much excitement around data de-duplication technology," Stevenson said. De-duplication technology eliminates copies of data, storing only one unique version.

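The idea Stevenson describes can be illustrated with a minimal Python sketch. It assumes the simplest possible approach, hashing whole files with SHA-256 and keeping one copy per unique hash; commercial products may work on blocks or variable-size chunks instead, and the article does not describe any vendor's method. The class name `DedupStore` and its methods are hypothetical.

```python
import hashlib


class DedupStore:
    """Toy content-addressed store: identical payloads are kept only once."""

    def __init__(self):
        self.blobs = {}  # SHA-256 digest -> bytes (one copy per unique payload)
        self.names = {}  # file name -> digest of its content

    def put(self, name, data):
        """Store data under a name, reusing an existing copy if one matches."""
        digest = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(digest, data)  # only stores if digest is new
        self.names[name] = digest
        return digest

    def get(self, name):
        """Return the content stored under a name."""
        return self.blobs[self.names[name]]

    def logical_bytes(self):
        """Total size as users see it, counting every duplicate."""
        return sum(len(self.blobs[d]) for d in self.names.values())

    def stored_bytes(self):
        """Total size actually kept, counting each unique payload once."""
        return sum(len(b) for b in self.blobs.values())


if __name__ == "__main__":
    store = DedupStore()
    store.put("report.doc", b"quarterly report")
    store.put("report-copy.doc", b"quarterly report")  # duplicate content
    store.put("notes.txt", b"meeting notes")
    print("logical bytes:", store.logical_bytes())
    print("stored bytes:", store.stored_bytes())
```

In this sketch the two identical files share one stored copy, so the stored size is smaller than the logical size; the gap between the two numbers is the saving that de-duplication aims for.
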
According to IDC, most of the data today is being created by three major analog-to-digital conversions: film to digital image capture, analog to digital voice and analog to digital TV.

For example, two-year-old video-sharing Web site YouTube hosts 100 million digital video streams a day, and more than a billion digital songs a day are shared over the Internet in MP3 format. John E. Bethancourt, CIO of Chevron Corp., says his company accumulates 2TB of data every day.

There is so much data out there that 20% of Fortune 1,000 firms hand the responsibility for storing all that data off to a third-party managed storage provider, according to TheInfoPro.

Images captured by more than 1 billion devices in the world, from digital cameras and camera phones to medical scanners and security cameras, make up the largest component of the digital universe. They are replicated over the Internet, on private organizational networks, by PCs and servers, in data centers, in digital TV broadcasts and on digital projection movie screens, according to IDC.

Copyright © 2007 IDG Communications, Inc.