Cerf sees a problem: Today's digital data could be gone tomorrow

A disk with its data may survive, but the ability to understand it may be lost

WASHINGTON -- One of the computer scientists who turned on the Internet in 1983, Vinton Cerf, is concerned that much of the data created since then, and for years still to come, will be lost to time.

Cerf warned that digital things created today -- spreadsheets, documents, presentations as well as mountains of scientific data -- won't be readable in the years and centuries ahead.

Cerf illustrated the problem in a simple way. He runs Microsoft Office 2011 on a Macintosh, but it cannot read a 1997 PowerPoint file. "It doesn't know what it is," he said.

"I'm not blaming Microsoft," said Cerf, who is Google's vice president and chief Internet evangelist. "What I'm saying is that backward compatibility is very hard to preserve over very long periods of time."

The data objects are only meaningful if the application software is available to interpret them, Cerf said. "We won't lose the disk, but we may lose the ability to understand the disk."

It's not just PowerPoint slides, either, he said. The scientific community collects large amounts of data from simulations and instrument readings. But unless the metadata survives, recording the conditions under which the data was collected, how the instruments were calibrated and how the units should be interpreted, the information may be lost.

"If you don't preserve all the extra metadata, you won't know what the data means. So years from now, when you have a new theory, you won't be able to go back and look at the older data," said Cerf, who spoke about the issue at the CW Honors awards program Monday and in an interview afterward.

What's needed, Cerf said, is a "digital vellum," a means as durable and long-lasting as the material that has successfully preserved written content for more than 1,000 years.

Ensuring that people in future centuries have access to this data is "a hard problem," he said.

Cerf and fellow computer scientist Robert Kahn developed the TCP/IP protocol used for the Internet. The full switchover to that protocol occurred in 1983.

If a company goes out of business and there is no provision for its software to become accessible to others, all the products running that software may become inaccessible, Cerf said. "There are hard, complicated technical and legal problems that will have to be resolved."

The problem is recognized and there are efforts internationally to address it. Cerf said he's been in meetings about this issue attended by 400 people.

"It may be that the cloud computing environment will help a lot. It may be able to emulate older hardware on which we can run operating systems and applications," he said.

To give the issue context, Cerf talked about the effort it took for historian Doris Kearns Goodwin to produce her book about President Lincoln's administration, Team of Rivals. She went to more than 100 libraries, found the correspondence written by members of Lincoln's cabinet and used it to reconstruct what they would have been talking about.

For the future, "we need a digital vellum that will preserve not only the bits, but a way of interpreting them as well," Cerf said.

This article, Cerf sees a problem: Today's digital data could be gone tomorrow, was originally published at Computerworld.com.

Patrick Thibodeau covers cloud computing and enterprise applications, outsourcing, government IT policies, data centers and IT workforce issues for Computerworld. Follow Patrick on Twitter at @DCgov or subscribe to Patrick's RSS feed. His e-mail address is pthibodeau@computerworld.com.
