June 22, 2004 (Storage Networking World)
My wife is a volunteer archivist here in the town of Londonderry, N.H. A few nights ago, we were going over some of the town's historic documents. They were part of a recently discovered cache of historic artifacts recovered from the attic of a house built just after the Revolutionary War.
Mixed into the sheaf of thin, yellowish-brown pages were letters, proclamations, debt notices and arrest warrants, some of which dated back to the early 18th century. They were handwritten in a script and syntax that gave one pause while reading. But they were intelligible. Memorialized in this fragile collection were the famous and truly infamous members of a small New England community circa 1750.
As I was reading, I thought back to my first PC and to the collection of old floppy disks I have sitting in a box in our attic. Why am I saving them? I haven't a clue. I'd have to carry them down to the computer museum in Boston if I ever wanted to actually see what's on them. Even so, there's no guarantee that they will have the right version of WordPerfect or run the right version of DOS on a machine with a 5.25-in. floppy drive.
Nor can I assume that the data on the floppies themselves is still intact (given the summertime heat up there, I'm sure it isn't). When it comes to actually preserving one's thoughts, paper and pen still beat all things digital.
I recently attended a conference that focused on emerging storage R&D requirements. One of these was the perceived need for "long-term" storage - storage that could preserve data for at least the period of time required by the latest government agency regulation du jour. It turns out that preserving digital objects for more than 10 years after creation is an exceedingly complex topic. Here's a quick walk through the issues involved.
Media degradation and data migration
The storage industry is jumping, en masse it would seem, into the use of SATA disk for long-term archival storage applications. As an industry, we might want to think this through a bit more.
Anecdote 1
Here's a quote from a recent paper authored by Gordon Hughes of the University of California at San Diego's Center for Magnetic Recording Research. Hughes compares the thermal decay of data on Fibre Channel/SCSI drives with that on SATA drives:
"Modern [FC/SCSI] drives are designed for five-year data life against thermal decay (10 years is the typical nominal design, for 2X MTBF margin). The size of each bit at 60G-100G bytes/platter [SATA density] is so small that simple Boltzmann kT thermal energy at room temperature slowly disorders bit '0' vs. '1' magnetization states, turning stored data bits into magnetic noise. It's a hard physics-based limitation and the subject of major technology conferences."
Let's not kid ourselves. Fat, slow disk is good for short-term archival storage applications. For the longer term, one needs to either buy a SATA system that understands its own shortcomings and can compensate for them or default to the old standby - tape. But even tape becomes more unreliable as it ages, forcing storage administrators to continually refresh the media. Bottom line: Whether it be disk or tape, users must have a viable strategy to escort archive data through time by migrating it from one form of media to the next - without dropping bits along the way.
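For readers who want to see what "without dropping bits" might look like in practice, here is a minimal Python sketch of a checksum-verified copy. It rests on assumptions of my own: the archive is a set of ordinary files, SHA-256 is an acceptable checksum, and the names sha256_of and migrate_file are purely illustrative rather than taken from any storage product.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def migrate_file(source: Path, destination: Path) -> str:
    """Copy source to destination and confirm the copy is bit-identical.

    Hypothetical helper: returns the verified digest, raises IOError on mismatch.
    """
    expected = sha256_of(source)           # checksum on the old media
    destination.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(source, destination)   # the actual migration step
    actual = sha256_of(destination)        # re-read from the new media
    if actual != expected:
        raise IOError(f"checksum mismatch after copying to {destination}")
    return actual


if __name__ == "__main__":
    # Stand-in directories play the role of old and new media.
    with tempfile.TemporaryDirectory() as tmp:
        old_media = Path(tmp) / "old_media" / "letter.txt"
        new_media = Path(tmp) / "new_media" / "letter.txt"
        old_media.parent.mkdir(parents=True)
        old_media.write_bytes(b"Londonderry, N.H., circa 1750\n")
        digest = migrate_file(old_media, new_media)
        print("migrated intact, sha256 =", digest)
```

Storing the returned digest alongside the file would let the same comparison be repeated at every later refresh. Note that a check like this only confirms the bits survived the trip; it says nothing about whether anyone will still be able to read them.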
Hardware platform obsolescence (and data migration again)
We all know why a piece of hardware costs more to maintain over time, right? Replacement parts get scarcer and more expensive, pushing the cost to maintain an aging box into luxury-item status.
Anecdote 2
Recently, I ran into an old friend who is now in the used-hardware business. He had just closed a deal on his entire inventory (100-plus units) of a certain 9GB hard drive. He got $600 each. That's $67 per gigabyte, used, or three times the cost of new, high-performance disk. He sold them to a third-party maintenance company, and he was pumped: the drives had been destined for a date with the crusher.
So, both vendors and users agree that it's time to retire aging war-horse hardware at the ripe old age of - what, five years? And then it's time to do the migration dance once again. The three most important words in data storage longevity are migrate, migrate and migrate. Continuous data migration is only one of the hard storage truths of regulatory compliance. But I don't need to lecture storage professionals any more on the subject of data migration. 'Nuff said.
Semantic continuity
This is the biggie. Assume you could design hardware and media that would last forever. Even so, you wouldn't get halfway to long-term data preservation because our friendly operating systems and applications vendors are busy, around the clock and around the world, designing newer, faster, better, cheaper software. They're purposely obsolescing the very code we used to create the archived data and the code we would need to render the archived data back into intelligible form in the future.
So even if you were able to get every single bit off a disk platter or tape strip you put down 10 years ago, there is a better than even chance you wouldn't be able to make sense of those bits.
Anecdote 3
The Harvard University Library has two archival storage facilities, one analog (papers, books, photos, microfiche, etc.) and one digital, known as the Online Computer Library Center (OCLC). Here's how OCLC charges users for its services:
OCLC's current prices are for "bit preservation" services only. These include data management and backup, ongoing virus and fixity checks, periodic media refreshment, disaster recovery and support of administrative tools for owners to update metadata and generate reports. Prices have not yet been set for "full preservation," wherein OCLC would be obligated to provide standard bit preservation services, plus the capability to render intellectual content accurately, regardless of technology changes over time.
"Full preservation" - that's the really hard part. You can have your bits back anytime you want. But do you want them back 10 years hence in the same format and with the same informational content as before? Well, we're workin' on that. Another hard storage truth of regulatory compliance is that, ultimately, it's not a storage problem.
Can we get a century or two, at least?
In Beijing, Microsoft has a lab - an inventions incubator, actually - called Microsoft Research Asia that has recently been written up in a number of science and technology journals. There, an inventor named Jian Wang is working on a digital pen.
Cool stuff, this digital pen. It sports a digital camera and pressure sensor, on-board memory and wireless connectivity to, for example, your laptop. Microsoft is, of course (no surprises here), also hard at work developing and refining the software that translates handwriting into something a digital brain can process.
The digital pen is a wonderfully universal invention capable of digitally capturing the thoughts and emotions of any member of any culture using any language. However, if the human intelligence and emotion it records can't be preserved the way pen and ink have preserved the drawings of Leonardo, then - thank God - not everything is "born digital." But if the digital pen could be both universal and able to stand the test of at least a century or two, then Jian Wang's invention would be truly monumental.
John Webster is senior analyst and founder of the Data Mobility Group. He can be reached at jwebster@datamobilitygroup.com.