Demise of the Disk Era


In my former life as a programmer, I acquired several years of experience in database management, more experience than I had ever hoped to get with databases. I was shooting for zero.

Be that as it may, it was because of my past experience that my sister recently asked me for help on some database issues. It was because of my distaste for databases that I daydreamed through the session. As I sifted through the mind-numbing details of her study guides, I found myself drifting into a world that could render the expertise of database administrators, data managers and countless others in the computer industry obsolete.

Let me rewind a bit to show you how I got there. My sister joined the dark side some years ago when she got her MCSE certification. Now she's studying for her Microsoft SQL Server certification tests and needed some input from someone familiar with SQL, namely me.

I first noticed some sample test questions about what data you can and cannot restore if a hardware failure occurs during a backup. These questions test your knowledge of things like transaction logs and the significance of something called the SQL Server primary data file.

While I can see the academic value of these questions, my answer would have been, "You can restore all of the data, because anyone who cares and has half a brain will use redundant storage such as RAID 5. As a side benefit, RAID gives you better performance."
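The redundancy claim above rests on how RAID 5 works: data is striped across disks with one parity block per stripe, computed as the XOR of the data blocks, so the contents of any single failed disk can be rebuilt from the survivors. A minimal sketch in Python (the block values and three-disk layout are invented for illustration):

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

# One stripe across three data disks, with a fourth parity disk.
data = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]
parity = xor_blocks(data)

# Simulate losing disk 1 mid-backup: rebuild its block from the
# surviving data blocks plus parity, since d0 ^ d2 ^ (d0^d1^d2) = d1.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
assert rebuilt == data[1]
```

The same XOR identity is why the array keeps serving reads while a failed disk is rebuilt, which is the point about not losing data in the first place.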

As if the word performance triggered a posthypnotic suggestion, my mind then drifted to emerging technologies such as InfiniBand, which should eventually eliminate database performance bottlenecks such as the PCI bus. My sister snapped me back into reality with a question about query optimization.

One of the basic ideas behind SQL is that for any given question, the database should be able to deliver an answer at the same speed, no matter how you word your SQL query. That premise is true only in the land called Perfect SQL.

In Perfect SQL, you never have to hand-optimize a SQL query. But we don't have anywhere near Perfect SQL. So it is extremely important to learn how to optimize queries manually, which is why the process is covered in detail in the SQL Server certification study guides.

On the surface, it seems like optimizing a query is more art than science. But it really all boils down to one thing: disk access. Whether you're choosing what data to index and how to index it, picking the page size or the size of the caches and buffers, or measuring "the number of I/Os" (I/O commands sent to the disks), database performance is still all about disk access.
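To make the disk-access point concrete, here is a back-of-the-envelope sketch comparing page reads for a full table scan against a B-tree index lookup. The row counts, page size and index fan-out are all invented round numbers, not figures from any real database:

```python
import math

# Hypothetical table: 10 million rows, 100 rows per page.
rows = 10_000_000
rows_per_page = 100
scan_reads = rows // rows_per_page   # full scan touches every page

# A B-tree index packing ~500 keys per page needs about
# log_500(rows) levels to reach a leaf, plus one read for the row.
fanout = 500
index_reads = math.ceil(math.log(rows, fanout)) + 1

# scan_reads is 100,000 page reads; index_reads is 4.
# Every choice an optimizer makes is about shrinking that first number.
```

The five-orders-of-magnitude gap between the two numbers is the entire economics of indexing, and it exists only because each of those reads hits a spinning disk.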

That's when it hit me. It's not just query optimization; an incredibly huge portion of computer science boils down to disk access. Why do we care about the PCI bus bottleneck? We use PCI to get to disks. Why do we care about virtual memory performance? It's limited by disk-access speed. Why do we use Dynamic Link Libraries and shared libraries? Because disks are cheaper than RAM. Why do we use RAID? Because individual disks are slow and they fail.

When you come down to it, it is truly mind-blowing how much of our economy must be devoted toward working around the performance limitations and failure rates of disk storage.

Now imagine how everything would change if nonvolatile RAM were as cheap as disk storage and as fast as today's volatile RAM. That would make disks virtually useless. At least two-thirds of what database administrators know about optimizing queries would be irrelevant.

Indeed, it would simplify every form of data management beyond belief. The change would be almost as revolutionary as if Ford announced that it had developed a cheap automobile that ran on water and had no moving parts except the axles.

Why aren't companies working harder to make this dream a reality? To my knowledge, our best efforts have produced only ferroelectric RAM, which is arguably a breakthrough in nonvolatile RAM. But it's still slower than today's RAM and far too expensive to compete with disk storage.

Considering the potential benefits of getting rid of disks, I hope we see something better, and see it in my lifetime. Data management would never be the same.

Nicholas Petreley is a computer consultant and author in Hayward, Calif. He can be reached at





Companies are adopting a staggering number of disparate business intelligence (BI) technologies, adding to BI fragmentation within organizations. Most BI is implemented in departments on an as-needed basis, but there should be an overall plan.


Better data-mining models are needed to handle ever-larger data warehouses in the 100TB range.


Algorithms for predictive modeling should become more cost-effective and usable by Java and C++ programmers.


By 2005, BI and other technologies may converge to create a market for real-time business activity monitoring, or BAM.

Sources: Meta Group Inc. and Gartner Inc., Stamford, Conn.

Technical Headaches

The top technical challenges for data warehouse projects in large organizations:

1 Security
2 Performance/scalability
3 Populating the data warehouse
4 Availability
5 Consistent data standards

Base: Survey of 264 IT managers at North American companies with 1,000 or more employees

Source: IDC, Framingham, Mass., December 2001



Copyright © 2002 IDG Communications, Inc.
