I acquired some years of experience in database management in my former life as a programmer, more experience than I had ever hoped to get with databases. I was shooting for zero.
Be that as it may, it was because of my past experience that my sister recently asked me for help on some database issues. It was because of my distaste for databases that I daydreamed through the session. As I sifted through the mind-numbing details of her study guides, I found myself drifting into a world that could render the expertise of database administrators, data managers and countless others in the computer industry obsolete.
Let me rewind a bit to show you how I got there. My sister joined the dark side some years ago when she got her MCSE certification. Now she's studying for her Microsoft SQL Server certification tests and needed some input from someone familiar with SQL, namely me.
I first noticed some sample test questions about what data you can and cannot restore if a hardware failure occurs during a backup. These questions test your knowledge of things like transaction logs and the significance of something called the SQL Server primary data file.
While I can see the academic value of these questions, my answer would have been, "You can restore all of the data, because anyone who cares and has half a brain will use redundant storage such as RAID 5. As a side benefit, RAID gives you better performance."
As if the word performance triggered a posthypnotic suggestion, my mind then drifted to emerging technologies such as InfiniBand, which should eventually eliminate database performance bottlenecks such as the PCI bus. My sister snapped me back into reality with a question about query optimization.
One of the basic ideas behind SQL is that for any given question, the database should be able to deliver an answer at the same speed, no matter how you word your SQL query. That premise is true only in the land called Perfect SQL.
In Perfect SQL, you never have to hand-optimize a SQL query. But we don't have anywhere near Perfect SQL. So it is extremely important to learn how to optimize queries manually, which is why the process is covered in detail in the SQL Server certification study guides.
On the surface, optimizing a query seems more art than science. But it really all boils down to one thing: disk access. Whether you're choosing what data to index and how to index it, setting the page size and the size of the cache and buffers, or measuring "the number of I/Os" (I/O commands to the disks), database performance is still all about disk access.
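To put rough numbers on that claim, here is a back-of-the-envelope sketch in Python. It is not how SQL Server or any other product counts its I/Os. The function names, the page capacities and the simplified B-tree model are all assumptions made up for illustration. It compares the page reads needed to find one row by scanning the whole table against the page reads needed when an index is available.

```python
import math


def full_scan_page_reads(row_count, rows_per_page):
    """Pages read when every page of the table must be examined."""
    return math.ceil(row_count / rows_per_page)


def index_lookup_page_reads(row_count, keys_per_page):
    """Pages read for one lookup through a simplified B-tree index.

    Assumption: every index page holds keys_per_page keys, so each
    extra level multiplies the number of rows the index can cover.
    """
    levels = 1
    capacity = keys_per_page
    while capacity < row_count:
        capacity *= keys_per_page
        levels += 1
    # One read per index level, plus one read for the data page itself.
    return levels + 1


if __name__ == "__main__":
    rows = 1_000_000  # hypothetical table size
    print("full scan:   ", full_scan_page_reads(rows, 100), "page reads")
    print("index lookup:", index_lookup_page_reads(rows, 100), "page reads")
```

With these made-up figures, the scan touches 10,000 pages and the indexed lookup touches four. The query returns the same answer either way; the only difference is how many trips to the disk it takes.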
That's when it hit me. It's not just query optimization; an incredibly huge portion of computer science boils down to disk access. Why do we care about the PCI bus bottleneck? We use PCI to get to disks. Why do we care about virtual memory performance? It's limited by disk-access speed. Why do we use Dynamic Link Libraries and shared libraries? Because disks are cheaper than RAM. Why do we use RAID? Because individual disks are slow and they fail.
When you come down to it, it is truly mind-blowing how much of our economy must be devoted to working around the performance limitations and failure rates of disk storage.
Now imagine how everything would change if nonvolatile RAM were as cheap as disk storage and as fast as today's volatile RAM. That would make disks virtually useless. At least two-thirds of what database administrators know about optimizing queries would be irrelevant.
Indeed, it would simplify every form of data management beyond belief. The change would be almost as revolutionary as if Ford announced that it had developed a cheap automobile that ran on water and had no moving parts except the axles.
Why aren't companies working harder to make this dream a reality? To my knowledge, our best efforts have produced only ferroelectric RAM, which is arguably a breakthrough in nonvolatile RAM. But it's still slower than today's RAM and far too expensive to compete with disk storage.
Considering the potential benefits of getting rid of disks, I hope we see something better, and see it in my lifetime. Data management would never be the same.
Nicholas Petreley is a computer consultant and author in Hayward, Calif. He can be reached at nicholas@petreley.com.