How big is your big data?

I mentioned last time that I would be talking about snapshots and data protection. And so I shall! But snapshots cover a lot of ground, and I prefer to focus on particular aspects of them, rather than painting a high-level story that many are familiar with.

But let’s at least establish some groundwork. Snapshots are effective for data protection because they are fast and allow you to capture and restore data quickly. They also let you deal with very large amounts of data.  In fact, they may be the only way to do so.

When you see snapshots in action compared to conventional backup, it can seem magical. But they are not magic. A lot of work has to go into making them truly useful. We’ll get to that in time, but today I want to focus on the amounts of data IT departments are dealing with, as a first step in understanding why snapshots are the most effective way to protect ‘big data.’

We’ve all heard the term “big data” by now. In fact, you might be just a bit tired of it – like “cloud” and “flash,” every tech pundit is weighing in on it. (Personally, I think they should have called it “Brobdingnagian Data,” but that’s just me.)  Mostly, big data discussions are about analytics, but in the backup world it’s about quantity. “Lots of data” might be a better description. Or even “a ridiculous number of files.”

So how big is “big” in a backup context? Let’s take a little side road. Trust me, we’ll get somewhere!

IT departments are dealing with files. Many, many files. So are end users. But from a personal user standpoint, things are manageable. Sure, you’ve probably got too much unneeded junk on your laptop. But if you spent an hour or two doing some housecleaning, you could rid yourself of a lot of files you don’t need. Ever do that? You probably end up tossing out a few hundred files, maybe a thousand at the most if you’re really thorough. (I’m not counting de-installing programs here, I mean culling files individually.)

For our discussion, we’ll say you deleted a thousand files from your laptop. Whew, a job well done! But what if you were facing ten thousand unwanted files? Or a hundred thousand? Or a million? Or even a billion? What does that really mean?

It means big trouble is what it means. My company has been doing some research around this topic, and we’ve been hearing a steady stream of complaints from IT users at big organizations that they can’t get a handle on file management. They know their disk systems are full of junk, but they can’t identify it easily and don’t necessarily have policies around it even if they could find it. And some of these organizations have file counts into the billions.

A “billion” is a very interesting number. Even though we hear it tossed around everywhere (this or that will cost so many billion dollars), intuitively we can’t really grasp the scale of what a billion means. This is where the concept of “order of magnitude” can be helpful.

Simply put, an “order of magnitude” difference between two numbers means adding a zero. So 10,000 is an order of magnitude more than 1,000. But that doesn’t do much viscerally for how you perceive these numbers. So let’s get visual.
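If you would rather count zeros with code than by eye, here is a minimal sketch. It assumes that "orders of magnitude apart" means the base-10 logarithm of the ratio between two numbers; the function name `orders_of_magnitude` is mine, purely for illustration.

```python
import math

def orders_of_magnitude(smaller: float, larger: float) -> float:
    """Return how many powers of ten separate two positive numbers."""
    return math.log10(larger / smaller)

# Hypothetical file counts, matching the examples in this post.
laptop_cleanup = 1_000
million_files = 1_000_000
billion_files = 1_000_000_000

print(round(orders_of_magnitude(laptop_cleanup, 10_000)))
print(round(orders_of_magnitude(laptop_cleanup, million_files)))
print(round(orders_of_magnitude(million_files, billion_files)))
```

Each step of one in the result is another zero on the end of the number.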

Earlier, I tasked you with deleting a thousand files from your laptop. But what does a thousand feel like – and I mean feel like – compared to a few orders of magnitude higher? Let’s look.


Above is a simple PowerPoint graphic comparing your thousand deleted laptop files with a number three orders of magnitude higher. Compared to 1,000,000 files, your thousand files are basically invisible. So that’s how much more three orders of magnitude is – a lot more. And many organizations easily pass the million-file mark.

Think about this in terms of backup. If you’re backing up 100,000 files right now and it’s a challenge, what happens when you go an order of magnitude higher? What happens is your backup software gives up and jumps off a cliff. Think you’ll never have that many files? Well, if you have 100,000 files today, at an average growth rate of 50% a year you are just a bit over five years away from having a million files.
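The "bit over five years" figure is just compound growth, and it is easy to check. The sketch below assumes the file count grows by a fixed percentage once a year, which is a simplification; the function names `years_to_reach` and `project` are hypothetical, and the 50% rate is the figure used above rather than a measurement.

```python
import math

def years_to_reach(current: int, target: int, annual_growth: float) -> float:
    """Years of compound growth needed to go from current to target."""
    return math.log(target / current) / math.log(1 + annual_growth)

def project(current: int, annual_growth: float, years: int) -> list[int]:
    """File count at the end of each year, starting with year 0."""
    return [round(current * (1 + annual_growth) ** y) for y in range(years + 1)]

years = years_to_reach(100_000, 1_000_000, 0.50)
print(f"Years until a million files: {years:.1f}")

for year, count in enumerate(project(100_000, 0.50, 6)):
    print(f"year {year}: {count:,} files")
```

Plug in your own file count and growth rate to see how far away your next order of magnitude is.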

Yet some companies hit the billion file mark. That’s three orders of magnitude more than a million. If you compared the two visually, a million would look like the thousand does in the chart above.

A billion as a number is just incredibly huge. You can’t grasp it. It’s unnerving really, yet we act as if a billion were nothing. We use it casually. We know that a billion is a thing that Bill Gates has 66 of, in dollars. And we know that the United States budget deficit will be 1,100 billion this year. Oh wait, a thousand billion is a trillion. What’s a trillion look like?


Yup, in the context of a trillion, even the gigantic, ungraspable billion becomes invisible. Incidentally, the total United States debt now exceeds sixteen trillion.  If you could really grasp that number – grasp it in your stomach, as it were – you’d never get out of bed in the morning. If I had kept the scale of my chart as it was in the first image, the trillion bar would go up through your computer monitor, through your ceiling and end up above your roof somewhere.

I think my point here is clear. Big numbers are easy to say, but when you see them visualized you realize how daunting they are.

Getting back to data protection, the difference between having to deal with thousands of files versus tens of millions of files is a Brobdingnagian difference. It changes everything about what you need to do. And there is no way your traditional backup software is going to handle files at this scale. Snapshots are your only option. But then again, they may seem magical, but they aren’t magic, and you need to know just what it is you’re getting into.

