If somebody asked you to do the exact same work over and over again, would you think that was a smart thing to do? Of course not. But that’s exactly what many of us are doing in our backup environments.
There are a lot of technology approaches to backup, and all of them have to deal with ever-increasing amounts of data. But they are not all equally smart. In fact, when you look at them a certain way they can be downright stupid. And while “Dumb and Dumber” may have been quite popular as a movie, it shouldn’t serve as an approach to backup.
There are three major types of backup done today.
· File backup to tape
· File backup to a deduplication appliance
· Host-side data reduction
I am keeping disk-to-disk snapshots out of this discussion and focusing on traditional server-to-target backup methods, still the most commonly used. So which of these are dumb and which is dumber? Let’s use a metaphor to explain.
I have a task for you. Imagine we are standing in a large room, about 100 feet from end to end. On one end of the room is a pile of 1,000 very heavy bricks, in ten unique colors, 100 bricks of each color. A new pile is delivered every day. Your job is to get one brick of each color across the room on a daily basis.
If you were backup-to-tape, you would start by moving all of the bricks across the room. Then you would leave the pile there. The next day you would move the new pile, and every day you would move all thousand bricks, an exhausting task requiring you to trudge back and forth many times. Eventually I will ask you for ten unique bricks, and then you will sort through the ever-growing piles and find them. Meanwhile, that side of the room is getting very crowded.
This is our dumber method. Effort and space are being wasted in ways that do nothing to help us find specific bricks. In fact, tremendous labor is spent to actually make the problem worse!
If you were a deduplication appliance, you would start the same way by carrying all thousand bricks across the room. But when you got there, you’d sort through them and throw away 990, keeping only the 10 unique bricks. And then every day you’d move all the bricks and sort all the bricks.
This method is dumb. Why are you doing all that carrying just to throw the bricks away? The good news is that you’re not taking up nearly as much space because the piles are so much smaller now, but you sure are tired every day.
Now if you were host-side data reduction you would start by sorting through the bricks in the original pile before you moved any of them. This might take some time and effort, but once you locate the ten unique bricks you can carry those easily in only a few trips, rather than spending hours and hours lugging excess bricks for no good reason. Your job is done in a fraction of the time and you’re not exhausted from the effort.
This is doing it smarter! The end result of each method is the same: when I ask you for a brick, you can get it for me. But by using host-side reduction you’re saving an enormous amount of work.
Now of course we’re not talking about moving bricks, we’re talking about moving data. But the analogy is exact. The carrying of the bricks is something I like to refer to as “data lift,” the work your servers and networks do to get the bricks – that is, the data – from the source to the target. Data lift creates impact exactly where you don’t want it, on your application servers and network. So it’s both dumb and dumber to continue employing backup methods that generate so much needless effort.
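To make the data lift difference concrete, here is a minimal sketch of the three approaches in Python. The function names, fixed-size chunks, and SHA-256 fingerprinting are illustrative assumptions, not any vendor's actual implementation; real deduplication systems use more sophisticated chunking and indexing. The point is simply where the duplicate detection happens, and what that does to the bytes moved:

```python
import hashlib

def backup_to_tape(chunks, target):
    """Dumber: move every chunk, every day, and keep every copy."""
    moved = 0
    for chunk in chunks:
        target.append(chunk)       # full copy piles up on the target
        moved += len(chunk)
    return moved

def target_side_dedup(chunks, store):
    """Dumb: move every chunk, then discard duplicates at the target."""
    moved = 0
    for chunk in chunks:
        moved += len(chunk)        # data lift is paid for every chunk
        store.setdefault(hashlib.sha256(chunk).hexdigest(), chunk)
    return moved

def source_side_dedup(chunks, store):
    """Smarter: fingerprint at the source; send only unseen chunks."""
    moved = 0
    for chunk in chunks:
        key = hashlib.sha256(chunk).hexdigest()
        if key not in store:       # only unique "bricks" cross the room
            store[key] = chunk
            moved += len(chunk)
    return moved

# 1,000 "bricks" in 10 unique "colors": ten distinct 1 KB chunks, 100 copies of each
chunks = [bytes([c]) * 1024 for c in range(10)] * 100

print(backup_to_tape(chunks, []))      # 1,024,000 bytes moved
print(target_side_dedup(chunks, {}))   # 1,024,000 bytes moved, only 10 chunks kept
print(source_side_dedup(chunks, {}))   # 10,240 bytes moved
```

All three end up able to produce any brick on request; only the last one avoids hauling 990 redundant bricks across the room first.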
The conclusion is clear: eliminating redundant data – sorting out the bricks – at the source side makes the most sense. But there are technologically different ways to do this, which have their own smart and dumb aspects. We’ll look at that next time.