Believe it or not, your IT department is probably full of squirrels. No, not those cute fuzzy critters that climb trees, but data consumers who hide data away with the same relentless fortitude with which their bushy-tailed namesakes hide acorns.
I owe this idea to Dave Russell, VP and Distinguished Analyst at Gartner. Dave is a long-time industry watcher and one of the smartest people around when it comes to understanding the data protection industry. I was in a meeting with him recently when he mentioned how IT departments tend to have lots of people in them who like to “squirrel away” copies of data. That got me thinking.
It’s certainly true that there are plenty of data consumers in your organization. I previously discussed the problem of users making too many copies of data. Many, but not all, of these users qualify as squirrels. What makes someone an IT squirrel?
An IT squirrel likes to hide data away somewhere, but doesn’t want to tell anybody where it’s been put. This way, it can stay nice and safely buried, usually lost in plain sight: just one more file out of the millions or even billions stored across your organization.
Squirrels usually have something to do with databases, such as development work or reporting. They can never have enough copies of databases to work on, so when they get their hands on one they like to keep it, even when it is far too old to do any good. After all, you never know when you might need it (it might be an extra tough winter!).
Interestingly, when a squirrel hides away a copy of a database, it actually changes from being structured data to being unstructured data: that is, just another file in the pile. And since unstructured data is the fastest-growing pile in your organization, it’s hard to notice. Data squirrels often don’t have to pay a price for saving up database acorns, either. Paying for storage is somebody else’s problem, which is why storage administrators are often up late wondering where the next few terabytes of free space are going to come from.
But it’s not just databases. Copies of graphics files grow like weeds, often with only minor differences between them. Large project files, like CAD/CAM files or video compiles, get stored in countless iterations and stick around long after they are of any use. Some of this is needed, but a lot isn’t.
And unlike real squirrels, which actually are able to find their hidden acorns, IT squirrels tend to be more forgetful. If you don’t actively need something, you tend to forget it ever existed at all. Forget about finding it years later, or even remembering you stored it away in the first place.
So what can you do about IT squirrels? Last time around I spoke about copy data management as a process, and indeed it can help. But it mostly helps going forward. It doesn’t clean up the mess you already have.
The problem is we don’t have good tools to manage unstructured data. For the most part, if your organization is large enough to have multiple NAS filers, you’re probably already storing enormous amounts of data that’s old and unneeded. It’s usually not that challenging to look at a single system and see the utilization rate on it. But knowing a box is 85% full doesn’t tell you anything about what’s taking up the space. The use of snapshots and replication for data protection, while highly effective at protecting information, only proliferates it even more. It’s entirely possible that between primary, snapshot, and replica copies you can have dozens of instances of a given file, and you don’t actually need any of them.
And guess what? When squirrels hide a file away in a folder, they may not even realize that it’s getting snapshotted and replicated, taking up yet more storage across the organization.
There are tools out there for indexing systems, but many of them are very high-impact or require enormous numbers of compute nodes to keep up with all the information. As a result, they aren’t very widely deployed, or they are targeted only at email archiving. Many IT departments just slog along. Deduplication and compression technologies certainly help, but the bottom line is that we’re storing far too many old acorns all over our systems. The best solution would be IT worker diligence, but realistically you can’t expect it given how overwhelmed IT staffs often are. Data cleanup is just not going to rise high on the to-do list.
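To make the “finding old acorns” problem concrete, here is a minimal Python sketch of the kind of sweep involved: walk a directory tree, flag files that haven’t been modified in a given number of days, and group byte-identical copies by content hash. The function name and thresholds are my own illustration, not a reference to any product mentioned above, and a real deployment would need to handle permissions, scale, and snapshot-aware storage far more carefully.

```python
import hashlib
import os
import time
from collections import defaultdict

def find_stale_and_duplicate_files(root, stale_days=365):
    """Walk a directory tree, flagging files not modified within
    `stale_days` and grouping byte-identical files by content hash."""
    cutoff = time.time() - stale_days * 86400
    stale = []
    by_hash = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.path.getmtime(path) < cutoff:
                    stale.append(path)
                digest = hashlib.sha256()
                with open(path, "rb") as f:
                    # Hash in 1 MB chunks so large files don't exhaust memory.
                    for chunk in iter(lambda: f.read(1 << 20), b""):
                        digest.update(chunk)
                by_hash[digest.hexdigest()].append(path)
            except OSError:
                continue  # skip unreadable files rather than abort the scan
    duplicates = {d: paths for d, paths in by_hash.items() if len(paths) > 1}
    return stale, duplicates
```

Even a toy like this illustrates why the problem is hard at scale: hashing every byte of a multi-petabyte filer is exactly the “very high-impact” scan that keeps such tools from being widely deployed.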
More efficient, automated ways of finding old data are needed. If we could solve that problem, we could keep the IT squirrels happy and – sorry, but I can’t resist – stop driving our storage administrators nuts.