The Data You Don't Even Know You Have

Let's assume for a moment that Google's collection of Wi-Fi "payload" data really was unintentional, and that Google never used the data, didn't even know it was there and stored it securely. Is it actually a privacy leak if no one has looked at the private data?

That, of course, is a question for lawyers and courts and government regulators.

Now consider: What happens when the various lawyers and courts and government agencies around the world investigating Google's Wi-Fiasco demand to look at the data Google collected?

It becomes a privacy leak.

After all, in order to see whether there's any personally identifiable or sensitive data in Google's big pile o' Wi-Fi data, somebody has to invade the privacy of the people whose data has been collected. And the prime candidates are litigators, politicians and bureaucrats.

Isn't that comforting?

This isn't a defense of Google. Google failed. That has become clear in the month since the search giant admitted that the cars collecting photos for its Street View feature were also collecting samples of the Wi-Fi signals near the cars -- and those samples contained actual snippets of traffic from any unencrypted Wi-Fi networks in range.

Google failed in many ways, but perhaps its biggest failure was this: It collected data it didn't know it was collecting. That may sound innocuous. It's not. It's an epic failure of good data-management policy.

Look, if you don't know what you've got, you can't manage it. You can't secure it properly or retain it for as long as the law requires, because you don't know which rules apply.

Google spent years collecting it-didn't-know-what. Now it will be paying the price for years -- in bad publicity, investigations and lawsuits.

So -- what is your IT shop collecting?

You probably know whether you have Social Security numbers in your databases. If you're doing credit card transactions, you know the PCI rules. If you're in health care or insurance, you're careful to follow HIPAA privacy regulations.

But what about the stuff that's outside your industry's standards for collection of information? Are you sure you even know the kinds of data that routinely come in?

Do you audit the forms on your Web site to make sure you know about everything you gather from customers and sales prospects? Remember, even trivial or "anonymized" data can be mixed and matched to personally identify individuals.

What about the sales staff's notes, or information collected by customer service reps? Yes, you want them to take notes on anything that might be useful in making a sale or solving a problem. But some customers are dangerously chatty with their private information. Do you routinely purge information that's not relevant to your business?

And once you've cleaned up your official data sets, how do you handle all the copies employees made before the cleanup, in the form of backups, paper reports and data subsets that users made for their own projects?

No, you're not Google. You're not cruising the streets, plucking random network conversations out of the air.

But make sure you know what you have collected. And remind your employees -- both inside and outside IT -- that there can be a high price to pay for storing the wrong information.

If they doubt that, just tell them to Google it.

Frank Hayes has been covering the intersection of business and IT for three decades.

Copyright © 2010 IDG Communications, Inc.
