Data exposure: Using software to redact personal data from public documents

Algorithms, manual intervention are among the options used to clean up online documents

By Todd R. Weiss
April 13, 2006 12:00 PM ET

Computerworld - The personal data of millions of U.S. residents may have been exposed by the public posting of official documents, and local governments are increasingly looking for ways to automate the process of cleaning up data being put online.

Among the solutions available is redaction software that allows government agencies to remove sensitive personal data from the online images of public records. The software, which is being used in at least two Florida counties now, works in much the same way antispam software does -- by using algorithms to analyze images for specific phrases or words.

Some vendors use multiple levels of automatic analysis, while others narrow down the number of documents likely to need redaction, then use human intervention to winnow the desired data and train the applications for improved automatic redaction.

“It’s a new technology, but a proven technology,” said Paul Miller, president of Aptitude Solutions Inc. in Casselberry, Fla. Aptitude Solutions provides its aiRedact software to Broward and Hillsborough counties in Florida, as well as to counties in other states.

The issue of removing sensitive information -- including Social Security numbers, bank account information, driver’s license data and personally identifying details -- from public documents is gaining attention in light of concerns from privacy advocates. They have argued that the number of public documents being posted online with sensitive data included could open the door to a wave of identity theft and fraud (see “Data exposure: Counties across the U.S. posting sensitive info online”). To address that concern, county officials across the nation are increasingly turning to software to remove that data.

Since finding information in scanned images is more complex than simply locating instances of unique words in a text file, the redaction of information can’t be done using traditional methods such as word-pattern analysis, according to Aptitude Solutions.

AiRedact automatically indexes and redacts images using algorithms that look for targeted numbers or words, or that seek out related words in context -- adjacent phrases such as “account number” or “Social Security number.” Once keywords are found, the software automatically redacts the information, Miller said. The software can also remove personal information from a designated area of a scanned form -- as long as the forms have a standard layout with information in fixed locations.
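The keyword-in-context idea can be illustrated with a short Python sketch. This is not aiRedact's code or algorithm, which the article does not describe in detail. It assumes the scanned image has already been converted to text by some OCR step, and the patterns, the phrase list and the function name `redact_text` are hypothetical choices made for illustration.

```python
import re

# Hypothetical pattern: a number formatted like a U.S. Social Security number.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

# Hypothetical context rule: a trigger phrase, up to 20 non-digit characters,
# then a run of digits and hyphens at least seven characters long.
CONTEXT_PATTERN = re.compile(
    r"((?:social security number|account number)\D{0,20}?)(\d[\d-]{5,}\d)",
    re.IGNORECASE,
)


def redact_text(text: str, mask: str = "[REDACTED]") -> str:
    """Mask sensitive numbers in OCR output (illustrative sketch only)."""
    # Pass 1: numbers whose format alone marks them as sensitive.
    text = SSN_PATTERN.sub(mask, text)
    # Pass 2: numbers that are sensitive because of the words next to them.
    # The trigger phrase is kept and only the number is masked.
    text = CONTEXT_PATTERN.sub(lambda m: m.group(1) + mask, text)
    return text


if __name__ == "__main__":
    print(redact_text("Account Number 00123456 at First Bank"))
    print(redact_text("Lot 12, Block 4"))
```

In this sketch an account number is masked only because it follows the phrase "Account Number," while an unrelated number such as a lot or block number is left alone. A production system working on images would also have to map each match back to pixel coordinates in order to black out the right region.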

As the application looks for candidates for redaction from among millions of document images, several thousand pages are culled and analyzed individually by a person who can verify that the information should be redacted. As the pool of documents is reviewed, the software automatically adjusts to redact the remaining records based on the choices made manually, Miller said.
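One simple way to picture this review loop is a tally of how often a human reviewer confirms each detection rule, with well-confirmed rules then applied automatically. The Python sketch below is an assumption about how such feedback could work, not a description of how aiRedact adjusts itself. The class name, rule names and thresholds are all hypothetical.

```python
from collections import defaultdict


class ReviewFeedback:
    """Tracks reviewer decisions per detection rule (hypothetical design)."""

    def __init__(self, auto_threshold: float = 0.9, min_reviews: int = 3):
        self.auto_threshold = auto_threshold  # confirm rate needed to automate
        self.min_reviews = min_reviews        # reviews required before trusting a rule
        self.confirmed = defaultdict(int)
        self.reviewed = defaultdict(int)

    def record(self, rule: str, reviewer_confirmed: bool) -> None:
        """Store one manual decision about a candidate flagged by `rule`."""
        self.reviewed[rule] += 1
        if reviewer_confirmed:
            self.confirmed[rule] += 1

    def can_auto_redact(self, rule: str) -> bool:
        """True once reviewers have confirmed this rule often enough."""
        total = self.reviewed.get(rule, 0)
        if total < self.min_reviews:
            return False
        return self.confirmed.get(rule, 0) / total >= self.auto_threshold


if __name__ == "__main__":
    feedback = ReviewFeedback()
    for decision in (True, True, True):
        feedback.record("ssn_format", decision)
    for decision in (True, False, False):
        feedback.record("any_nine_digits", decision)
    print(feedback.can_auto_redact("ssn_format"))
    print(feedback.can_auto_redact("any_nine_digits"))
```

Under these assumed thresholds, a rule that reviewers keep confirming is promoted to automatic redaction, while a rule that produces frequent false alarms keeps sending its candidates to a person.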


