
Data exposure: Using software to redact personal data from public documents

Algorithms, manual intervention are among the options used to clean up online documents

April 13, 2006 12:00 PM ET

Computerworld - The personal data of millions of U.S. residents may have been exposed by the public posting of official documents, and local governments are increasingly looking for ways to automate the process of cleaning up data being put online.

Among the solutions available is redaction software that allows government agencies to remove sensitive personal data from the online images of public records. The software, which is being used in at least two Florida counties now, works in much the same way antispam software does -- by using algorithms to analyze images for specific phrases or words.

Some vendors use multiple levels of automatic analysis, while others narrow down the number of documents likely to need redaction, then use human intervention to winnow the desired data and train the applications for improved automatic redaction.

“It’s a new technology, but a proven technology,” said Paul Miller, president of Aptitude Solutions Inc. in Casselberry, Fla. Aptitude Solutions provides its aiRedact software to Broward and Hillsborough counties in Florida, as well as to counties in other states.

The issue of removing sensitive information -- including Social Security numbers, bank account information, driver’s license data and personally identifying details -- from public documents is gaining attention in light of concerns from privacy advocates. They have argued that the number of public documents being posted online with sensitive data included could open the door for a wave of identity theft and fraud (see “Data exposure: Counties across the U.S. posting sensitive info online”). To meet that concern, county officials across the nation are increasingly turning to software to remove that data.

Since finding information in scanned images is more complex than simply locating instances of unique words in a text file, the redaction of information can’t be done using traditional methods such as word-pattern analysis, according to Aptitude Solutions.

The aiRedact software automatically indexes and redacts images using algorithms that look for targeted numbers or words, or that seek out related words in context -- adjacent words like “account number” or “Social Security number.” Once keywords are found, the software automatically redacts the information, Miller said. The software can also remove personal information from a designated area on a scanned form -- as long as the forms have a standard layout with information in fixed locations.
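The two approaches Miller describes, context-based matching and fixed-zone redaction, can be illustrated with a minimal Python sketch. It is not aiRedact's actual implementation. It assumes an OCR step has already turned the scanned image into words with pixel bounding boxes, and every name in it (Word, CONTEXT_PHRASES, FORM_ZONES, find_redactions, zone_redactions) is hypothetical.

```python
import re
from dataclasses import dataclass

@dataclass
class Word:
    text: str    # OCR output for one token (assumed to be available)
    box: tuple   # (left, top, right, bottom) in image pixels

# Assumed pattern and context phrases; a real system would use many more.
SSN_PATTERN = re.compile(r"^\d{3}-\d{2}-\d{4}$")
CONTEXT_PHRASES = [("account", "number"), ("social", "security", "number")]
WINDOW = 3  # how many tokens after a phrase to search for a value

# Hypothetical fixed-layout form: regions that are always blacked out.
FORM_ZONES = {"hypothetical_form_a": [(400, 120, 620, 150)]}

def normalize(text):
    return text.lower().strip(".,:;#")

def find_redactions(words):
    """Return bounding boxes to black out, in the order they were found."""
    boxes = []
    # Rule 1: the token itself looks like a Social Security number.
    for w in words:
        if SSN_PATTERN.match(w.text.strip(".,;")):
            boxes.append(w.box)
    # Rule 2: the first digit-bearing token shortly after a context phrase.
    norm = [normalize(w.text) for w in words]
    for phrase in CONTEXT_PHRASES:
        n = len(phrase)
        for i in range(len(norm) - n + 1):
            if tuple(norm[i:i + n]) != phrase:
                continue
            for j in range(i + n, min(i + n + WINDOW, len(words))):
                if any(c.isdigit() for c in words[j].text):
                    if words[j].box not in boxes:
                        boxes.append(words[j].box)
                    break
    return boxes

def zone_redactions(form_type):
    """Fixed zones for forms with a standard layout; empty if unknown."""
    return list(FORM_ZONES.get(form_type, []))
```

Each returned box would then be painted over in the image. The sketch assumes that OCR preserves reading order, which is part of what makes scanned records harder to handle than plain text files.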

As the application looks for candidates for redaction from among millions of document images, several thousand pages are culled and analyzed individually by a person who can verify that the information should be redacted. As the pool of documents is reviewed, the software automatically adjusts to redact the remaining records based on the choices made manually, Miller said.
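The article does not say how the software adjusts to reviewers' choices. One simple way to picture the feedback loop is sketched below in Python, purely as an assumption: each detection rule keeps a tally of how often a reviewer confirmed its candidates, and a rule is trusted to redact automatically only once its smoothed confirmation rate passes a threshold. The class name, rule names and the 0.9 threshold are all hypothetical.

```python
from collections import defaultdict

class ReviewFeedback:
    """Tracks reviewer decisions per detection rule (illustrative only)."""

    def __init__(self, auto_threshold=0.9):
        self.accepted = defaultdict(int)   # reviewer agreed: redact
        self.rejected = defaultdict(int)   # reviewer disagreed: keep
        self.auto_threshold = auto_threshold

    def record(self, rule, reviewer_redacted):
        """Store one manual decision for a candidate produced by `rule`."""
        if reviewer_redacted:
            self.accepted[rule] += 1
        else:
            self.rejected[rule] += 1

    def precision(self, rule):
        """Smoothed share of confirmed candidates (starts at 0.5)."""
        a, r = self.accepted[rule], self.rejected[rule]
        return (a + 1) / (a + r + 2)

    def decide(self, rule):
        """Redact automatically only when the rule has earned trust."""
        if self.precision(rule) >= self.auto_threshold:
            return "auto_redact"
        return "needs_review"
```

Under this scheme, a rule with no history is always routed to a person, and the remaining documents are processed automatically only for rules that reviewers have repeatedly confirmed.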


