Data Scrubbing

 

February 10, 2003 (Computerworld) -- The need to scrub data is made pretty clear by simple questions like this one: Are Jerry L. Jonson of 16 Clarke St., Altuna, PA, and Gerry L. Johnson of 16 Clark Street, Altoona, Penn., the same person? You would probably say that they are. But a computer, without help from specialized software, would treat the two records as though they described two different people.

The human eye and mind recognize that the differences between the two records are probably the result of mistakes or inconsistencies in data entry. Weeding out and then fixing or discarding inconsistent, incorrect or incomplete data is called data scrubbing, or data cleansing.
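
As a rough illustration of how software might reach the same conclusion a person does, the sketch below compares the two example records field by field. It is not the method of any particular scrubbing product: the abbreviation table, the function names and the 0.85 threshold are all assumptions made for this example, and it relies only on Python's standard difflib module.

```python
from difflib import SequenceMatcher

# Hypothetical abbreviation table; a real scrubbing tool would need a far larger one.
ABBREVIATIONS = {"st": "street", "penn": "pa", "pennsylvania": "pa"}


def normalize(text):
    """Lowercase, replace punctuation with spaces, and expand known abbreviations."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in text.lower())
    return " ".join(ABBREVIATIONS.get(word, word) for word in cleaned.split())


def similarity(a, b):
    """Return a similarity ratio between 0 and 1 for two normalized strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def likely_same_person(rec_a, rec_b, threshold=0.85):
    """Average the per-field similarity and compare it with an assumed threshold."""
    scores = [similarity(rec_a[field], rec_b[field]) for field in rec_a]
    return sum(scores) / len(scores) >= threshold


record_1 = {"name": "Jerry L. Jonson", "street": "16 Clarke St.",
            "city": "Altuna", "state": "PA"}
record_2 = {"name": "Gerry L. Johnson", "street": "16 Clark Street",
            "city": "Altoona", "state": "Penn."}

print(likely_same_person(record_1, record_2))
```

With these two records the averaged score clears the assumed threshold, so the pair is flagged as a probable duplicate. In practice the threshold and the weight given to each field would have to be tuned against real data, and a flagged pair would normally go to a person for review rather than being merged automatically.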

[Illustration: Data Scrubbing. Credit: Melinda Beck]

"Dirty data" has been a problem for as long as there have been computers -- or maybe for as long as people have attempted to gather and analyze information. It's a large part of the "garbage in" that can result in the worthless "garbage out" of a computing process.

The issue of data hygiene has become increasingly important as more and more corporations implement complex customer relationship management (CRM) systems and build data warehouses that merge information from many different sources.


Without data cleansing, the IT staffs of those companies face the unappetizing prospect of merging corrupt or incomplete bits of data from multiple databases. A single piece of dirty data might seem like a trivial problem, but if you multiply that "trivial" problem by thousands or millions of pieces of erroneous, duplicated or inconsistent data, it becomes a prescription for chaos.

Sources of Dirty Data

In its 2001 report about organizations implementing data warehouses for the purpose of business intelligence, Cutter Consortium identified the following causes of dirty data:

• Poor data entry, which includes misspellings, typos and transpositions, and variations in spelling or naming.

• Data missing from database fields.

• Lack of companywide or industrywide data coding standards (a big problem in health care, for example).

• Multiple databases scattered throughout different departments or organizations, with the data in each structured according to the idiosyncratic rules of that particular database.

• Older systems that contain poorly documented or obsolete data.
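
Two of the causes listed above, missing fields and inconsistent coding, lend themselves to simple automated checks. The following is a minimal sketch under stated assumptions: the list of required fields, the state-code table and the scrub function are hypothetical names invented for this example, not part of the Cutter Consortium report or of any real product.

```python
# Hypothetical mapping from variant codings to one shared standard.
STATE_CODES = {"pa": "PA", "penn.": "PA", "pennsylvania": "PA"}

# Fields this example assumes every record must contain.
REQUIRED_FIELDS = ("name", "city", "state")


def scrub(record):
    """Return (cleaned_record, problems) for one raw record."""
    # Trim stray whitespace and treat None as an empty value.
    cleaned = {key: (value or "").strip() for key, value in record.items()}

    # Flag required fields that are absent or blank.
    problems = [f"missing {field}" for field in REQUIRED_FIELDS if not cleaned.get(field)]

    # Map known variants onto the standard code; report anything unrecognized.
    state = cleaned.get("state", "")
    if state:
        standard = STATE_CODES.get(state.lower())
        if standard is None:
            problems.append(f"unrecognized state code: {state}")
        else:
            cleaned["state"] = standard
    return cleaned, problems


cleaned, problems = scrub({"name": "Gerry L. Johnson", "city": "Altoona", "state": "Penn."})
print(cleaned["state"], problems)
```

Records that come back with a non-empty problem list can be routed for correction or discarded, which mirrors the fix-or-discard choice described earlier in the article.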
