Data Scrubbing
Computerworld - The need to scrub data is made clear by simple questions like this one: Are Jerry L. Jonson of 16 Clarke St., Altuna, PA, and Gerry L. Johnson of 16 Clark Street, Altoona, Penn., the same person? You would probably say that they most likely are. But a computer, without help from specialized software, would treat the information as though it described two different people.
The human eye and mind recognize that the differences between the two sets of data records are probably the result of mistakes or inconsistencies in data entry. Weeding out and fixing or discarding inconsistent, incorrect or incomplete data is what's called data scrubbing or cleansing.
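To make the idea concrete, here is a minimal sketch of how software might approximate that human judgment. It is not how any particular data-cleansing product works: the abbreviation table, the `probably_same` helper and the 0.85 threshold are all assumptions chosen for illustration, and the comparison relies on the general-purpose `difflib.SequenceMatcher` from Python's standard library rather than a dedicated name-and-address matcher.

```python
from difflib import SequenceMatcher

# Hypothetical abbreviation table, for illustration only. A real system would
# need a much larger one and would have to handle ambiguity ("St." can also
# mean "Saint").
ABBREVIATIONS = {"st.": "street", "st": "street", "penn.": "pa", "penn": "pa"}


def normalize(text):
    """Lowercase, drop commas and expand a few known abbreviations."""
    words = text.lower().replace(",", " ").split()
    return " ".join(ABBREVIATIONS.get(word, word) for word in words)


def similarity(a, b):
    """Return a similarity ratio between 0 and 1 for two normalized strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def probably_same(a, b, threshold=0.85):
    """Flag two records as likely duplicates; the threshold is an assumption."""
    return similarity(a, b) >= threshold


jerry = "Jerry L. Jonson, 16 Clarke St., Altuna, PA"
gerry = "Gerry L. Johnson, 16 Clark Street, Altoona, Penn."
print(probably_same(jerry, gerry))
```

A match flagged this way is only a candidate. Deciding which spelling is correct, or whether the two records really do describe one person, still takes reference data or human review.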

Credit: Melinda Beck
The issue of data hygiene has become increasingly important as more and more corporations implement complex customer relationship management (CRM) systems and build data warehouses that merge information from many different sources.
Without data cleansing, the IT staffs of those companies face the unappetizing prospect of merging corrupt or incomplete bits of data from multiple databases. A single piece of dirty data might seem like a trivial problem, but if you multiply that "trivial" problem by thousands or millions of pieces of erroneous, duplicated or inconsistent data, it becomes a prescription for chaos.
Sources of Dirty Data
In its 2001 report about organizations implementing data warehouses for the purpose of business intelligence, Cutter Consortium identified the following causes of dirty data:
Poor data entry, which includes misspellings, typos and transpositions, and variations in spelling or naming.
Data missing from database fields.
Lack of companywide or industrywide data coding standards (a big problem in health care, for example).
Multiple databases scattered throughout different departments or organizations, with the data in each structured according to the idiosyncratic rules of that particular database.
Older systems that contain poorly documented or obsolete data.
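Two of the causes above, missing fields and inconsistent coding, lend themselves to simple automated checks. The following is a rough sketch under stated assumptions rather than a description of any real tool: the list of required fields, the state-code table and the sample records (including the blank street field) are invented for illustration.

```python
# Hypothetical required fields and state-code table, for illustration only.
REQUIRED_FIELDS = ("name", "street", "city", "state")
STATE_CODES = {"pa": "PA", "penn": "PA", "penn.": "PA", "pennsylvania": "PA"}


def missing_fields(record):
    """Return the required fields that are absent or blank in a record."""
    return [f for f in REQUIRED_FIELDS if not str(record.get(f) or "").strip()]


def standardize_state(record):
    """Return a copy of the record with its state mapped to a standard code.

    Unrecognized values are left unchanged so they can be reviewed by hand.
    """
    cleaned = dict(record)
    raw = str(record.get("state") or "").strip()
    cleaned["state"] = STATE_CODES.get(raw.lower(), raw)
    return cleaned


# Made-up sample records; the blank street is deliberate.
records = [
    {"name": "Gerry L. Johnson", "street": "16 Clark Street",
     "city": "Altoona", "state": "Penn."},
    {"name": "Jerry L. Jonson", "street": "",
     "city": "Altuna", "state": "PA"},
]

for rec in records:
    print(standardize_state(rec)["state"], missing_fields(rec))
```

Leaving unrecognized values untouched, rather than guessing at them, keeps the cleanup step from introducing new errors of its own.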