Computerworld - The need to scrub data is made pretty clear by simple questions like this one: Are Jerry L. Jonson of 16 Clarke St., Altuna, PA, and Gerry L. Johnson of 16 Clark Street, Altoona, Penn., the same person? You would probably say they are. But a computer, without help from specialized software, would treat the information as though it were about two different guys.
The human eye and mind recognize that the differences between the two records are probably the result of mistakes or inconsistencies in data entry. Finding inconsistent, incorrect or incomplete data and then fixing or discarding it is what's called data scrubbing, or cleansing.
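To make the Jonson/Johnson example concrete, here is a minimal Python sketch of the kind of work matching software does: standardize fields that have a known set of variants, then compare the free-text fields with a fuzzy similarity score instead of an exact match. The lookup tables, the field names, the `likely_same_person` helper and the 0.75 threshold are all hypothetical choices made for illustration; commercial scrubbing tools use much larger reference data and more sophisticated matching rules. Python's standard `difflib.SequenceMatcher` stands in for whatever comparison algorithm a real product uses.

```python
import re
from difflib import SequenceMatcher

# Hypothetical lookup tables; real scrubbing tools ship far larger reference data.
STREET_ABBREVIATIONS = {"st": "street", "ave": "avenue", "rd": "road"}
STATE_VARIANTS = {"pa": "PA", "penn": "PA", "pennsylvania": "PA"}


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces and collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def normalize_street(street: str) -> str:
    """Expand common street-type abbreviations, e.g. 'St.' -> 'street'."""
    return " ".join(STREET_ABBREVIATIONS.get(w, w) for w in normalize(street).split())


def normalize_state(state: str) -> str:
    """Map known spellings of a state to one standard code."""
    key = normalize(state)
    return STATE_VARIANTS.get(key, key.upper())


def similarity(a: str, b: str) -> float:
    """Score from 0.0 (nothing in common) to 1.0 (identical strings)."""
    return SequenceMatcher(None, a, b).ratio()


def likely_same_person(r1: dict, r2: dict, threshold: float = 0.75) -> bool:
    """Hypothetical rule: same standardized state, similar name, street and city."""
    if normalize_state(r1["state"]) != normalize_state(r2["state"]):
        return False
    scores = [
        similarity(normalize(r1["name"]), normalize(r2["name"])),
        similarity(normalize_street(r1["street"]), normalize_street(r2["street"])),
        similarity(normalize(r1["city"]), normalize(r2["city"])),
    ]
    return all(score >= threshold for score in scores)


a = {"name": "Jerry L. Jonson", "street": "16 Clarke St.", "city": "Altuna", "state": "PA"}
b = {"name": "Gerry L. Johnson", "street": "16 Clark Street", "city": "Altoona", "state": "Penn."}
print(likely_same_person(a, b))  # True
```

With these inputs the name, street and city scores come out at roughly 0.90, 0.97 and 0.77, so the pair is flagged as a likely match. The state is compared exactly after standardization because its variants ("PA", "Penn.") form a closed set, while names and city spellings need tolerance for typos. Where to set the threshold is a trade-off: lower values catch more true duplicates but also merge more people who merely look alike.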
Credit: Melinda Beck
The issue of data hygiene has become increasingly important as more and more corporations implement complex customer relationship management (CRM) systems and build data warehouses that merge information from many different sources.
Sources of Dirty Data
In its 2001 report on organizations implementing data warehouses for business intelligence, Cutter Consortium identified the following causes of dirty data:
Poor data entry, which includes misspellings, typos and transpositions, and variations in spelling or naming.
Data missing from database fields.
Lack of companywide or industrywide data coding standards (a big problem in health care, for example).
Multiple databases scattered throughout different departments or organizations, with the data in each structured according to the idiosyncratic rules of that particular database.
Older systems that contain poorly documented or obsolete data.
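Two of the causes above, missing data and the lack of coding standards, can be caught mechanically before records from different sources are merged. The Python sketch below runs a simple audit pass over a batch of records. The required-field list, the abbreviated set of valid state codes and the `audit` function are hypothetical, chosen for illustration rather than taken from Cutter Consortium's report or from any product.

```python
from dataclasses import dataclass, field

# Hypothetical schema and coding standard, for illustration only.
REQUIRED_FIELDS = ("name", "street", "city", "state")
VALID_STATE_CODES = {"NJ", "NY", "OH", "PA"}  # a real list would cover every state


@dataclass
class AuditReport:
    missing: list[tuple[int, str]] = field(default_factory=list)      # (row, field)
    nonstandard: list[tuple[int, str]] = field(default_factory=list)  # (row, raw value)


def audit(records: list[dict]) -> AuditReport:
    """Flag empty required fields and state values outside the coding standard."""
    report = AuditReport()
    for row, record in enumerate(records):
        for name in REQUIRED_FIELDS:
            value = record.get(name)
            if value is None or not str(value).strip():
                report.missing.append((row, name))
        state = (record.get("state") or "").strip()
        if state and state not in VALID_STATE_CODES:
            report.nonstandard.append((row, record["state"]))
    return report


records = [
    {"name": "Jerry L. Jonson", "street": "16 Clarke St.", "city": "Altuna", "state": "PA"},
    {"name": "Gerry L. Johnson", "street": "", "city": "Altoona", "state": "Penn."},
]
report = audit(records)
print(report.missing)      # [(1, 'street')]
print(report.nonstandard)  # [(1, 'Penn.')]
```

Note what this pass does not catch: a misspelling such as "Altuna" sits in a field that is present and has no fixed code list, so it slips through. Finding that kind of error requires comparing records against one another or against reference data such as a postal address file.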