Computerworld - The need to scrub data is made pretty clear by simple questions like this one: Are Jerry L. Jonson of 16 Clarke St., Altuna, PA, and Gerry L. Johnson of 16 Clark Street, Altoona, Penn., the same person? You would probably say that most likely they are. But a computer, without help from specialized software, would deal with the information as though it were about two different guys.
The human eye and mind recognize that the differences between the two sets of data records are probably the result of mistakes or inconsistencies in data entry. Weeding out and fixing or discarding inconsistent, incorrect or incomplete data is what's called data scrubbing or cleansing.
Credit: Melinda Beck
The issue of data hygiene has become increasingly important as more and more corporations implement complex customer relationship management (CRM) systems and build data warehouses that merge information from many different sources.
Sources of Dirty Data
In its 2001 report about organizations implementing data warehouses for the purpose of business intelligence, Cutter Consortium identified the following causes of dirty data:
Poor data entry, which includes misspellings, typos and transpositions, and variations in spelling or naming.
Data missing from database fields.
Lack of companywide or industrywide data coding standards (a big problem in health care, for example).
Multiple databases scattered throughout different departments or organizations, with the data in each structured according to the idiosyncratic rules of that particular database.
Older systems that contain poorly documented or obsolete data.
- 15 Non-Certified IT Skills Growing in Demand
- How 19 Tech Titans Target Healthcare
- Twitter Suffering From Growing Pains (and Facebook Comparisons)
- Agile Comes to Data Integration
- Slideshow: 7 security mistakes people make with their mobile device
- iOS vs. Android: Which is more secure?
- 11 sure signs you've been hacked
- The value of smarter oil and gas fields With global energy requirements continuing to rise, the exploration, development and production of new oil and gas resources are shifting to increasingly challenging...
- Smarter Environmental Analytics Solutions: Offshore Oil and Gas Installations Example This IBM Redbooks® Solution Guide describes a solution for implementing smarter environmental monitoring and analytics for oil and gas industries. The solution implements...
- Piecing Together the Business Intelligence Puzzle Business intelligence (BI) technology collects and analyzes company data, delivering relevant information to corporate decision-makers in an effort to produce favorable outcomes.
- Harness IT -- An Introduction to Business Intelligence Solutions Learn the key selection criteria required to provide your organization with the capability to address structured data, unstructured data and mobile demands so...
- Live Webcast Increasing the Value of Your Reports and Dashboards Learn how incorporating other analytical capabilities such as predictive modeling and visualization can increase the value of your reports and dashboards by providing...
- The Software-Defined Data Center: Is your ADC ready? Data center transformation is accelerating beyond virtualization to next-generation cloud architectures and software-defined data centers, bringing new challenges for application performance, scalability and...
- Application Acceleration: Optimize the End-User Experience Watch this on-demand webcast and learn how you can optimize your web content, accelerate performance across any device and browser combination, and offload... All Business Intelligence/Analytics White Papers | Webcasts