Computerworld - The need to scrub data is made pretty clear by simple questions like this one: Are Jerry L. Jonson of 16 Clarke St., Altuna, PA, and Gerry L. Johnson of 16 Clark Street, Altoona, Penn., the same person? You would probably say that most likely they are. But a computer, without help from specialized software, would deal with the information as though it were about two different guys.
The human eye and mind recognize that the differences between the two records are probably the result of mistakes or inconsistencies in data entry. Weeding out and then fixing or discarding inconsistent, incorrect or incomplete data is called data scrubbing or data cleansing.
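To make the Jonson/Johnson example concrete, here is a minimal sketch of how software might flag the two records as a probable match. It is an illustration only, not how any particular scrubbing product works: the abbreviation table, the field names, the averaging of per-field scores and the 0.85 threshold are all assumptions chosen for this example. The string comparison uses Python's standard difflib module.

```python
from difflib import SequenceMatcher

# Hypothetical abbreviation table; a real tool would use a far larger one.
# Treating "st" as "street" is a simplification (it can also mean "saint").
ABBREVIATIONS = {"st": "street", "penn": "pa", "pennsylvania": "pa"}

FIELDS = ("name", "street", "city", "state")


def normalize(text):
    """Lowercase, drop periods and commas, and expand known abbreviations."""
    tokens = text.lower().replace(".", "").replace(",", "").split()
    return " ".join(ABBREVIATIONS.get(token, token) for token in tokens)


def similarity(a, b):
    """Return a 0.0-1.0 similarity score for two normalized strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def likely_same(rec_a, rec_b, threshold=0.85):
    """Guess whether two records describe the same person.

    The average of the per-field scores is compared with an assumed threshold.
    """
    scores = [similarity(rec_a[field], rec_b[field]) for field in FIELDS]
    return sum(scores) / len(scores) >= threshold


jonson = {"name": "Jerry L. Jonson", "street": "16 Clarke St.",
          "city": "Altuna", "state": "PA"}
johnson = {"name": "Gerry L. Johnson", "street": "16 Clark Street",
           "city": "Altoona", "state": "Penn."}

if likely_same(jonson, johnson):
    print("Probable duplicate: review before merging")
```

A match found this way is only a candidate. Whether to merge the records, and which spelling to keep, still has to be decided by a person or by further rules.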
Credit: Melinda Beck
The issue of data hygiene has become increasingly important as more and more corporations implement complex customer relationship management (CRM) systems and build data warehouses that merge information from many different sources.
Sources of Dirty Data
In its 2001 report about organizations implementing data warehouses for the purpose of business intelligence, Cutter Consortium identified the following causes of dirty data:
Poor data entry, which includes misspellings, typos and transpositions, and variations in spelling or naming.
Data missing from database fields.
Lack of companywide or industrywide data coding standards (a big problem in health care, for example).
Multiple databases scattered throughout different departments or organizations, with the data in each structured according to the idiosyncratic rules of that particular database.
Older systems that contain poorly documented or obsolete data.
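Two of the causes listed above, missing field data and the lack of a coding standard, are simple enough to check for mechanically. The sketch below is one hedged way to do so. The required field names and the small set of accepted state codes are hypothetical, chosen to match the address example at the top of this article.

```python
# Hypothetical schema and code list, for illustration only.
REQUIRED_FIELDS = ("name", "street", "city", "state")
STATE_CODES = {"PA", "NY", "OH"}  # a real list would cover every state


def audit(record):
    """Return a list of human-readable problems found in one record."""
    problems = []

    # Data missing from database fields: absent, None or blank values.
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or not str(value).strip():
            problems.append(f"missing {field}")

    # Lack of coding standards: a state value outside the agreed code list.
    state = record.get("state")
    if state and str(state).strip() and str(state).strip() not in STATE_CODES:
        problems.append(f"nonstandard state code: {state!r}")

    return problems


record = {"name": "Gerry L. Johnson", "street": "16 Clark Street",
          "city": "", "state": "Penn."}
for problem in audit(record):
    print(problem)
```

Checks like these only detect dirty data. Fixing it, for example by mapping "Penn." to a standard code, requires a separate rule for each case.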