Dirty Data Blights the Bottom Line

Data quality isn't a glamorous topic, but companies ignore it, especially for internal systems, at their financial peril.
 


November 07, 2005 (Computerworld) -- When Nancy Rybeck was hired by Emerson Process Management six years ago, she was charged with salvaging a data warehouse that had been built to help the company better analyze customer activity. But after a thorough review, she opted to scrap it and start over. The warehouse, it seemed, was loaded with redundant and inaccurate data.


"The biggest reason [the earlier effort] had failed was data quality," says Rybeck, data warehouse architect at Austin-based Emerson Process Management, a global supplier of measurement, analytical and monitoring instrumentation and services. One major contributor to the failure was an assumption made by the group that launched the initial Microsoft Access-based effort: that sales entities all over the world would enter customer names and addresses in the same manner, regardless of whether they operated in the Asia-Pacific region, Europe or other areas in which Emerson does business. Cultural differences, combined with complications caused by Emerson's continuing growth through acquisition, resulted in numerous ways of entering quote, billing, shipping and other key data.













Emerson's problems with inaccurate data are typical across all industries. Through 2007, at least 25% of critical data within Fortune 1000 companies will continue to be inaccurate, according to Gartner Inc. And only 34% of executives responding to a 2004 PricewaterhouseCoopers survey said they're very confident in the quality of their corporate data.


Although many businesses tend to think that data quality primarily affects customer-facing initiatives, the impact can be more profound on internal operations. "CRM initiatives fail, and companies get into trouble with the security and privacy of customer data," says Gartner analyst Ted Friedman. "But the big money being lost [because of poor data quality] is in internal operations."


Inaccurate financial reporting, uncollected receivables, overpayments, poor product specifications, excess inventory: the problems caused by inaccurate data are endless, and they all affect the bottom line.


Meanwhile, mounting regulatory compliance requirements dictate increased data vigilance. "You can have all the controls in place, but if your data's not accurate, your CFO will be signing off on inaccurate information," says Robert Lerner, an analyst at Current Analysis Inc.


Data quality initiatives have long languished in the shadow of sexier projects. But thanks to failed CRM and ERP efforts, compliance violations, costly supply chain inefficiencies and more, that's starting to change. Investments in data quality suites are growing 12% to 15% annually, according to Gartner, and the market is starting to consolidate as it matures.

Protect Your Source


Tools that address data quality fall into a variety of categories, including data profiling software, which sifts data fields for duplication, missing information and other errors; data cleansing and matching tools, which parse data into discrete elements, clean it, standardize its format, and match and merge records; data enhancement tools, which enrich data by incorporating, for instance, third-party elements; and data monitoring tools, which ensure that data maintains a preset level of quality.
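To make the profiling category concrete, here is a minimal sketch of what sifting fields for missing and duplicated values can look like. It assumes records arrive as Python dictionaries; the function name `profile`, the field names and the sample customers are hypothetical and are not drawn from any vendor's product.

```python
from collections import Counter

def profile(records, fields):
    """Report missing, duplicated and distinct values for each field (hypothetical helper)."""
    report = {}
    for field in fields:
        values = [r.get(field) for r in records]
        # Treat None and whitespace-only strings as missing.
        missing = sum(1 for v in values if v is None or str(v).strip() == "")
        # Compare present values case-insensitively so "ACME Corp" matches "Acme Corp".
        present = [str(v).strip().lower() for v in values if v is not None and str(v).strip()]
        counts = Counter(present)
        # Number of rows whose value also appears in at least one other row.
        duplicated = sum(n for n in counts.values() if n > 1)
        report[field] = {"missing": missing, "duplicated": duplicated, "distinct": len(counts)}
    return report

# Hypothetical sample data.
customers = [
    {"name": "Acme Corp", "email": "buyer@acme.example", "country": "US"},
    {"name": "ACME Corp", "email": "", "country": "US"},
    {"name": "Beta GmbH", "email": "info@beta.example", "country": None},
]
report = profile(customers, ["name", "email", "country"])
print(report)
```

Commercial profilers also look at value patterns, ranges and cross-field rules; this sketch covers only the duplication and missing-information checks the paragraph mentions.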


Some IT groups still rely on extract, transform and load (ETL) functions to ready data from various applications for staging in warehouses, but experts say ETL is only as good as the data being transformed.


"ETL isn't the same thing as data quality; [the process] may have nothing to do with data cleanup," says Chad Wright, applications manager for business intelligence and CRM at Tewksbury, Mass.-based Avid Technology Inc. The provider of digital media creation products purchased tools from Firstlogic Inc. in 2001 as part of an effort to clean and match data between its new SAP CRM system and its legacy Onyx CRM system. (In September, Pitney Bowes Inc. announced its intention to acquire Firstlogic.)


Avid continues to use Firstlogic tools to validate customer master data from companies it acquires against its SAP masters to prevent duplication. It's also performing some data quality measures in real time: The IT group has developed a Web service that takes advantage of Firstlogic's IQ8 service-oriented architecture (SOA) to automatically capture shipping information and validate it against country-specific postal codes, accepting or correcting it at the point of contact. Avid is also running the vendor's Global Data Quality Connector for SAP, which allows real-time checks in the SAP environment during order processing.
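The accept-or-correct logic described above can be sketched in a few lines. This is not Firstlogic's IQ8 interface; it is a hypothetical illustration that assumes a small table of simplified per-country postal-code patterns (US ZIP and ZIP+4, Canadian "A1A 1A1" codes, five-digit German codes) and a couple of simple corrections for case and spacing.

```python
import re

# Hypothetical, simplified postal-code rules; real national formats have more exceptions.
POSTAL_PATTERNS = {
    "US": re.compile(r"\d{5}(-\d{4})?"),
    "CA": re.compile(r"[A-Z]\d[A-Z] \d[A-Z]\d"),
    "DE": re.compile(r"\d{5}"),
}

def normalize_postal(country, code):
    """Apply simple country-specific corrections (case, spacing, ZIP+4 hyphen)."""
    code = code.strip().upper()
    if country == "CA":
        compact = code.replace(" ", "")
        if len(compact) == 6:
            code = compact[:3] + " " + compact[3:]
    elif country == "US":
        digits = code.replace(" ", "").replace("-", "")
        if len(digits) == 9 and digits.isdigit():
            code = digits[:5] + "-" + digits[5:]
    return code

def check_postal(country, code):
    """Return (status, value) where status is accepted, corrected, rejected or unchecked."""
    pattern = POSTAL_PATTERNS.get(country)
    if pattern is None:
        return ("unchecked", code)  # no rule on file for this country
    if pattern.fullmatch(code):
        return ("accepted", code)
    fixed = normalize_postal(country, code)
    if pattern.fullmatch(fixed):
        return ("corrected", fixed)
    return ("rejected", code)

for country, code in [("US", "01876"), ("CA", "k1a0b1"), ("US", "123456789"), ("DE", "1234")]:
    print(country, code, check_postal(country, code))
```

Running the check at the point of entry, as Avid describes, means a correctable entry such as "k1a0b1" is fixed before it is stored, and an entry that can't be corrected is rejected instead of silently loaded.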


"We saw benefit from doing real-time data quality functions, cleaning up the thousands of marketing leads that come into our systems every day," says Wright.


Emerson has adopted data quality tools from Lanham, Md.-based Group 1 Software Inc., a Pitney Bowes subsidiary, to help profile, cleanse and merge records for its data warehouse. Given Emerson's global scope and acquisition strategy, cleaning data manually wasn't an option, says Rybeck.


Emerson's data warehouse is fed by numerous source systems from around the world. Contact information for quoting, billing and shipping is linked to associated transactional records. Duplicate records are then eliminated, and the data is merged using Group 1 tools, custom coding and manual review processes. Ultimately, says Rybeck, Emerson wants full use of its contact data to better anticipate customer needs and improve its service and marketing.
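A simplified version of the eliminate-and-merge step might look like the sketch below. It is not Group 1's matching engine. It assumes that contacts whose standardized name, address and postal code agree are duplicates, and it fills empty fields in the first record from later copies. The abbreviation table and field names are hypothetical. Production tools typically use fuzzy matching rather than the exact key used here.

```python
import re
from collections import defaultdict

# Hypothetical abbreviation table used to standardize names and addresses.
ABBREVIATIONS = {"street": "st", "avenue": "ave", "road": "rd",
                 "incorporated": "inc", "corporation": "corp"}

def standardize(text):
    """Lowercase, drop punctuation and collapse common abbreviations."""
    words = re.findall(r"[a-z0-9]+", (text or "").lower())
    return " ".join(ABBREVIATIONS.get(w, w) for w in words)

def match_key(contact):
    """Contacts sharing a standardized name, address and postal code are treated as duplicates."""
    return (standardize(contact["name"]), standardize(contact["address"]),
            standardize(contact["postal"]))

def merge_duplicates(contacts):
    """Group contacts by match key; fill empty fields of the first record from later ones."""
    groups = defaultdict(list)
    for c in contacts:
        groups[match_key(c)].append(c)
    merged = []
    for group in groups.values():
        result = dict(group[0])
        for other in group[1:]:
            for field, value in other.items():
                if not result.get(field) and value:
                    result[field] = value
        merged.append(result)
    return merged

# Hypothetical records from two source systems.
contacts = [
    {"name": "Acme Corporation", "address": "12 Main Street", "postal": "78701", "phone": ""},
    {"name": "ACME Corp.", "address": "12 Main St", "postal": "78701", "phone": "555-0100"},
    {"name": "Beta Inc", "address": "9 Oak Road", "postal": "10001", "phone": "555-0199"},
]
merged = merge_duplicates(contacts)
print(len(merged), merged[0])
```

Emerson's process also includes custom coding and manual review. A sketch like this one would handle only the clear-cut matches and leave borderline cases for people to resolve.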


"The plan is to have this feedback loop be complete. In the past, we may have used the marketing information to get some business, but we've never followed through to see what the profitability was in a marketing campaign," says Rybeck.

Keeping It Clean


Since adopting data quality tools from Billerica, Mass.-based Trillium Software, printer manufacturer Oki Printing Solutions has been able to improve its marketing campaigns. It has also significantly reduced the fees and fines in its distribution chain that stem from bad contact information, says senior systems analyst Maggie Dominguez. Mount Laurel, N.J.-based Oki started using Trillium in 1999 to clean data it was moving from legacy systems to SAP and now uses it for handling consumer and end-user contact data.


Dominguez and her team started building a data warehouse less than a year ago to improve the company's analysis capabilities for functions such as sales projections. "We wouldn't have survived without the data quality tools," she says. "We would have ended up with huge quantities of data that would have been very hard to mine."


Although businesses would like to address their data quality problems once and be done, maintaining accurate data is an intensive, ongoing effort. A master customer file may be completely accurate on Friday evening but contain numerous inaccuracies by Monday morning, even if no one has touched it. That's because data decays by itself: People are born and die, and they change names and addresses; companies go out of business or get snapped up. Further, there are many points of entry to enterprise data sources, and data is continually repurposed.
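Because records decay even when nobody edits them, one common way to handle this is to track when each record was last verified and periodically flag the stale ones for re-checking. The article doesn't describe any particular re-verification policy, so the sketch below is hypothetical: the `verified_on` field, the one-year window and the sample records are all assumptions.

```python
from datetime import date, timedelta

def stale_records(records, today, max_age_days=365):
    """Return IDs of records never verified, or last verified more than max_age_days ago."""
    cutoff = today - timedelta(days=max_age_days)
    return [r["id"] for r in records
            if r.get("verified_on") is None or r["verified_on"] < cutoff]

# Hypothetical customer records with a last-verified date.
records = [
    {"id": 1, "verified_on": date(2005, 10, 1)},
    {"id": 2, "verified_on": date(2004, 6, 30)},
    {"id": 3, "verified_on": None},
]
print(stale_records(records, today=date(2005, 11, 7)))
```

The window is a business decision: contact data for fast-moving customer segments may need a much shorter cycle than a year.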


Key to maintaining data cleanliness is controlling who touches it, says Jeffrey Monica, manager for data quality at StorageTek, a subsidiary of Sun Microsystems Inc. StorageTek has been using tools from DataFlux Corp., a SAS Institute Inc. subsidiary, to cleanse data from more than 60 source systems worldwide for a customer data warehouse. "We want to give people the flexibility to use [the warehouse and associated marts], but we'll have control over the quality of the data so we can say we have a single version of the truth," Monica says.


StorageTek uses Informatica Corp.'s ETL tool to pull data from specific fields for data warehouse loading. It uses DataFlux tools to identify the most accurate record and to cleanse and standardize data. Though Monica says StorageTek still has significant data duplication, the company has thus far reduced 1 million records in its warehouse to around 200,000.
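Going from 1 million records to roughly 200,000 means about four out of every five records were folded into another one, so a rule for deciding which record survives matters a great deal. The article doesn't say how DataFlux picks the most accurate record. The sketch below assumes one common style of survivorship rule: prefer the most complete record, and break ties by the most recent update. The field names and sample records are hypothetical.

```python
from datetime import date

def completeness(record, fields):
    """Fraction of the given fields that hold a non-empty value."""
    filled = sum(1 for f in fields if record.get(f) not in (None, ""))
    return filled / len(fields)

def pick_survivor(duplicates, fields):
    """Keep the most complete record; the most recently updated one wins ties."""
    return max(duplicates, key=lambda r: (completeness(r, fields), r["updated"]))

# Hypothetical duplicates of one customer from different source systems.
dups = [
    {"id": "A", "email": "x@acme.example", "phone": "", "updated": date(2005, 3, 1)},
    {"id": "B", "email": "x@acme.example", "phone": "555-0100", "updated": date(2004, 1, 1)},
    {"id": "C", "email": "", "phone": "555-0100", "updated": date(2005, 9, 1)},
]
print(pick_survivor(dups, ["email", "phone"])["id"])
```

Other rules are just as plausible, such as trusting a particular source system first. The point is that the rule is explicit and applied consistently.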


StorageTek began the warehouse effort three years ago but didn't buy data quality tools until a year ago, says Monica. "The good news is we recognized we needed them, but the bad news is we didn't do it on Day One," he says.


Others are recognizing the need as well. Experts say deployments will improve as data quality tools become more tightly integrated with enterprise applications, support Web services through SOA approaches, and are paired with processes for continuous data quality monitoring.
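Continuous monitoring usually comes down to measuring data against preset quality levels on a schedule and raising an alert when a measure slips. The sketch below is a hypothetical illustration that uses per-field completeness as the measure. The threshold values and field names are assumptions, not figures from any of the companies quoted here.

```python
def completeness_rate(records, field):
    """Share of records with a non-empty value in the field (1.0 for an empty batch)."""
    if not records:
        return 1.0
    filled = sum(1 for r in records if r.get(field) not in (None, ""))
    return filled / len(records)

def check_thresholds(records, thresholds):
    """Return {field: rate} for every field whose completeness is below its preset minimum."""
    alerts = {}
    for field, minimum in thresholds.items():
        rate = completeness_rate(records, field)
        if rate < minimum:
            alerts[field] = rate
    return alerts

# Hypothetical batch of customer records and preset quality levels.
batch = [
    {"email": "a@x.example", "country": "US"},
    {"email": "b@x.example", "country": ""},
    {"email": "", "country": "DE"},
    {"email": "c@x.example", "country": None},
]
print(check_thresholds(batch, {"email": 0.95, "country": 0.5}))
```

In practice a job like this would run after each load, and its alerts would feed whatever escalation process owns the data.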


"If you're processing large amounts of data, you need data quality tools so you can do standardization, validation, cleansing and duplicate-checking," says Avid's Wright. Then, he says, you can start attacking the problem at the source, ensuring that new data entering the system is clean.
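Attacking the problem at the source means running the same four steps Wright lists on each new record before it is stored. The sketch below chains standardization, validation and duplicate-checking for a hypothetical incoming marketing lead. The deliberately loose e-mail pattern, the use of e-mail address as the duplicate key and the field names are all assumptions made for illustration.

```python
import re

# Deliberately loose e-mail check (assumption); real validation rules vary.
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[a-z]{2,}")

def standardize_lead(lead):
    """Collapse whitespace, title-case the name and lowercase the e-mail (simplistic)."""
    return {
        "name": " ".join((lead.get("name") or "").split()).title(),
        "email": (lead.get("email") or "").strip().lower(),
    }

def admit_lead(lead, known_emails):
    """Standardize, validate and duplicate-check a new lead before it enters the system."""
    clean = standardize_lead(lead)
    if not clean["name"]:
        return ("rejected", "missing name")
    if not EMAIL_RE.fullmatch(clean["email"]):
        return ("rejected", "invalid email")
    if clean["email"] in known_emails:
        return ("duplicate", clean["email"])
    known_emails.add(clean["email"])
    return ("accepted", clean)

known = set()
print(admit_lead({"name": "  nancy   smith ", "email": " NS@Example.COM "}, known))
print(admit_lead({"name": "Nancy Smith", "email": "ns@example.com"}, known))
```

Title-casing names is a simplification that mangles names such as "McDonald". A real entry point would use better standardization rules, but the order of the checks would be the same.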
















CONFIDENCE IN DATA QUALITY

A PwC survey showed no significant gain in the confidence that IT executives had in their companies' data from 2001 to 2004.

Chart: Confidence in data quality (data not reproduced)

Base: 452 IT executives responding to PricewaterhouseCoopers' Global Data Management Survey 2004




Gilhooly is a freelance writer in Falmouth, Maine. You can reach her at kymg@maine.rr.com.



