Dirty Data Blights the Bottom Line

Data quality isn't a glamorous topic, but companies that ignore it, especially for internal systems, do so at their financial peril.

When Nancy Rybeck was hired by Emerson Process Management six years ago, she was charged with salvaging a data warehouse that had been built to help the company better analyze customer activity. But after a thorough review, she opted to scrap it and start over. The warehouse, it seemed, was loaded with redundant and inaccurate data.

"The biggest reason [the earlier effort] had failed was data quality," says Rybeck, data warehouse architect at Austin-based Emerson Process Management, a global supplier of measurement, analytical and monitoring instrumentation and services. One major contributor to the failure was an assumption made by the group that launched the initial Microsoft Access-based effort: that sales entities all over the world would enter customer names and addresses in the same manner, regardless of whether they operated in the Asia-Pacific region, Europe or other areas in which Emerson does business. Cultural differences, combined with complications caused by Emerson's continuing growth through acquisition, resulted in numerous ways of entering quote, billing, shipping and other key data.


Emerson's problems with inaccurate data are typical across all industries. Through 2007, at least 25% of critical data within Fortune 1,000 companies will continue to be inaccurate, according to Gartner Inc. And only 34% of executives responding to a 2004 PricewaterhouseCoopers survey said they're very confident in the quality of their corporate data.

Although many businesses tend to think that data quality primarily affects customer-facing initiatives, the impact can be more profound on internal operations. "CRM initiatives fail, and companies get into trouble with the security and privacy of customer data," says Gartner analyst Ted Friedman. "But the big money being lost [because of poor data quality] is in internal operations."

Inaccurate financial reporting, uncollected receivables, overpayments, poor product specifications, excess inventory—the problems caused by inaccurate data are endless, and they all affect the bottom line.

Meanwhile, mounting regulatory compliance requirements dictate increased data vigilance. "You can have all the controls in place, but if your data's not accurate, your CFO will be signing off on inaccurate information," says Robert Lerner, an analyst at Current Analysis Inc.

Data quality initiatives have long languished in the shadow of sexier projects. But thanks to failed CRM and ERP efforts, compliance violations, costly supply chain inefficiencies and more, that's starting to change. Investments in data quality suites are growing at a rate between 12% and 15% annually, according to Gartner, and the market is starting to consolidate as it matures.

Protect Your Source

Tools that address data quality fall into a variety of categories, including data profiling software, which sifts data fields for duplication, missing information and other errors; data cleansing and matching tools, which parse data into discrete elements, clean it, standardize its format, and match and merge records; data enhancement tools, which enrich data by incorporating, for instance, third-party elements; and data monitoring tools, which ensure that data maintains a preset level of quality.
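To make the profiling category concrete, here is a minimal sketch in Python of what a profiling pass might report: how many values in each field are missing and how many are duplicated. It is an illustration only, not how any of the commercial tools named in this article work. The function name `profile`, the field names and the sample records are all hypothetical.

```python
from collections import Counter


def profile(records, fields):
    """Count missing and duplicated values per field (illustrative only)."""
    report = {}
    for field in fields:
        # Treat None and whitespace-only strings as missing.
        values = [(r.get(field) or "").strip() for r in records]
        missing = sum(1 for v in values if not v)
        # Compare case-insensitively so "Acme Corp" and "acme corp" collide.
        counts = Counter(v.lower() for v in values if v)
        duplicated = sum(n for n in counts.values() if n > 1)
        report[field] = {"missing": missing, "duplicated": duplicated}
    return report


# Hypothetical sample records, assumed to hold string or None values.
SAMPLE = [
    {"name": "Acme Corp", "postal_code": "78701"},
    {"name": "acme corp", "postal_code": ""},
    {"name": "Globex", "postal_code": None},
]

if __name__ == "__main__":
    print(profile(SAMPLE, ["name", "postal_code"]))
```

A real profiling tool would go further, checking value ranges, formats and cross-field consistency, but simple counts like these are often enough to show where a source system is weakest.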

Some IT groups still rely on extract, transform and load functions to ready data from various applications for staging in warehouses, but experts say ETL's effectiveness is only as good as the data being transformed.

"ETL isn't the same thing as data quality; [the process] may have nothing to do with data cleanup," says Chad Wright, applications manager for business intelligence and CRM at Tewksbury, Mass.-based Avid Technology Inc. The provider of digital media creation products purchased tools from Firstlogic Inc. in 2001 as part of an effort to clean and match data between its new SAP CRM system and its legacy Onyx CRM system. (In September, Pitney Bowes Inc. announced its intention to acquire Firstlogic.)

Avid continues to use Firstlogic tools to validate customer master data from companies it acquires against its SAP masters to prevent duplication. It's also performing some data quality measures in real time: The IT group has developed a Web service that takes advantage of Firstlogic's IQ8 service-oriented architecture (SOA) to automatically capture shipping information and validate it against country-specific postal codes, accepting or correcting it at the point of contact. Avid is also running the vendor's Global Data Quality Connector for SAP, which allows real-time checks in the SAP environment during order processing.
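The accept-or-correct step at the point of entry can be sketched roughly as follows. This is not Firstlogic's API or Avid's Web service, whose details the article does not give. The function `check_postal_code`, the three country patterns and the correction rule are simplified assumptions, and real postal rules carry many more exceptions.

```python
import re

# Simplified, illustrative patterns; these are assumptions, not a postal reference.
POSTAL_PATTERNS = {
    "US": re.compile(r"\d{5}(-\d{4})?"),
    "CA": re.compile(r"[A-Z]\d[A-Z] \d[A-Z]\d"),
    "DE": re.compile(r"\d{5}"),
}


def check_postal_code(country, raw):
    """Return (status, value), where status is 'accepted', 'corrected' or 'rejected'."""
    code = country.upper()
    pattern = POSTAL_PATTERNS.get(code)
    if pattern is None:
        # Countries without a pattern are rejected in this sketch.
        return ("rejected", raw)
    if pattern.fullmatch(raw):
        return ("accepted", raw)
    # Try a simple correction: trim whitespace and uppercase letters.
    candidate = raw.strip().upper()
    if code == "CA":
        # Canadian codes are written as two groups of three characters.
        compact = candidate.replace(" ", "").replace("-", "")
        if len(compact) == 6:
            candidate = compact[:3] + " " + compact[3:]
    if pattern.fullmatch(candidate):
        return ("corrected", candidate)
    return ("rejected", raw)
```

Under these assumptions, `check_postal_code("CA", "k1a0b1")` would come back as a corrected `"K1A 0B1"`, while a four-digit German code would be rejected and returned to the person entering it.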

"We saw benefit from doing real-time data quality functions, cleaning up the thousands of marketing leads that come into our systems every day," says Wright.

Emerson has adopted data quality tools from Lanham, Md.-based Group 1 Software Inc., a Pitney Bowes subsidiary, to help profile, cleanse and merge records for its data warehouse. Given Emerson's global scope and acquisition strategy, cleaning data manually wasn't an option, says Rybeck.

Emerson's data warehouse is fed by numerous source systems from around the world. Contact information for quoting, billing and shipping is linked to associated transactional records. Duplicate records are then eliminated, and the data is merged using Group 1 tools, custom coding and manual review processes. Ultimately, says Rybeck, Emerson wants full use of its contact data to better anticipate customer needs and improve its service and marketing.

"The plan is to have this feedback loop be complete. In the past, we may have used the marketing information to get some business, but we've never followed through to see what the profitability was in a marketing campaign," says Rybeck.

Keeping It Clean

Since adopting data quality tools from Billerica, Mass.-based Trillium Software, printer manufacturer Oki Printing Solutions has been able to improve its marketing campaigns. It has also significantly reduced fees and fines in its distribution chain associated with bad contact information, says senior systems analyst Maggie Dominguez. Mount Laurel, N.J.-based Oki started using Trillium in 1999 to clean data it was moving from legacy systems to SAP and now uses it for handling consumer and end-user contact data.

Dominguez and her team started building a data warehouse less than a year ago to improve the company's analysis capabilities for functions such as sales projections. "We wouldn't have survived without the data quality tools," she says. "We would have ended up with huge quantities of data that would have been very hard to mine."

Although businesses would like to address their data quality problems once and be done, maintaining accurate data is an intensive, ongoing effort. A master customer file may be completely accurate on Friday evening but house numerous inaccuracies by Monday morning, without any interference. That's because data decays by itself: People are born and die, and they change names and addresses; companies go out of business or get snapped up. Further, there are many points of entry to enterprise data sources, and data is continually repurposed.

Key to maintaining data cleanliness is controlling who touches it, says Jeffrey Monica, manager for data quality at StorageTek, a subsidiary of Sun Microsystems Inc. StorageTek has been using tools from DataFlux Corp., a SAS Institute Inc. subsidiary, to cleanse data from more than 60 source systems worldwide for a customer data warehouse. "We want to give people the flexibility to use [the warehouse and associated marts], but we'll have control over the quality of the data so we can say we have a single version of the truth," Monica says.

StorageTek uses Informatica Corp.'s ETL tool to pull data from specific fields for data warehouse loading. It uses DataFlux tools to identify the most accurate record and to cleanse and standardize data. Though Monica says StorageTek still has significant data duplication, the company has thus far reduced 1 million records in its warehouse to around 200,000.
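The idea of picking the most accurate record from a group of duplicates can be sketched in a few lines of Python. This is not how DataFlux does it. The match key, the list of company suffixes and the "most complete, then most recently updated" rule are all hypothetical choices made for illustration, and the `updated` field is assumed to hold ISO-format date strings.

```python
import re
from collections import defaultdict

# Hypothetical list of company suffixes to ignore when matching names.
SUFFIXES = r"\b(inc|corp|corporation|co|ltd|llc)\b"


def match_key(record):
    """Build a crude match key from a standardized name and postal code."""
    name = re.sub(r"[^a-z0-9 ]", "", record["name"].lower())
    name = re.sub(SUFFIXES, "", name)
    name = " ".join(name.split())
    postal = (record.get("postal_code") or "").replace(" ", "").upper()
    return (name, postal)


def completeness(record):
    """Count the non-empty fields in a record."""
    return sum(1 for value in record.values() if value)


def deduplicate(records):
    """Group records by match key and keep one survivor per group."""
    groups = defaultdict(list)
    for record in records:
        groups[match_key(record)].append(record)
    survivors = []
    for group in groups.values():
        # Assumed rule: prefer the most complete record, then the newest one.
        best = max(group, key=lambda r: (completeness(r), r.get("updated") or ""))
        survivors.append(best)
    return survivors
```

Commercial matching tools typically use fuzzy comparison and configurable survivorship rules rather than an exact key, which is one reason manual review still plays a part in efforts like these.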

StorageTek began the warehouse effort three years ago but didn't buy data quality tools until a year ago, says Monica. "The good news is we recognized we needed them, but the bad news is we didn't do it on Day One," he says.

Others are recognizing the need as well. Experts say deployments will only get better with tighter integration between data quality tools and enterprise applications, support for Web services through SOA approaches, and processes for continuous data quality monitoring.

"If you're processing large amounts of data, you need data quality tools so you can do standardization, validation, cleansing and duplicate-checking," says Avid's Wright. Then, he says, you can start attacking the problem at the source, ensuring that new data entering the system is clean.



[Chart: Confidence in data quality. A PwC survey showed no significant gain in the confidence that IT executives had in their companies’ data from 2001 to 2004. Base: 452 IT executives responding to PricewaterhouseCoopers’ Global Data Management Survey 2004.]

Gilhooly is a freelance writer in Falmouth, Maine. You can reach her at kymg@maine.rr.com.

Copyright © 2005 IDG Communications, Inc.
