Piecing Together the Data Picture

Data quality translates into companies having the right information at the right time to make decisions.

Poor data quality can confuse your customers, undermine your applications or even put you out of business—and there's plenty you can do about it. More than simple data cleansing, which involves correcting a misspelled name or changing "Avenue" to "Street," a data quality initiative addresses more complex and subtle problems.

For example, one New York bank that had a 3% to 5% bad-debt ratio on its credit card operation acquired another bank, says Aaron Zornes, a San Francisco-based analyst at Meta Group Inc. "It turns out that the acquired bank had a 15% bad-debt ratio. The New York bank took over, and the bad debt nearly put them out of business," he says.

If the acquiring bank had had a data quality initiative to run large database-comparison jobs off-line, the problem could have been averted, says Zornes. Bank managers could have predicted the loan default rate by comparing the outstanding debt, incomes and even partial ZIP codes of the acquired bank's credit card customers against a historical database of similar customer profiles.

"They would have been able to tell that this company wasn't a good buy," Zornes says. "Enterprises cannot afford to wait on data quality efforts."

Data quality initiatives are critical to enterprise applications such as CRM and ERP systems, Zornes notes. And according to The Data Warehousing Institute in Seattle, data quality problems cost U.S. businesses more than $600 billion per year.

"The basis of any CRM system is the integrity of the data," says Steve Deeb, vice president for CRM at Monster Worldwide Inc. in Maynard, Mass. "Any and all processes are driven by that data."

In addition to business needs, there are now regulatory pressures to maintain better data, Zornes says. "If someone has bought a large amount of ammonia-based fertilizer, then rents a car," the U.S. Department of Homeland Security wants to know about it, he says. "And this isn't information you can wait months or even a week to find out."

The tools to improve data quality exist, says Zornes, but although "businesses give lip service to the need for data quality, too often they don't do anything about it."

James Eardley, a managing director of CRM at FleetBoston Financial Corp., agrees. "Data quality gets short shrift too often. It's not important until you need it," he says.

Although in dissimilar industries, FleetBoston and Monster both use CRM software from Siebel Systems Inc. in San Mateo, Calif., and faced similar data quality problems. Duplicate records in customer and contact databases meant one department didn't know what another was doing.

"What we were missing was a total picture of the customer relationship. We have multiple business sales forces following a single customer. It's hard enough to get one business unit's data clean. We now have 24," Eardley says.

"There's no consistency with how users enter customer and contact records," he continues. "Some people use upper- and lowercase; others use all uppercase." Today FleetBoston's system standardizes the data elements and does ZIP code lookups.

The company opted for data quality software from FirstLogic Inc. in La Crosse, Wis. Those tools, coupled with the Siebel software, "seemed to do exactly what we needed," Eardley says.

To prevent duplicate entries, when a user enters a record, the FirstLogic system generates a token and compares it against the tokens already in the database. If it finds similar ones, it presents the matching records to the user, who determines whether the new record is a duplicate.

"We had to work a little bit to get the tokens to our liking, and then it worked fine," Eardley says. "We also run batch jobs monthly to identify and fix any duplicates." Any records that the system can't resolve go to the business side for review.
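FirstLogic's actual match tokens are proprietary, so the mechanics can only be sketched. The toy match key below is an illustrative assumption, not the vendor's algorithm: it standardizes a record's name and address (the kind of uppercase/abbreviation cleanup Eardley describes) and reduces them to a short key that differently entered duplicates collapse to.

```python
import re

def match_key(name: str, street: str, zip_code: str) -> str:
    """Build a simplified match token from a customer record.

    Hypothetical stand-in for a FirstLogic-style match key: normalize
    the fields, then reduce them to a short comparable key.
    """
    def norm(s: str) -> str:
        s = s.upper()
        s = re.sub(r"[^A-Z0-9 ]", "", s)     # drop punctuation
        s = re.sub(r"\bSTREET\b", "ST", s)   # standardize common terms
        s = re.sub(r"\bAVENUE\b", "AVE", s)
        return re.sub(r"\s+", " ", s).strip()

    n, st = norm(name), norm(street)
    surname = n.split()[-1][:4] if n else ""     # first 4 letters of surname
    m = re.match(r"\d+", st)
    house = m.group() if m else ""               # house number
    return f"{surname}:{house}:{zip_code[:5]}"   # + 5-digit ZIP

# Two differently entered records collapse to the same token:
a = match_key("Jane Doe", "123 Main Street", "02139-1234")
b = match_key("JANE DOE", "123 Main St.", "02139")
```

In a real system the candidate tokens, not just exact matches, would be shown to the user for review, and the same keys can drive the monthly batch dedup jobs Eardley mentions.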

Monster Problem

Similar data inconsistencies undermined confidence in Monster's system, says Deeb. Duplicates and unidentified accounts in the Siebel system made it difficult to know which database to use for ordering or invoicing, he says. And the sales staff wasn't getting the support it needed.

Initially, Deeb says, "we didn't see a product that mapped directly into what we were doing." But after building its own address-matching application, the company found that it needed a more strategic tool and more sophisticated analysis than its in-house application could offer.

About a year and a half ago, Monster took another look at the field and chose the Trillium Siebel connector from Trillium Software, a division of Harte-Hanks Inc. in Billerica, Mass.

"When we were looking at the ROI, the ease with which the Trillium product could be integrated into our systems was attractive," Deeb says. "We leveraged the strength of the Trillium core product—such as the way name and address databases from around the world can be plugged in—and integrated it into our processes in a way that made sense to the way we do business."

Now, when a record is entered, the system evaluates in real time whether it's new or a modification of an existing record. The company also runs data quality checks in batches to ensure that duplicates aren't introduced when it incorporates a new mailing list into its existing database. They're also performed at regular intervals to minimize data degradation. In addition to the IT resources dedicated to maintaining data quality, business staffers are also assigned to monitor the system and resolve anomalies.

It's the essence of analytical CRM, Deeb says. "Real-time analysis to determine the right offer to the right customer at the right time in a predictable manner is driven by the quality of customer data supporting that analysis," he says.

But most companies believe that their data is cleaner and more accurate than it is, says Wayne Eckerson, The Data Warehousing Institute's education and research director. He cites as one example an insurance company that each month gets 2 million claims, each with 377 data elements. At an error rate of 0.1% across all claims data, that's 754,000 errors monthly, or roughly 9 million errors annually. If 10% of data elements are critical to its business decisions, the company each year must correct roughly 900,000 errors that could damage its ability to conduct business. Estimating the risk cost at $10 per error, poor data quality costs the company about $9 million annually in erroneous payouts.
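Eckerson's back-of-the-envelope figures can be reproduced directly (integer arithmetic on the numbers quoted above):

```python
claims_per_month = 2_000_000
fields_per_claim = 377

fields_per_month = claims_per_month * fields_per_claim  # 754,000,000 data elements
errors_per_month = fields_per_month // 1000             # 0.1% error rate -> 754,000
errors_per_year = errors_per_month * 12                 # 9,048,000
critical_per_year = errors_per_year // 10               # 10% critical -> 904,800
annual_cost = critical_per_year * 10                    # $10 per error -> $9,048,000
```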

"It's bewildering," says Eckerson, "but almost half of all companies have no plan for managing data quality." Responsibility for data quality often rests with IT staffers, who make their decisions based on the tools available.

Data Quality Means Business

"First and foremost, data quality is a business issue," says Ted Friedman, an analyst at Gartner Inc. in Stamford, Conn. "But the solution is the proverbial three-legged stool: people, process and technology."

The first step in a data quality initiative is to analyze what the data is and how it's used, Friedman says.

GMAC Mortgage Corp. in Horsham, Pa., followed this measured course in its data quality initiative. When interest rates went into free-fall a year and a half ago, the first thing the company's CEO wanted employees to do "was cope with a 300% to 400% increase in daily business of people refinancing mortgages," says David Adams, GMAC's enterprise data access manager.

Tuning the Oracle database that supported application processing improved performance, he says, "but it also opened our eyes to the need to go further and address the quality of the data itself." And with GMAC beginning a major overhaul of its data warehouse—"actually, it was more a large tank of data than a data warehouse," says Adams—the timing was right to launch a data quality initiative.

"To compete on the other side of the refinancing boom, we were going to have to have better, cleaner data to get the accurate analyses that the CEO wanted and that we needed to make the most of our operation," he says.

Adams brought in a data quality consultant to explain to the executive council what the project would entail. Adams and his team researched the data quality tools, ran two pilots and then selected software from Ascential Software Corp. in Westboro, Mass. The Ascential product was more expensive and took more work to get going than some less sophisticated tools, he says. But Adams was sold on the software's heuristic logic, which let it adapt to GMAC's operation.

"The ETL [extract, transform and load] technology is pretty mature, and it works well," says Adams. "But it's the data quality and metadata stuff that's going to give you the great advances."

Physically merging databases would have required that every division agree on a single definition for each data element, which was "probably impossible," Adams says.

Instead, metadata resides in Ascential DataStage and links divisional databases at the logical level, with "pointers" indicating the source of the data. Each division's database remains inviolate.

Each division can decide what data can be shared and with whom, which is important for adhering to government regulations. Other tools couldn't deliver that granularity of control, says Adams.
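DataStage's real metadata model is far richer than anything shown here, but the pointer idea can be sketched. Every name and field in this snippet is hypothetical, chosen purely to illustrate a logical master that links to, rather than copies, divisional data, with each division controlling what it shares:

```python
from dataclasses import dataclass, field

@dataclass
class SourcePointer:
    """Points at a record in a division's own database, which stays untouched."""
    division: str
    table: str
    record_id: str
    shareable: bool = True  # the owning division decides whether to share

@dataclass
class LogicalCustomer:
    """Metadata-level master: no merged data, just links back to the sources."""
    customer_key: str
    pointers: list = field(default_factory=list)

    def sources_visible_to(self, requesting_division: str):
        """A division sees its own records plus whatever others have shared."""
        return [p for p in self.pointers
                if p.division == requesting_division or p.shareable]

cust = LogicalCustomer("CUST-001")
cust.pointers.append(SourcePointer("retail", "accounts", "R-77"))
cust.pointers.append(SourcePointer("wholesale", "clients", "W-12", shareable=False))
```

The design choice mirrors Adams' point: no division has to adopt another's data definitions, because nothing is physically merged, yet sharing can still be restricted record by record.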

The team installed the software in January and, working with the data warehousing team, went live in May with a relatively small application for new credit policy reporting. The first large data mart, to support all reporting for GMAC's wholesale operations, will go live Aug. 15.

"Information is a critical asset," says Meta's Zornes. "We need to change the way we think about it. It may sound like science fiction now, but in the future, companies will certify information the way we certify works of art and financial instruments, i.e., by assigning that information asset's value and origination."

Lais is a Computerworld contributing writer in Takoma Park, Md.


Four Ways to Build a Customer Master

1 Synchronized master. Use middleware to synchronize data in its native store and create a logical master in real time. Best for companies with low data velocity.

2 Application-specific master. Pick one operational application, such as CRM, to be the master. Best for companies with data primarily in one application.

3 Customer master overlay. Use a third-party, application-agnostic overlay, a common choice of big banks and insurance companies. Best for vertical industries, such as banking, insurance and travel.

4 Data-warehouse-based master. Create a data-store-like structure to straddle operational and analytical environments. The store holds recent, transaction-level data; the warehouse holds summaries and data analyses. Best for companies with low operational data latency needs.

Source: Meta Group Inc., Stamford, Conn.


Four Ways IT Managers Dodge Data Quality

Denial. IT managers assume that old data will serve new uses without being re-engineered.

Deception. They assume that their new ERP or CRM software will solve the problem.

Deflection. They shift responsibility for data quality to someone else—users, IT, those doing data entry or the systems integrator implementing the new system.

Deferral. IT managers think they can put off fixing data quality until after the new system is implemented.

Source: Stephen Brown, Ascential Software Corp., Westboro, Mass.

Copyright © 2003 IDG Communications, Inc.
