Data Stewards Seek Conformity

They go by a variety of titles, but these analysts all work with IT and business groups to improve data quality and standardization.

A customer is a customer is a customer, right? Actually, it's not that simple. Just ask Emerson Process Management, an Emerson Electric Co. unit in Austin that supplies process automation products. Four years ago, the company attempted to build a data warehouse to store customer information from over 85 countries. The effort failed in large part because the structure of the warehouse couldn't accommodate the many variations on customers' names.

For instance, different users in different parts of the world might identify Exxon as Exxon, Mobil, Esso or ExxonMobil, to name a few variations. The warehouse would see them as separate customers, and that would lead to inaccurate results when business users performed queries.
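To make the failure concrete, here is a minimal Python sketch that assumes a small hand-built alias table. The names `ALIASES` and `standardize` are hypothetical and do not describe Emerson's actual warehouse. Counting distinct names before and after each variant is mapped to one canonical customer shows how a query could see five customers where a steward would see two.

```python
from collections import Counter

# Hypothetical alias table: each known name variant maps to one canonical customer.
# In practice a steward maintains or licenses such a table; this one is illustrative.
ALIASES = {
    "exxon": "ExxonMobil",
    "mobil": "ExxonMobil",
    "esso": "ExxonMobil",
    "exxonmobil": "ExxonMobil",
}

def standardize(name: str) -> str:
    """Return the canonical customer name, or the trimmed input if no alias is known."""
    key = name.strip().lower()
    return ALIASES.get(key, name.strip())

orders = ["Exxon", "Mobil", "Esso", "ExxonMobil", "Shell"]

raw_customers = Counter(orders)                            # counts names exactly as entered
clean_customers = Counter(standardize(n) for n in orders)  # counts canonical names

print(len(raw_customers))    # 5 distinct "customers" before standardization
print(len(clean_customers))  # 2 after: ExxonMobil and Shell
```

A lookup table is the simplest possible approach; it only catches variants someone has already recorded, which is one reason companies turn to reference databases for name standardization.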

That's when the company hired Nancy Rybeck as data administrator. Rybeck is now leading a renewed data warehouse project that ensures not only the standardization of customer names but also the quality and accuracy of customer data, including postal addresses, shipping addresses and province codes.

To accomplish this, Emerson has done something unusual: It has started to build a department with six to 10 full-time "data stewards" dedicated to establishing and maintaining the quality of data entered into the operational systems that feed the data warehouse.

The practice of having formal data stewards is uncommon. Most companies recognize the importance of data quality, but many treat it as a "find-and-fix" effort, to be conducted at the end of a project by someone in IT. Others casually assign the job to the business users who deal with the data head-on. Still others may throw resources at improving data only when a major problem occurs.

"It's usually a seesaw effect," says Chris Enger, formerly manager of information management at Philip Morris USA Inc. "When something goes wrong, they put someone in charge of data quality, and when things get better, they pull those resources away."

Creating a data quality team requires gathering people with an unusual mix of business, technology and diplomatic skills. It's even difficult to agree on a job title. In Rybeck's department, they're called "data analysts," but titles at other companies include "data quality control supervisor," "data coordinator" or "data quality manager."

"When you say you want a data analyst, they'll come back with a DBA [database administrator]. But it's not the same at all," Rybeck says. "It's not the data structure, it's the content."

At Emerson, data analysts in each business unit review data and correct errors before it's put into the operational systems. They also research customer relationships, locations and corporate hierarchies; train overseas workers to fix data in their native languages; and serve as the main contact with the data administrator and database architect for new requirements and bug fixes.

As the leader of the group, Rybeck plays a role that includes establishing and communicating data standards, ensuring data integrity is maintained during database conversions and doing the logical design for the data warehouse tables. She'll also oversee implementation of the Group 1 Software Inc. data cleansing system and work with The Dun & Bradstreet Corp., whose database is used for company-name standardization and hierarchies.

The analysts have their work cut out for them. Merging customer records from the 75 business units revealed a 75% duplication rate, along with misspellings and fields containing incorrect or missing data.

"Most of the divisions would have sworn they had great processes and standards in place," Rybeck says. "But when you show them they entered the customer name 17 different ways, or someone had entered 'Loading dock open 8:00-4:00' into the address field, they realize it's not as clean as they thought."


Although the data steward may report to IT—as is the case at Emerson and at pharmaceuticals company Sanofi-Synthelabo Inc.—it's not a job for someone steeped in technical knowledge. Yet it's not right for a business person who's a technophobe, either.

What you need is someone who's familiar with both disciplines, like Seth Cohen. Cohen is the first data quality control supervisor at Sanofi in New York. He was hired a year ago to help design automated processes to ensure the data quality of the customer knowledge base that Sanofi was beginning to build.

Cohen has enough technical skills to be able to spec out a data-cleansing system and then work with a developer to make sure that the system is written correctly.

But having worked in the pharmaceuticals field for three and a half years, he also knows the industry's specific business rules and understands the most important data concerns that must be addressed during the requirements-gathering stage.

Data stewards should have business knowledge because they need to make frequent judgment calls, Cohen says. With Sanofi's data warehouse, for instance, if the system expects to get numbers in a field but gets a string of letters instead, Cohen must decide what's wrong and how to correct it.
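Here is a rough sketch of that kind of check. It assumes a hypothetical numeric field called `rx_count`, which is not a real Sanofi field. The code does not guess at a correction. It simply routes bad values to a review list, because deciding what went wrong is the steward's judgment call.

```python
from dataclasses import dataclass

@dataclass
class Issue:
    record_id: int
    field: str
    value: str
    reason: str

def is_integer(value: str) -> bool:
    """True if the string parses as an integer (int() tolerates surrounding whitespace)."""
    try:
        int(value)
        return True
    except ValueError:
        return False

def check_numeric_field(records, field):
    """Split records into accepted ones and issues that need a steward's review."""
    accepted, issues = [], []
    for rec in records:
        raw = rec.get(field)
        value = "" if raw is None else str(raw).strip()
        if value == "":
            issues.append(Issue(rec["id"], field, value, "missing"))
        elif is_integer(value):
            accepted.append(rec)
        else:
            issues.append(Issue(rec["id"], field, value, "not a number"))
    return accepted, issues

# Hypothetical sample records for illustration only.
records = [
    {"id": 1, "rx_count": "42"},
    {"id": 2, "rx_count": "forty"},
    {"id": 3, "rx_count": ""},
]
accepted, issues = check_numeric_field(records, "rx_count")
```

Record 1 is accepted; records 2 and 3 land on the review list with different reasons, so the steward can tell a blank field from a mistyped one.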

Mary Pickett is another data steward who has a mix of skills. When she joined Winston & Strawn LLP, a law firm in Chicago, she considered herself a database specialist. Today, however, her title is "marketing applications specialist," and one of her primary duties is to ensure the quality of Winston & Strawn's contact database.

"Especially in this economy with people moving around, it's a highly charged, dynamic database that keeps changing," Pickett says. "If it sits for a month, it's dirty again."

Pickett prefers to train business users from within the company to keep the data clean. Likely candidates include paralegals or secretaries who manage contact lists for their practice groups. Still, she says, it takes a solid year for data clerks like them to gain the necessary experience to move up to data coordinator.

The reason: They need to learn not only how to sort through duplicate company names, make sure contact names are associated with companies and use the database's cleansing tools, but also how to prioritize which clients are the most important to work on. "We want to keep our top clients as clean as possible," Pickett says.

Perfection Unattainable

Indeed, judgment is a big part of the data steward's job—including the ability to determine where you don't need 100% perfection.

At OneSource Information Services Inc., a provider of business information products in Concord, Mass., orientation sessions include a speech on the inevitable dirtiness of data. But at the same time, says Beth Jacaruso, director of content management at OneSource, the company "lives and dies by data quality." So where do you draw the line? That's where data stewards come in: deciding what's "clean enough."

Cohen says that task is one of the biggest challenges of the job. "100% accuracy is just not achievable," he says. "Some things you're just going to have to let go or you'd have a data warehouse with [only] 15 to 20 records."

A good example is when Sanofi purchases data on doctors that includes their birth dates, Cohen says. If a birth date is given as Feb. 31 or the number of the month is listed as 13 but the rest of the data is good, do you throw out all of the data or just figure the birth date isn't all that important?
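One way to encode the "keep the rest" choice is sketched below in Python. The field names and sample doctors are made up for illustration. Python's `datetime.date` constructor raises `ValueError` for impossible dates such as Feb. 31 or month 13, so an invalid birth date can be blanked and flagged while the rest of the record survives.

```python
from datetime import date
from typing import Optional

def parse_birth_date(year: int, month: int, day: int) -> Optional[date]:
    """Return a date, or None if the combination is impossible (e.g. Feb. 31 or month 13)."""
    try:
        return date(year, month, day)
    except ValueError:
        return None

def clean_doctor_record(record: dict) -> dict:
    """Keep the record, but blank out an invalid birth date instead of discarding everything.
    A flag records which rows were touched so a steward can review them later."""
    cleaned = dict(record)
    cleaned["birth_date"] = parse_birth_date(
        record["birth_year"], record["birth_month"], record["birth_day"]
    )
    cleaned["birth_date_invalid"] = cleaned["birth_date"] is None
    return cleaned

# Hypothetical purchased records for illustration only.
doctors = [
    {"name": "Dr. A", "birth_year": 1960, "birth_month": 2, "birth_day": 31},
    {"name": "Dr. B", "birth_year": 1955, "birth_month": 13, "birth_day": 5},
    {"name": "Dr. C", "birth_year": 1970, "birth_month": 7, "birth_day": 4},
]
cleaned = [clean_doctor_record(r) for r in doctors]
```

All three records are kept. Only the two impossible birth dates are blanked, which reflects the judgment that a bad birth date alone is not worth throwing away otherwise good data.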

It comes down to knowing how much it costs to fix the data vs. the payback. "You can pay millions of dollars a year to get it perfect, but if the returns are in the hundreds of thousands, is it worth it?" asks Chuck Kelley, senior advisory consultant at Navigator Systems Inc., a corporate performance management consultancy in Addison, Texas.

Good Diplomats

Data stewards also need to be politically astute, diplomatic and good at conflict resolution—in part because the environment isn't always friendly. When Cohen joined Sanofi, some questioned why he was there. In particular, IT didn't see why he was "causing them so many headaches and adding several extra steps to the process," he says.

There are many political traps, as well. Take the issue of defining "customer address." If data comes from a variety of sources, you're likely to get different types of coding schemes, some of which overlap. "Everyone thinks theirs is the best approach, and you need someone to facilitate," says Robert Seiner, president and principal of KIK Consulting & Educational Services in Pittsburgh.

People may also argue about how data should be produced, he says. Should field representatives enter it from their laptops? Or should it first be independently checked for quality? Should it be uploaded hourly or weekly? If you have to deal with issues like that and "you're argumentative and confrontational, that would indicate you're not an appropriate steward," Seiner says.

Most of all, data stewards need to understand that data quality is a journey, not a destination. "It's not a one-shot deal; it's ongoing," Rybeck says. "You can't quit after the first task."

Brandel is a freelance writer in Grand Rapids, Mich.


Required Tech Skills

Basic understanding of data modeling
Basic understanding of DBMS
Strong understanding of data warehouses
Technical writing

Source: Claudia Imhoff, president and founder of Intelligent Solutions Inc., Boulder, Colo.

Copyright © 2004 IDG Communications, Inc.
