Redundant data, wrong data, missing data, miscoded data. Every company has some of each, probably residing in IT nooks that don't communicate much. It's not a new problem, but these days the jumble becomes very apparent during high-profile projects, such as installing enterprise resource planning (ERP) or supply chain management software.
Companies often focus on the business process and not on the form and congruity of the resulting data, says John Hagerty, an analyst at AMR Research Inc. in Boston. When a company does that, a frustrated IT department has to step back from the glamour work to cleanse, reconcile and integrate data from various silos around the company.
For example, different sales, inventory or manufacturing systems at a clothing retailer might track the same item by different names. A central database, if there is one, might include "extra large," "XL" and "TG" (for the French term très grand). But they all refer to the same thing.
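A minimal sketch of how such labels might be reconciled, assuming a hand-maintained synonym table. The table, the canonical code "XL" and the function name are illustrative assumptions, not something any company in this story is reported to use.

```python
# Hypothetical synonym table: the three labels come from the retailer
# example above; choosing "XL" as the canonical code is an assumption.
SIZE_SYNONYMS = {
    "extra large": "XL",
    "xl": "XL",
    "tg": "XL",
}


def canonical_size(label: str) -> str:
    """Return the canonical size code, or the trimmed label if it is unknown."""
    # Lowercase and collapse whitespace so "Extra  Large" and "extra large" match.
    key = " ".join(label.strip().lower().split())
    return SIZE_SYNONYMS.get(key, label.strip())


labels = ["Extra Large", "XL", "TG", "Medium"]
normalized = [canonical_size(x) for x in labels]
print(normalized)
```

Labels that are not in the table pass through unchanged, so unknown values remain visible for review instead of being silently recoded.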
And then there's the attic problem familiar to most homeowners: Toss in enough boxes of seasonal clothes, holiday trim, family history documents and other important items, and soon there's a stored mess that's too big to manage. That can happen at companies, too. Multiple operating units, manufacturing plants and other facilities may all run different vendors' applications to do sales, human resources and other tasks. That mix of disparate data makes for a mass of unsorted and unreconciled information.
When it's time for integration, Hagerty says, the question becomes, "Do I throw out the five or six applications that capture my data and put in one new application? Or do I take the data and scrub it, reconcile it, organize it?"
Either way, he says, "it's a humongous effort. No question about that."
Getting It All Under One Roof
Shell Exploration and Production is in the throes of such a project. Early last year, the fuel company wanted to combine data from its SAP AG financial applications with data from its mishmash of volumetric systems, which process information on how much gas and oil the company finds and collects.
"Every different system has its own internal sets of codes," explains Steve Mutch, data warehouse team leader at Shell Exploration in Aberdeen, Scotland. "Going back and cleansing the data in those host systems wasn't an option." It would have taken too much time and been too expensive, he says. Instead, Mutch found a tool from Kalido Ltd. in London that maps the data from various systems and combines it into one warehouse.
After nearly seven months of mapping work, 27 data sources now come together in a 450GB warehouse, Mutch says.
Corporate politics weren't too bad because no single business unit lost control of its data, he says. And now they all contribute to a greater understanding of the information for the company as a whole.
"Once the concept was proved, we had pressure from the top [executives] to integrate other [applications as well]," he says. "They could see themselves what information they could now get and how powerful it is."
Even if a company decides to replace different applications with one new one as a way to address data chaos, it probably won't be easy.
Many of the top customer relationship management (CRM) and ERP vendors, for example, offer suites composed of their own applications plus others they have acquired.
The products in the suites, therefore, weren't built together and may not pass data back and forth smoothly, says Jon Dell'Antonia, vice president of MIS at OshKosh B'Gosh Inc., a clothing maker in Oshkosh, Wis. "You immediately find out it's not seamless."
ERP vendors are trying to address the issue by providing data models and data warehouses with their suites. But Dell'Antonia has avoided ERP suites. His approach to data integration is to have a homegrown IBM DB2 warehouse that unifies data from different applications.
For example, in one of OshKosh B'Gosh's transaction-processing applications, the term "sales" is used. But on the user interface, it's called "customer sales."
A tool from DataMirror Corp. in Toronto uploads sales data to DB2 once a day. And the data warehouse recognizes the differently named items as the same because OshKosh B'Gosh programmers created tags that reconcile incoming data elements.
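The article does not describe how OshKosh B'Gosh's tags are implemented. As a rough sketch of the general idea only, a lookup table can map each source field name to a single warehouse field. The table contents, the warehouse field name sales_amount and the function name are all hypothetical.

```python
# Hypothetical tag table: maps source-specific field names to one
# warehouse field. "sales_amount" is an invented name for illustration.
FIELD_TAGS = {
    "sales": "sales_amount",
    "customer sales": "sales_amount",
}


def reconcile(record: dict) -> dict:
    """Rename incoming fields to their warehouse names; unknown fields keep
    their lowercased source name."""
    out = {}
    for name, value in record.items():
        key = name.strip().lower()
        out[FIELD_TAGS.get(key, key)] = value
    return out


from_transactions = reconcile({"sales": 1200})
from_interface = reconcile({"Customer Sales": 1200})
print(from_transactions == from_interface)
```

Keeping the mapping in a table rather than in code means a newly discovered field name can be reconciled by adding one entry.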
The data silo issue caused serious customer service problems for Southern Illinois Healthcare, a Carbondale, Ill.-based network of six rural hospitals.
Each hospital ran its own database, which meant that when a patient of one facility sought treatment at another, he had to reregister. That was bothersome, and the problem was compounded by the fact that these customers were usually sick and therefore not in the best of moods.
"They got tired of giving the same information again and again," says Frank Sears, CIO at Southern Illinois Healthcare.
Replacing the different applications at each hospital was too expensive, so the company opted for what is known in health care as an enterprise master patient index. In October 1999, Sears hired Madison Information Technologies Inc. in Chicago to help build the index. Madison also provided tools to allow data on patients from one facility to be propagated to other facilities in a common format.
By the time the project was done in April 2000, Southern Illinois Healthcare had extended the system to four affiliated clinics as well.
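To make the idea of a master patient index concrete, here is a toy sketch. It assumes patients are matched on an exact normalized name and birth date, which is a deliberate simplification. Real indexes typically use more tolerant matching, and nothing here reflects Madison's actual tools. The class, the facility names and the patient details are invented.

```python
from dataclasses import dataclass, field
from itertools import count


def match_key(name: str, birth_date: str) -> tuple:
    """Build a simplified matching key from name and birth date (an assumption)."""
    return (" ".join(name.lower().split()), birth_date)


@dataclass
class MasterPatientIndex:
    _ids: dict = field(default_factory=dict)    # match key -> enterprise ID
    _local: dict = field(default_factory=dict)  # enterprise ID -> {facility: local ID}
    _next: count = field(default_factory=lambda: count(1))

    def register(self, facility: str, local_id: str, name: str, birth_date: str) -> int:
        """Link a facility's local record to one enterprise-wide patient ID."""
        key = match_key(name, birth_date)
        if key not in self._ids:
            self._ids[key] = next(self._next)
        enterprise_id = self._ids[key]
        self._local.setdefault(enterprise_id, {})[facility] = local_id
        return enterprise_id

    def local_ids(self, enterprise_id: int) -> dict:
        """Return every facility's local ID for the given patient."""
        return dict(self._local.get(enterprise_id, {}))


mpi = MasterPatientIndex()
first = mpi.register("Hospital A", "A-1001", "Jane  Doe", "1960-04-12")
second = mpi.register("Hospital B", "B-77", "jane doe", "1960-04-12")
print(first == second, mpi.local_ids(first))
```

Each hospital keeps its own database and local identifiers. The index only records which local records belong to the same person, which is what lets a second facility find an existing registration.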
But it's not over. There's also an opportunity to eliminate many medication errors by integrating the right data, Sears says. "Making sure you have the right drug, procedure and patient all boils down to someone making a choice [based on information]," he says. "Some of it can't be automated, but some of it can," with bar-coded wrist bands and medication bags from a dispensing system, he explains.
"These are data integration issues," says Sears.
Cost-justifying data integration projects isn't all that hard, users say.
"Once you start putting all [your data] in one spot, all the past sins are clearly visible," says Cathy Witt, the CIO at computer retailer CompUSA Inc., which is based in Dallas.
The logic to use when broaching the subject with business unit managers is simple, she says: "You tell them, 'Data makes us what we are. If I can give you good clean data, as opposed to your only being able to use a portion of it, you will make better decisions.' " That's how Witt persuaded CompUSA's retail stores and, later, the product warranty unit to fund a data cleanup last year.
To help CompUSA sell more warranties on its consumer computer products, the IT department now culls sales data from its Siebel Systems Inc. CRM applications. The data is then sent through a cleansing tool from Trillium Software in Billerica, Mass. The Trillium tool searches for duplicate and incomplete information. It also helps fill in missing information by, for instance, matching ZIP codes against its own ZIP code database.
The data is then put in a warehouse to be analyzed and mined by sales agents.
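The cleansing steps described above (filling in missing details from a ZIP code table, flagging incomplete records and dropping duplicates) can be sketched roughly as follows. This is not Trillium's product or API. The field names, the required-field rule, the duplicate key and the one-entry ZIP table are assumptions made for illustration.

```python
# Illustrative sample entry only, not authoritative reference data.
ZIP_TO_CITY = {"01821": ("Billerica", "MA")}

# Assumed rule: a record is usable only if these fields are present.
REQUIRED = ("name", "zip", "city")


def cleanse(records):
    """Return (clean, rejected) after fill-in, completeness and duplicate checks."""
    seen = set()
    clean, rejected = [], []
    for rec in records:
        rec = dict(rec)  # work on a copy so the caller's data is untouched
        zip_code = (rec.get("zip") or "").strip()
        # Fill in a missing city and state from the ZIP lookup when possible.
        if not rec.get("city") and zip_code in ZIP_TO_CITY:
            rec["city"], rec["state"] = ZIP_TO_CITY[zip_code]
        # Set aside records that are still incomplete.
        if not all(rec.get(f) for f in REQUIRED):
            rejected.append(rec)
            continue
        # Treat the same normalized name and ZIP code as a duplicate (assumption).
        key = (rec["name"].strip().lower(), zip_code)
        if key in seen:
            continue
        seen.add(key)
        clean.append(rec)
    return clean, rejected


records = [
    {"name": "Pat Lee", "zip": "01821", "city": ""},
    {"name": "pat lee ", "zip": "01821", "city": "Billerica"},
    {"name": "", "zip": "99999", "city": ""},
]
clean, rejected = cleanse(records)
print(len(clean), len(rejected))
```

Incomplete records are set aside rather than discarded, so someone can review them before the remaining data is loaded into a warehouse.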
Although third-party data cleansers could do the same work, Witt didn't want to farm out the job.
"We have the skills," she says, "and we care about the quality of our data more than anyone else does."
Nash is a freelance writer in Yorktown Heights, N.Y.