Subscribe to our e-mail newsletters
For more info on a specific newsletter, click the title. Details will be displayed in a new window.
Computerworld Daily News (First Look and Wrap-Up)
Computerworld Blogs Newsletter
The Weekly Top 10
More E-Mail Newsletters 
Computerworld 2007Subscribe to Computerworld
40 years of the most authoritative source of news and information for IT leaders.

Bridging Data Islands

Integrating data from disparate sources provides companies with powerful management tools, but the process can be difficult, costly and error-prone.
 

Sign up to receive Security Resource Alerts

October 14, 2002 (Computerworld) -- After just two months, a new software tool enabled Aventis Pharmaceuticals Inc. to discover a promising candidate for a new drug to treat asthma, arthritis or even perhaps cancer; it's a chemical compound that might well have been overlooked using traditional IT tools.

Aventis is using DiscoveryLink, a feature of IBM's DB2 database management system that can propel a single SQL query out to multiple, heterogeneous data sources and bring information back to the user in one coherent view.


"Using this integrated framework, scientists were able to pull data from many different sources around the world, visualize it in a new way that they could never do before," says Peter Loupos, vice president for drug innovation and approval information systems at the Bridgewater, N.J.-based company.


IBM calls the Aventis approach to information integration "database federation." To get at federated data, DB2 uses IBM "wrapper" software called DataJoiner and Relational Connect. There's one wrapper for each type of data source—whether it's from Oracle Corp. or Sybase Inc. systems, Microsoft Corp. SQL Server or flat files—with each one mapping the source data model to the DB2 data model. A single Aventis query may be sent against heterogeneous relational databases, unstructured documents and in-house expertise culled from e-mail and other sources.


Sending a SQL query against remote, heterogeneous databases is just one of several ways to integrate data. Others include the following:



  • Custom, hard-wired interfaces that pass information from one application to another. These can be made to work exactly as users demand, but they can be costly to set up and maintain.
  • Replication, in which a commercial product regularly or continuously copies databases or parts of databases from one place to another. Replication is simple but limited in its ability to do anything to data beyond copying it.
  • Extract, Transform and Load (ETL), a process often used to create data warehouses and data marts. ETL software moves data from one place to another, applying rules or table lookups to combine or transform data in some way. ETL is powerful but can be very complex.
  • Web services. Enabled by Internet protocols including the XML standard for exchanging data between disparate systems, Web services allow SQL-based relational data to be accessed as XML, or native XML to be accessed through SQL. Web services are ideal when applications are loosely coupled and difficult to integrate in other ways.

Regardless of the approach taken, data integration can be difficult, expensive and error-prone. In particular, great care must be taken to build interfaces between applications and databases that ensure accuracy and timeliness of information and that answer the needs of disparate communities of end users. Below, we look at how two organizations tackled their data integration problems, and you can read two more case studies online .


Case 1: Standardization Is Key at Affina


While it supports all of the various methods of integrating data, Oracle favors loading data into centralized databases and data warehouses. And that's the approach taken by Oracle customer Affina, a contact-center service provider in Peoria, Ill.


Affina, which has about 3TB of data in Oracle databases, runs call centers for about 100 clients. That requires taking data from telephone calls, e-mail, Web chat and various back-end processing systems and feeding it into an "operational data store," a data warehouse and several data marts, says Tom Asp, vice president for IT. Affina also pushes information—such as call analysis—back out to customers. For efficiency and security, data coming in and going out is staged in customer-specific tables that can be accessed by customers.


Affina uses Oracle Warehouse Builder, Oracle's SQL Loader utility, custom PL/SQL procedures and .Net assemblies in ETL processes to pull in and reformat customer data from Lotus Notes e-mail databases, call records from telephone switches, flat files from interactive voice response units and client data from a system that emulates mainframe CICS. "Any way we could get data, we get it," Asp says.


Affina also uses Oracle8i replication utilities to move data in real time from a big call center in El Paso, Texas, to its data warehouse in Peoria. The Peoria copy is used for reporting, but the production database remains in El Paso for performance reasons.


Automation and standardization are the keys to making such a complex operation run smoothly, Asp says. File transfers, schedulers and loaders are automated, and strict standards are adhered to so new processes can be added easily and without fear of bugs. He says Affina is increasingly moving to the exchange of information in XML format because "it provides a common, flexible file format so that ETL procedures can be created more efficiently."


Case 2: Database Dilemma at Terracon


Terracon Inc., an engineering consulting firm in Lenexa, Kan., uses an Oracle enterprise resource planning (ERP) system and an Oracle data warehouse. A two- to three-hour weekly process uses scripts that Terracon wrote in PL/SQL to move data from the operational ERP data to the warehouse, where it is used for reporting. Not all information—customer and employee names, for example—is replicated to the warehouse. Terracon was unable to use Oracle's replication product, Streams, because the two repositories used different versions of Oracle, according to CIO Frank Milano.


Milano says he'd like to convert the data warehouse to Microsoft's SQL Server database. He says that with his existing Microsoft enterprise agreement he could do that for an additional $2,000 to $3,000 and save $30,000 to $50,000 in Oracle license fees. The ERP system would remain Oracle, however, because it's the best fit for Terracon's project accounting and financial management needs, he says.


And Milano says total cost of ownership with SQL Server would be lower than with Oracle. "Trying to maintain an Oracle database is brutal," he says. "It's complex, not for the faint of heart." But, he concedes, Oracle is better suited for large, complex, dynamic database applications.


But Milano says he worries that interfacing the Oracle and SQL Server databases may be tricky. In addition to the weekly transfer of production data from the ERP system to the warehouse, users must be able to initiate queries against the warehouse and have them go to the ERP database to pick up customer and employee names, and queries against the ERP system must be able to go to the warehouse for information that has already been calculated and summarized. "That's a great concern for us," he says. "I don't have a solution for it."


But Microsoft claims it does. According to Tom Rizzo, group product manager for SQL Server, Terracon could move data from the Oracle ERP system to the SQL Server data warehouse—doing any needed editing or massaging en route—using the Data Transformation Services tools that ship with SQL Server. And to access both databases in a single query, Terracon could use the Distributed Query engine in SQL Server, Rizzo says.















IBM's Integration Approach

In IBM's "federated database" approach to data integration, a single DB2 query can go against heterogeneous remote relational databases, Web sites, unstructured documents and flat files. DB2 uses IBM software called DataJoiner and Relational Connect "wrappers." There's one wrapper for each data type, and the wrappers map the source data models to the DB2 data model.









Print this Story Send Us Feedback E-mail this Story Digg! Digg this Story Slashdot this Story
Data Integration Products
A 'techno-political solution' for Ventura County
Information Discovery at Aventis
Bridging Data Islands
"Enterprise search continues to lag behind commerical search because companies lack a "findability" strategy, says one researcher...." Read more...
"It's IT Blogwatch: in which we all wonder how much we get paid and Glassdoor.com helps us out. Not to..." Read more...
Read more Business Intelligence posts or See all Blogs
Microsoft promises four patches next week
Google gives away home-cooked Web application security scanner
Expect iPhone, Fourth of July scams, security firm says
More top stories...
Microsoft trumpets security additions in upcoming IE8
Apple cuts price of high-end SSD MacBook Air by $500
Ultrathin showdown: Apple MacBook Air vs. Lenovo ThinkPad X300 vs. Toshiba Portege R500
All it takes is a couple hours and about $125 to breathe new life into an old laptop. Here's how.
Is Microsoft's Golden Age over? What are Gates' most memorable quotes? Find out in Computerworld's complete coverage of the end of the Bill Gates era at Microsoft.
There are some things your CIO definitely doesn't want to hear. Also don't miss the flipside, Five things you should always tell your boss.
With its latest version, Mozilla's browser continues to raise the bar for what Web browsers should be.
Reviews, analyses, how-tos, visual tours, hot issues and predictions about Microsoft's new OS.
Four years from now, the IT field will be a vastly different place. Will you be ready?
All Zones
Application Performance Zone
Business Continuity Zone
Data Center Management Zone
Enterprise-Class Security Zone
The File Data Management Zone
Grid Computing on Windows Zone
Security Management Zone
ITIL Best Practices Zone
The SAS Zone
Storage Virtualization Zone
Business Intelligence and Analytics Zone

Ads by TechWords

See your link here
Speeding the time to intelligence
Get this Computerworld report free for a limited time, compliments of SAS.
Time To Intelligence -- a concept defining how long it takes to get accurate and timely information into the hands of workers who need it most. Do it slower than your competitors and your company is toast. Do it faster, you scorch them. Business Intelligence is the key to optimizing Time To Intelligence, and success there is a combination of people, policies, and technology.
Download this executive briefing download
Why SaaS is Vital to Email and Web Security
Why SaaS is Vital to Email and Web Security
Download this webcast, free, compilments of Webroot Software
Go to the webcast 
Rapid application development, rapid results
Download this special report now!
(Source: Intersystems) All too many businesses suffer from IT infrastructures that are a hodge-podge of disconnected databases and applications. What's needed is the ability rapidly develop connected applications under a unified service-oriented architecture. InterSystems Ensemble integration environment and Cache database are effective tools in answering this need, delivering a rapid ROI.
Download this white paper go
White Papers
Read up on the latest ideas and technologies from companies that sell hardware, software and services.
Deploying Virtualized NetWare on Linux Whitepaper
Toward More Flexible, Next-Generation Collaboration Solutions
Driving Business Success Through Workgroup Choice and Flexibility
View more whitepapers 
SAS Information Management Kit

SAS is the leader in business intelligence and analytical software and services. Only SAS offers leading data integration, storage, analytics and business intelligence applications within a comprehensive enterprise intelligence platform. SAS gives 97 of the top 100 companies in the 2007 Fortune 500 THE POWER TO KNOW®.

Webcast: The Information Management Roadmap
Imagine high-quality data, cleansed, analyzed and delivered throughout your organization. Join Computerworld, IT visionary Thornton May and a panel of experts to learn how SAS® can help you make it happen.

View this webcast 
Research Report: Information Management Initiatives at Midsize and Large Organizations
See the top-line results of this Computerworld sponsored survey to see how IT and business leaders are handling information management implementation.

Download this report 
White Paper: Information Management: Better Information for Winning Decisions.
This white paper explains how the SAS Information Evolution Model aids companies in assessing how they use this information to make strategic decisions and drive business.

Download this white paper