Bridging Data Islands
Integrating data from disparate sources provides companies with powerful management tools, but the process can be difficult, costly and error-prone.
October 14, 2002 12:00 PM ET
Computerworld - After just two months, a new software tool enabled Aventis Pharmaceuticals Inc. to discover a promising candidate for a new drug to treat asthma, arthritis or perhaps even cancer. It's a chemical compound that might well have been overlooked using traditional IT tools.
Aventis is using DiscoveryLink, a feature of IBM's DB2 database management system that can propel a single SQL query out to multiple, heterogeneous data sources and bring information back to the user in one coherent view.
"Using this integrated framework, scientists were able to pull data from many different sources around the world, visualize it in a new way that they could never do before," says Peter Loupos, vice president for drug innovation and approval information systems at the Bridgewater, N.J.-based company.
IBM calls the Aventis approach to information integration "database federation." To get at federated data, DB2 uses IBM "wrapper" software called DataJoiner and Relational Connect. There's one wrapper for each type of data source, whether it's from Oracle Corp. or Sybase Inc. systems, Microsoft Corp. SQL Server or flat files, with each one mapping the source data model to the DB2 data model. A single Aventis query may be sent against heterogeneous relational databases, unstructured documents and in-house expertise culled from e-mail and other sources.
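To make the wrapper idea concrete, here is a minimal Python sketch. It is not IBM's API: the class names, the `compounds` table, the column names and the sample data are all hypothetical. It also simplifies heavily by copying rows into a scratch SQLite table before querying, whereas a real federated engine would push work out to the remote sources. What it does show is the core idea of one wrapper per source type, each mapping its source's columns to a shared model so that a single SQL query can span them.

```python
import csv
import io
import sqlite3


class RelationalWrapper:
    """Hypothetical wrapper for a relational source (SQLite stands in here)."""

    def __init__(self, conn, table, column_map):
        self.conn = conn
        self.table = table
        self.column_map = column_map  # source column -> common-model column

    def rows(self):
        source_cols = ", ".join(self.column_map)
        query = f"SELECT {source_cols} FROM {self.table}"
        for record in self.conn.execute(query):
            yield dict(zip(self.column_map.values(), record))


class FlatFileWrapper:
    """Hypothetical wrapper for a CSV flat file held in a string."""

    def __init__(self, text, column_map):
        self.text = text
        self.column_map = column_map  # source column -> common-model column

    def rows(self):
        for record in csv.DictReader(io.StringIO(self.text)):
            yield {common: record[source]
                   for source, common in self.column_map.items()}


def federated_query(wrappers, sql):
    """Run one SQL statement over the combined, common-model rows.

    Simplification: rows are materialized in a scratch table first.
    """
    scratch = sqlite3.connect(":memory:")
    scratch.execute("CREATE TABLE compounds (compound_id TEXT, activity REAL)")
    for wrapper in wrappers:
        scratch.executemany(
            "INSERT INTO compounds VALUES (:compound_id, :activity)",
            wrapper.rows(),
        )
    return scratch.execute(sql).fetchall()


# Invented sample data: one relational source and one flat file.
lab_db = sqlite3.connect(":memory:")
lab_db.execute("CREATE TABLE assay (cmpd TEXT, score REAL)")
lab_db.executemany("INSERT INTO assay VALUES (?, ?)",
                   [("C-001", 0.91), ("C-002", 0.12)])
flat_file = "id,potency\nC-003,0.77\nC-004,0.05\n"

wrappers = [
    RelationalWrapper(lab_db, "assay",
                      {"cmpd": "compound_id", "score": "activity"}),
    FlatFileWrapper(flat_file,
                    {"id": "compound_id", "potency": "activity"}),
]
hits = federated_query(
    wrappers,
    "SELECT compound_id FROM compounds WHERE activity > 0.5 "
    "ORDER BY compound_id",
)
print(hits)
```

The caller writes one query against the common model and never sees that one source names its key `cmpd` and the other `id`; that translation lives entirely in each wrapper's column map.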
Sending a SQL query against remote, heterogeneous databases is just one of several ways to integrate data. Others include the following:
- Custom, hard-wired interfaces that pass information from one application to another. These can be made to work exactly as users demand, but they can be costly to set up and maintain.
- Replication, in which a commercial product regularly or continuously copies databases or parts of databases from one place to another. Replication is simple but limited in its ability to do anything to data beyond copying it.
- Extract, Transform and Load (ETL), a process often used to create data warehouses and data marts. ETL software moves data from one place to another, applying rules or table lookups to combine or transform data in some way. ETL is powerful but can be very complex.
- Web services. Enabled by Internet protocols including the XML standard for exchanging data between disparate systems, Web services allow SQL-based relational data to be accessed as XML, or native XML to be accessed through SQL. Web services are ideal when applications are loosely coupled and difficult to integrate in other ways.
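Of the approaches listed above, ETL is the one whose mechanics are easiest to show in a few lines. The following Python sketch is illustrative only: the `orders` and `sales_fact` tables, the region lookup and the sample rows are invented for the example, and SQLite stands in for both the operational source and the warehouse. It extracts rows, applies a table lookup and a unit conversion, and loads the result.

```python
import sqlite3

# Hypothetical lookup table used during the transform step.
REGION_LOOKUP = {"NJ": "Northeast", "CA": "West"}


def extract(source):
    """Pull raw rows out of the operational source."""
    return source.execute(
        "SELECT order_id, state, amount_cents FROM orders"
    ).fetchall()


def transform(rows):
    """Apply a lookup (state -> region) and convert cents to dollars."""
    transformed = []
    for order_id, state, amount_cents in rows:
        region = REGION_LOOKUP.get(state, "Unknown")
        transformed.append((order_id, region, amount_cents / 100))
    return transformed


def load(warehouse, rows):
    """Write transformed rows to the warehouse; reruns overwrite by key."""
    warehouse.execute(
        "CREATE TABLE IF NOT EXISTS sales_fact ("
        "order_id INTEGER PRIMARY KEY, region TEXT, amount_usd REAL)"
    )
    warehouse.executemany(
        "INSERT OR REPLACE INTO sales_fact VALUES (?, ?, ?)", rows
    )
    warehouse.commit()


# Invented sample data for the operational source.
source = sqlite3.connect(":memory:")
source.execute(
    "CREATE TABLE orders (order_id INTEGER, state TEXT, amount_cents INTEGER)"
)
source.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "NJ", 1250), (2, "CA", 9900), (3, "TX", 500)],
)

warehouse = sqlite3.connect(":memory:")
load(warehouse, transform(extract(source)))

totals = warehouse.execute(
    "SELECT region, SUM(amount_usd) FROM sales_fact "
    "GROUP BY region ORDER BY region"
).fetchall()
print(totals)
```

Replication, by contrast, would amount to the extract and load steps with no transform in between, which is why it is simpler but can do nothing to the data beyond copying it.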
Regardless of the approach taken, data integration can be difficult, expensive and error-prone. In particular, great care must be taken to build interfaces between applications and databases that ensure accuracy and timeliness of information and that answer the needs of disparate communities of end users. Below, we look at how two organizations tackled their data integration problems, and you can read two more case studies online.