Skip the navigation

QuickStudy: Extract, Transform and Load (ETL)

By Marc L. Songini
February 2, 2004 12:00 PM ET

Computerworld - ETL stands for extract, transform and load, the processes that enable companies to move data from multiple sources, reformat and cleanse it, and load it into another database, a data mart or a data warehouse for analysis, or on another operational system to support a business process.




Companies know they have valuable data lying around throughout their networks that needs to be moved from one place to another—such as from one business application to another or to a data warehouse for analysis.

The only problem is that the data lies in all sorts of heterogeneous systems, and therefore in all sorts of formats. For instance, a CRM system may define a customer in one way, while a back-end accounting system may define the same customer differently.

To solve the problem, companies use extract, transform and load (ETL) software, which includes reading data from its source, cleaning it up and formatting it uniformly, and then writing it to the target repository to be exploited.

The data used in ETL processes can come from any source: a mainframe application, an ERP application, a CRM tool, a flat file, an Excel spreadsheet—even a message queue.

Pulling the Data

Extraction can be done via Java Database Connectivity, Microsoft Corp.'s Open Database Connectivity technology, proprietary code or by creating flat files, says Mike Schiff, an analyst at Current Analysis Inc., a Sterling, Va.-based consultancy.

After extraction, the data is transformed, or modified, depending on the specific business logic involved so that it can be sent to the target repository.

There are a variety of ways to perform the transformation, and the work involved varies. The data may require reformatting only, but most ETL operations also involve cleansing the data to remove duplicates and enforce consistency. Part of what the software does is examine individual data fields and apply rules to consistently convert the contents to the form required by the target repository or application, says Schiff.

For example, the category "male" might be represented in three different systems as M, male and 0/1. The ETL software would recognize that these entries mean the same thing and convert them to the target format.

In addition, the ETL process could involve standardizing name and address fields, verifying telephone numbers or expanding records with additional fields containing demographic information or data from other systems.

Harriet Fryman, group director of product marketing at data warehousing vendor Informatica Corp. in Redwood City, Calif., offers an example. Say, for instance, that a customer runs Oracle financials, PeopleSoft human resources software and SAP manufacturing applications and needs to access the data in each of these systems to complete an order-to-cash process. This will require the company's ETL software to extract data from the originating systems, which isn't as easy as it sounds in some instances—for example, pulling data from the SAP manufacturing application would require the generation of SAP proprietary ABAP code to extract the shipping and open purchase-order information.

The transformation occurs when the data from each source is mapped, cleansed and reconciled so it all can be tied together, with receivables tied to invoices and so on.



What is Tech Briefcase?
TechBriefcase is a new, free service where IT Professionals can Search, Store and Share IT white papers and content like this. Learn more
Bookmark content
Speed up your research efforts with content across the web.
Search and Store
Find the white papers you need. Create folders for any topic.
View Anywhere
Open your briefcase on your iPhone, tablet or desktop. Share with colleagues.
Don't have an account yet?
Additional Resources
Security KnowledgeVault
WHITE PAPER
Security is not an option. This KnowledgeVault Series offers professional advice how to be proactive in the fight against cybercrimes and multi-layered security threats; how to adopt a holistic approach to protecting and managing data; and how to hire a qualified security assessor. Make security your Number 1 priority.

Read now.

Cut Communications Costs Once and for All
WHITE PAPER
New IP-based communications systems are being deployed by small and midsized businesses at a rapid rate. Learn how these organizations are enabling faster responsiveness, creating better customer experiences, speeding office or mobile interactions, and dramatically reducing existing communications costs.

Read now.

White Papers
Switching Schedulers - Not As Complicated As You Think
Changing or consolidating job schedulers may seem daunting. However, the benefits of switching to enterprise workload automation outweigh the risks. Read how BMC...
Capture-Enabled Business Process Management
Organizations today must deal with a vast amount of incoming information from many different sources. Efficient, automated business processes are critical to managing...
Using Case Management to Empower Employees and transform Customer Service
This Kofax paper shows how successful customer service organizations have transformed customer service by empowering their employees. We will see how Dynamic Case...
Case Study: Audi-Volkswagen Improves Procurement Control
Audi-Volkswagen required a user-friendly, easy-to-use Business Process Management system that did not require programming skills or high levels of technical expertise in-house. This...
AIIM Market Intelligence: The paper-free office, dream or reality?
In this Aiim Market Intelligence report, produced in association with Kofax, we look at the success of paper-elimination projects, where and why paper...
Webcasts
Live Webcast
How to Reduce Complexity and Automate Your Partners for Efficient E-Business:
Date: Tuesday, June 5, 2012, 2:00 PM EDT

Whether your B2B complexity is caused by multiple technologies due to M&A, business or application specific...
Live Webcast
Data Privacy and Protection in Production Environments: New Research from Ponemon Institute
Date: Wednesday, June 13, 2012, 1:00 PM EDT / 10:00 AM PDT

In a recent study conducted by Ponemon Institute, fifty-five percent of respondents...
Live Webcast
A Geek's Guide to Presenting to Business People
Live Webcast: Wednesday, June 20th at 1:00 PM EDT

Join this live webinar with Paul Glen, author of Leading Geeks, to learn how to...
BMC Control-M - Single Point of Control Demo
With BMC Control-M, you schedule and manage everything - down to the very last platform and application - from one simple interface. It's...
BMC Control-M - Single Point of Control Demo
With BMC Control-M, you schedule and manage everything - down to the very last platform and application - from one simple interface. It's...
Sun Chemical Customer Success Story
Sun Chemical, the world's largest producer of printing inks and pigments, quadrupled its complex batch environment with zero extra headcount using BMC Control-M's...
Service-Enabling CICS Applications: Best Practices
This informative webcast provides an informed, thorough look into CICS service-enablement options and how they can affect your environment. You'll learn how to...
Teaching Legacy Application Elephants How to Dance
This four-minute video podcast shows how you can create services to continuously reuse enterprise applications, however and whenever needed, while leaving legacy logic...
Newsletter Sign-Up

Receive the latest news test, reviews and trends on your favorite technology topics

Choose a newsletter
  1. View all newsletters | Privacy Policy
IT Jobs