QuickStudy: Extract, Transform and Load (ETL)
Computerworld - ETL stands for extract, transform and load, the processes that enable companies to move data from multiple sources, reformat and cleanse it, and load it into another database, a data mart or a data warehouse for analysis, or on another operational system to support a business process.
Companies know they have valuable data lying around throughout their networks that needs to be moved from one place to anothersuch as from one business application to another or to a data warehouse for analysis.
The only problem is that the data lies in all sorts of heterogeneous systems, and therefore in all sorts of formats. For instance, a CRM system may define a customer in one way, while a back-end accounting system may define the same customer differently.
To solve the problem, companies use extract, transform and load (ETL) software, which includes reading data from its source, cleaning it up and formatting it uniformly, and then writing it to the target repository to be exploited.
The data used in ETL processes can come from any source: a mainframe application, an ERP application, a CRM tool, a flat file, an Excel spreadsheeteven a message queue.
Pulling the Data
Extraction can be done via Java Database Connectivity, Microsoft Corp.'s Open Database Connectivity technology, proprietary code or by creating flat files, says Mike Schiff, an analyst at Current Analysis Inc., a Sterling, Va.-based consultancy.
After extraction, the data is transformed, or modified, depending on the specific business logic involved so that it can be sent to the target repository.
There are a variety of ways to perform the transformation, and the work involved varies. The data may require reformatting only, but most ETL operations also involve cleansing the data to remove duplicates and enforce consistency. Part of what the software does is examine individual data fields and apply rules to consistently convert the contents to the form required by the target repository or application, says Schiff.
For example, the category "male" might be represented in three different systems as M, male and 0/1. The ETL software would recognize that these entries mean the same thing and convert them to the target format.
In addition, the ETL process could involve standardizing name and address fields, verifying telephone numbers or expanding records with additional fields containing demographic information or data from other systems.
Harriet Fryman, group director of product marketing at data warehousing vendor Informatica Corp. in Redwood City, Calif., offers an example. Say, for instance, that a customer runs Oracle financials, PeopleSoft human resources software and SAP manufacturing applications and needs to access the data in each of these systems to complete an order-to-cash process. This will require the company's ETL software to extract data from the originating systems, which isn't as easy as it sounds in some instancesfor example, pulling data from the SAP manufacturing application would require the generation of SAP proprietary ABAP code to extract the shipping and open purchase-order information.
The transformation occurs when the data from each source is mapped, cleansed and reconciled so it all can be tied together, with receivables tied to invoices and so on.
Additional Resources



Learn the important issues you must consider before starting your next mobility initiative. Get your mobility white paper from IDC now, compliments of Sybase.
White Papers & Webcasts
Global Retail Insights, an IDC Company: Accelerating the Path to Demand Intelligence with a Demand Signal Repository
Utilizing demand intelligence improves the precision of pricing, product assortments, channel/store placement, and promotion, which are all essential for sustainable revenue management performance....
Usability Is Everything
Learn what sets Workday's HR and Payroll solutions apart from the competition....
How to Populate Your CMDB in Four Steps
Generally, CMDBs populated by discovery tools alone simply aren't usable. They fail because they contain massive amounts of irrelevant information. You avoid this...
The Value of Real SaaS at Workday
Cost savings, speed to value, and innovation brought to the enterprise by Workday's software-as-a-service solutions for HR and Payroll....
A New Breed of Data Warehouse
Because enterprise data warehouses are evolving along different dimensions, organizations need to effectively manage metadata to ensure successful use of their data warehouse...
SaaS at Flextronics, Inc.
Dave Smoley, CIO of Flextronics, discusses the real value of software-as-a-service and why he chose Workday for his HR solution....
Virtualization Sanity Check: Stopping the Sprawl with Complete IT Visibility
Virtualization continues to grow at 20 percent or more per year, but it is not expected to overtake existing physical architectures at least...
Why Compliance Pays
This OnDemand webcast explores the relationship that firms with best compliance records have higher revenue, greater customer retention, lower financial losses from data...
Accelerate SSL Encrypted Applications
The amount of SSL traffic is growing in the enterprise. Because it is encrypted, it cannot be properly controlled and accelerated. Blue Coat...
Agile Enterprise Content Management (ECM) for Rapid ROI
Find out how combining ECM and BPM will help adress issues about content rich business processes....
Subscribe to Computerworld

