ETL: Extract, Transform & Load

A Web-based ETL works like a Web service to help you integrate your data.

In business intelligence, an ETL tool extracts data from one or more data-sources, transforms it and cleanses it to be optimized for reporting and analysis, and loads it into a data store or data warehouse. ETL stands for extract, transform, and load.

There are many different models of ETL tools in today’s BI market, from complex, specialized products to light, Web-based solutions that work easily with multiple data sources.

Benefits of a Web-based ETL

A Web-based ETL gives you these unique benefits:

  • Fully Web-based Data Integration – With a Web-based ETL, you can not only seamlessly integrate your data, but also integrate the ETL with your other BI applications-regardless of vendor or brand. Use the ETL as a Web Service, launch ETL jobs from any standard-type processes and Web processes. Integrate the ETL into your business processes and workflows tied to triggers and alerts.
  • Unique Web Data Sources – Do more with a diverse set of data: a Web-based ETL gives you easy connections out of the box with Web Services and other Web-oriented data sources (e.g. SalesForce.com, Google Docs, RSS and ATOM feeds). Today’s most cutting-edge, Web-based ETL tools connect with relational databases and flat-file data sources.
  • Elemental Development Methodology – The same concepts you use to define logic in reports, templates, process files, etc., can be applied and even reused/shared in a Web-based ETL.
  • Optimization for BI and Reporting – Look for a Web-based ETL that is designed to work with data geared towards reporting, analysis and visualization. In particular, there are Web-based ETL tools that are created and marketed by companies specializing in BI; apart from truly optimizing data for reporting and analysis, this type of ETL will integrate seamlessly with your other BI applications.

How ETL manages and create a process around your data:

The extract step in an ETL job reads the data from one or more data sources. A good-quality Web-based ETL is “data source neutral” and is capable of reading data from almost any data source, including databases, flat files, spreadsheets, RSS/ATOM feeds and Web services.

The transform step in an ETL job manipulates the data gathered in the previous step. Here, data is combined, cleaned up, processed and optimized for reporting and analysis.

The load step in an ETL job takes the data collected and optimized and writes it back out to one or more destinations. In a good ETL, these can be almost any data source, including databases, flat files, spreadsheets, and Web services, RSS/ATOM feeds–just as is true of the extract step.

When Does Data Integration or ETL Become Necessary?

It is of course possible to report directly against your databases or data source(s). However, there is a point past which data volume, diversity of data sources and other important considerations make it desirable to have a data integration or ETL. If you are a data architect, developer or database administrator, here are some of the questions you need to ask yourself in this regard:

  • Is the volume of your data growing noticeably?
  • Is your company using an increasing number of data sources?
  • Do you need a convenient way to integrate your data across different applications?
  • Do you want to find a way to make your data more accurate and easier to understand?
  • Are you searching for an efficient way to manage or create a process around your data?

If you have answered any of these questions in the affirmative, you may need to look into acquiring a data integration or ETL tool.

RETURN TO THE BI ENCYCLOPEDIA