Data Integration

Your data sources are where your transactional and corporate data reside. To report on, analyze and act on this data, you first need to connect to your data sources and bring their data together.

There are many ways to bring data together. From various kinds of connectors to ETL (extract, transform and load) tools, from mashups to Web services, from data-source-neutral BI solutions to ones requiring massive meta-infrastructures, you have many models to choose from. In general, however, an ETL tool provides a means of collecting, optimizing, and storing your data to better serve your company’s reporting and analysis needs.

A small company working with a few pieces of data from homogeneous sources has the flexibility to manage this data in different ways. However, the higher your data volume and the more diverse your data sources, the harder it is to organize, manage and ultimately rely upon this data. This is when it is useful to switch to an ETL tool.

How an ETL tool manages your data and creates a process around it:

The extract step in an ETL job reads the data from one or more data sources. A good-quality Web-based ETL is “data source neutral” and is capable of reading data from almost any data source, including databases, flat files, spreadsheets, RSS/ATOM feeds and Web services.
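As a rough illustration of what "data source neutral" can mean in practice, the sketch below reads two differently formatted sources into one common record shape. It is a minimal example, not how any particular ETL product works: the `READERS` registry, the function names and the sample data are all hypothetical, and only CSV and JSON text are handled.

```python
import csv
import io
import json
from typing import Callable, Dict, Iterator, List, Tuple

Record = Dict[str, str]


def read_csv(text: str) -> Iterator[Record]:
    """Yield one record per CSV row, keyed by the header line."""
    for row in csv.DictReader(io.StringIO(text)):
        yield dict(row)


def read_json(text: str) -> Iterator[Record]:
    """Yield one record per object in a JSON array, with values as strings."""
    for obj in json.loads(text):
        yield {key: str(value) for key, value in obj.items()}


# Hypothetical registry mapping a source type to the reader that handles it.
READERS: Dict[str, Callable[[str], Iterator[Record]]] = {
    "csv": read_csv,
    "json": read_json,
}


def extract(sources: List[Tuple[str, str]]) -> List[Record]:
    """Read every (source_type, raw_text) pair into one list of records."""
    records: List[Record] = []
    for source_type, raw_text in sources:
        records.extend(READERS[source_type](raw_text))
    return records


if __name__ == "__main__":
    csv_data = "customer,amount\nAcme,100\n"
    json_data = '[{"customer": "Globex", "amount": 250}]'
    for record in extract([("csv", csv_data), ("json", json_data)]):
        print(record)
```

Because every reader returns the same record shape, supporting another kind of source in this sketch only means adding one more function to the registry; the later steps do not need to know where a record came from.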

The transform step in an ETL job manipulates the data gathered in the previous step. Here, data is combined, cleaned up, processed and optimized for reporting and analysis.

The load step in an ETL job takes the collected and optimized data and writes it out to one or more destinations. In a good ETL tool, these can be almost any kind of data store, including databases, flat files, spreadsheets, RSS/ATOM feeds and Web services, just as in the extract step.
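The three steps above can be sketched end to end in a few lines of Python. This is only a toy example under stated assumptions: the source is a small CSV string, the destination is an in-memory SQLite database, and the column names, the `customer_totals` table and the cleaning rules are hypothetical choices made for illustration.

```python
import csv
import io
import sqlite3
from typing import Dict, List


def extract(csv_text: str) -> List[Dict[str, str]]:
    """Extract: read raw rows from a CSV source."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def transform(rows: List[Dict[str, str]]) -> Dict[str, float]:
    """Transform: normalize names, skip incomplete rows, total amounts per customer."""
    totals: Dict[str, float] = {}
    for row in rows:
        name = row["customer"].strip().title()
        amount = row["amount"].strip()
        if not name or not amount:
            continue  # drop rows that cannot be used for reporting
        totals[name] = totals.get(name, 0.0) + float(amount)
    return totals


def load(totals: Dict[str, float], conn: sqlite3.Connection) -> None:
    """Load: write the optimized data to a destination table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customer_totals "
        "(customer TEXT PRIMARY KEY, total REAL)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO customer_totals VALUES (?, ?)",
        sorted(totals.items()),
    )
    conn.commit()


if __name__ == "__main__":
    raw = "customer,amount\n acme ,100\nACME,50.5\nGlobex,\nglobex,20\n"
    conn = sqlite3.connect(":memory:")
    load(transform(extract(raw)), conn)
    for row in conn.execute("SELECT * FROM customer_totals ORDER BY customer"):
        print(row)
```

Keeping each step as its own function mirrors the way an ETL job separates reading, processing and writing, so a source or destination can be swapped without touching the transform logic.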

When Does Data Integration or ETL Become Necessary?

It is of course possible to report directly against your databases or other data sources. However, there is a point past which data volume, diversity of data sources and other important considerations make it desirable to have a data integration or ETL tool. If you are a data architect, developer or database administrator, here are some of the questions you need to ask yourself in this regard:

  • Is the volume of your data growing noticeably?
  • Is your company using an increasing number of data sources?
  • Do you need a convenient way to integrate your data across different applications?
  • Do you want to find a way to make your data more accurate and easier to understand?
  • Are you searching for an efficient way to manage or create a process around your data?

If you have answered any of these questions in the affirmative, you may need to look into acquiring a data integration or ETL tool.
