What is ETL?
ELT stands for Extraction Transformation and Loading of data from source to target. Is the process used to extract data from different data sources like databases, and data files, etc, and transforming the data using cleansing, profiling and business set of rules which then translate into a defined series of steps necessary to ensure that the data being loaded to the target system is compliant, valid and accurate to the specifics of each application.
What ETL tools do?
ETL tools execute one of the most challenging aspects in a data warehouse; the Extraction Transformation and Loading process. Their main tasks are:
- The discovery and connection with required information sources
- Extract the selected information, efficiently as specified.
- Transform the information from heterogeneous sources to a common architecture.
- Profile and clean the data according to target system constrains and business rules
- Load (insert or update) the transformed data into a homogeneous data source.
What happens in ETL Transformations?
The transformation procedures can be as simple as converting data types, formatting values, changing column names, eliminating bad data, correcting misspelling errors, filling missing data and so on, or they can be very elaborate based on complex rules for time dependencies, reporting errors, and data mapping and data flow paths between the source and targets can change depending on time and data values, load of data based on previous loads an the interdependencies of the data etc.
The ETL process plays an important factor in the development work of a data warehouse, mainly because a large amount of time is dedicated to the design and implementation of ETL processes. In this days of information age, the large amounts of data that is generated in any business is grown faster and faster, with faster computer and faster networks, the ability to produce and record incredible amounts of transactions, or information from the many different departments in an enterprise is so overwhelming that no single person in the organization knows where are all the operational data sources for each department is; just simply putting that generating reports and comparing numbers with other departments might yield discrepancies for the same data. There fore every business needs one an only one centralize place to go for historical information of the enterprise. This type of information structure allows for smarter business intelligence (BI) which should be heart of any business because it will enable them to understand, business trends, improve decisions, support and justify daily operations
Thanks to the ETL process this centralized source of information is kept updated and synchronized with data from all the different sources across the business required for effective business intelligence.
Where is ETL used?
ETL tools can be used for data integration to integrate existing data with other differently designed and structured kinds of data, making the migration process of data less painful and this type of tools are more commonly used to extract information from mainframes and them feed (load and update) the data to an integrated data warehouse, which is typically a very large database that contains all the important historical information of an enterprise business, and it is used to facilitate better intelligent decision making trough the use of query reporting and analysis, enabling the data warehouse users to work off a single set of data and therefore obtaining a single version of the truth.
What types of ETL tools technologies exist?
There are 3 mayor types of ETL tools:
- The traditional ETL engine based product, which generates code within its engine to accomplish the task.
- ETL solutions included with the RDBMS
- ETL tools based on SQL generation to take advantage of the specific RDBMS optimizations
ETL tool buy or built?
If it is the first time you ever get involved in the project of developing a data warehouse, sooner or later you will have to decide whether to purchase an off-the-shelf ETL software or build your own ETL solution, and of course the best answer to this question is the typical contractor answer: it depends on your situation. Whether or not you have enough resources, time and knowledge to build your own or you prefer to deal with the operating cost of evaluating, choosing, purchasing, installing and learning the ins and outs of an ETL package. Definitely both approaches have to be carefully and meticulously analyzed using advantages and disadvantages for each option to see which solution is the best to quickly and efficiently meet the needs of your business.
Buying an ETL tool:
- The amount of ETL development time will be drastically reduced
- Tool will come with wide range of rich built-in features
- The tool can be reused for future phases of the project, or future projects.
- The ETL rules will be more readable for easy of debugging
- Simplicity of ETL process maintenance
- Increase time spend for evaluation, selection and learning of the product
- The higher cost of purchasing, training and licensing fees might be out of budget
- Because is not custom build, might not fit 100% to your needs.
Building custom ETL processes:
- Lower financial costs of developing the home grown ETL vs. buying
- There is no training costs involved
- Is 100% specifically designed to match your needs
- More flexibility for complex data sources or transformation mappings
- Time need for the development process life cycle
- Not having all the rich functionality of an ETL software package
- All the overhead involved in maintaining custom code
- Difficulty or reusability for different projects
The bottom line when it comes to ETL is that not matter how much progress and how good the ETL package has improved, it still have a higher cost of acquisition while building a custom solutions is expensive to own and maintain. Each approach has its own particularities across industries and they vary a lot depending on your requirements, sometimes is easy to spot the solution you must have and sometimes is not; just remember all time when evaluating possibilities that the general tendency is to integrate data quickly, practically, and cost effectively.
The final word on ETL
In general ETL solutions truly depend on the circumstances of your scenarios, and the best way to decide is to conduct a deep evaluation of both methods for your needs.
In my experience if you have the budged and want to reduce the number of headaches, buying the tool could be the way to go because of the speed of deployment and the reusability in other projects.