One article is enough for ETL: understand what ETL is in three minutes

2022-06-24 08:44:00 Bi visualization of Parker data

Today, let's talk about a technical topic: ETL development. Anyone who has done business intelligence (BI) development will be familiar with ETL. Any development work that involves extracting data from data sources, or computing and processing that data, counts as ETL.

What is ETL?

ETL has three stages: Extraction, Transformation, and Loading. Extraction pulls data from different data sources. Transformation processes the data and converts its format according to defined processing rules. Loading writes the final processed output to a target data table, or possibly to a file or another destination.
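
The three stages can be sketched in a few lines of Python. This is only a minimal illustration, not how any particular ETL tool works: the inlined CSV text, the `API_SOURCE` rows, the `orders` table, and the cleaning rule (drop rows with a non-numeric amount) are all assumptions made up for the example.

```python
import csv
import io
import sqlite3

# Hypothetical source 1: a CSV export, inlined so the sketch is self-contained.
CSV_SOURCE = "order_id,amount,region\n1, 10.50 ,north\n2,abc,south\n3,7,NORTH\n"

# Hypothetical source 2: rows that might come from an API or another database.
API_SOURCE = [{"order_id": 4, "amount": "3.25", "region": "South"}]


def extract():
    """Extraction: read raw rows from every source without changing them."""
    rows = list(csv.DictReader(io.StringIO(CSV_SOURCE)))
    rows.extend(API_SOURCE)
    return rows


def transform(rows):
    """Transformation: apply cleaning rules and convert formats."""
    clean = []
    for row in rows:
        try:
            amount = float(str(row["amount"]).strip())
        except ValueError:
            continue  # assumed rule: drop rows whose amount is not numeric
        region = str(row["region"]).strip().lower()
        clean.append((int(row["order_id"]), amount, region))
    return clean


def load(rows, conn):
    """Loading: write the processed rows to the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders (order_id INTEGER, amount REAL, region TEXT)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl(conn):
    """Chain the three stages: extract, then transform, then load."""
    load(transform(extract()), conn)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    run_etl(conn)
    print(conn.execute("SELECT order_id, amount, region FROM orders").fetchall())
```

Keeping each stage in its own function mirrors the split described above: the sources can change without touching the cleaning rules, and the target can change without touching either.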

ETL - Parker data business intelligence BI Visual analysis platform

Put more plainly, ETL is like everyday cooking. You buy ingredients at the market stalls, then pick over the vegetables, wash them, chop them, and finally stir-fry everything in the pan and bring it to the table. Each stall in the market is a data source, the finished dish is the final output, and everything in between (picking, washing, chopping, cooking) is the transformation.

How is ETL implemented?

In development, ETL is most often implemented with ETL tools, such as Kettle, Pentaho, IBM DataStage, Informatica, and SSIS in Microsoft SQL Server, combined with basic SQL to carry out the whole ETL process.

Others develop their own programs, which control a set of data processing scripts that run in batches. This is essentially a program plus SQL.
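
A rough sketch of the "program plus SQL" approach might look like the following. It assumes SQLite as a stand-in database, and the step names and table names (`stg_sales`, `sales`) are hypothetical; a real batch would usually read its SQL scripts from files and log to a proper store.

```python
import sqlite3

# Hypothetical batch: each step is (name, SQL script), executed in listed order.
BATCH_STEPS = [
    ("create_staging", "CREATE TABLE stg_sales (id INTEGER, amount TEXT);"),
    ("seed_staging", "INSERT INTO stg_sales VALUES (1, '10'), (2, '5'), (3, NULL);"),
    (
        "build_target",
        "CREATE TABLE sales AS "
        "SELECT id, CAST(amount AS REAL) AS amount "
        "FROM stg_sales WHERE amount IS NOT NULL;",
    ),
]


def run_batch(conn, steps):
    """Run each SQL script in order and stop at the first failure.

    Returns (names of completed steps, error message or None).
    """
    done = []
    for name, sql in steps:
        try:
            conn.executescript(sql)
        except sqlite3.Error as exc:
            return done, f"{name} failed: {exc}"
        done.append(name)
    return done, None


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    completed, error = run_batch(conn, BATCH_STEPS)
    print(completed, error)
```

The program only decides order and error handling; the data processing itself stays in SQL, which is the division of labor the paragraph above describes.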

Which approach is better depends on the usage scenario and on which method the developers are more proficient in. In my view, most developers who come from a software development background prefer to use programs to control batch runs when they take on data projects; this is a natural continuation of a programmer's way of thinking. Most pure BI developers naturally choose mature ETL tools, though some do start writing program scripts right away. Those BI developers are mostly former programmers.

The advantage of using programs is adaptability and high extensibility: the logic can be integrated into, or split out into, any processing flow, and development is sometimes more efficient. The drawback is that it places certain technical demands on maintenance staff, and the experience is hard to hand over and replicate.

ETL tools have several benefits. First, the whole ETL development process is visual; in particular, the layered design of the data processing flow can be managed clearly. Second, connection protocols for various data sources and databases are built in, so connections can be configured directly without writing a program. Third, the various transformation components can be used by drag and drop, which replaces part of the SQL development and avoids writing code. Fourth, all kinds of ETL scheduling rules can be designed and are highly configurable, again without writing code.

So in most ordinary projects, development with standard ETL components is more common.

What are the design concepts of ETL?

Logically, ETL can generally be divided into two layers: control flow and data flow. Many ETL tools are designed around this idea, although different tools may use different names.

A control flow controls the order in which data flows are executed, and one control flow can contain multiple data flows. For example, in data warehouse development, the first layer of processing is the ODS or staging layer, the second is the dimension layer, and the following layers are the DW fact layer and the DM data mart layer. ETL scheduling connects these layers into one complete data processing flow.
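
The relationship between control flows, data flows, and layer order can be modeled roughly as follows. The class names `DataFlow` and `ControlFlow`, the `run_pipeline` scheduler, and the sample flow names are hypothetical and only illustrate the idea; real ETL tools have their own object models and terminology.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Layer order taken from the text: ODS/staging, dimensions, DW facts, DM data marts.
LAYER_ORDER = ["ODS", "DIM", "DW", "DM"]


@dataclass
class DataFlow:
    """One source-to-target transformation (hypothetical minimal model)."""
    name: str
    run: Callable[[], None]


@dataclass
class ControlFlow:
    """Runs the data flows of one layer in their listed order."""
    layer: str
    data_flows: List[DataFlow] = field(default_factory=list)

    def execute(self) -> List[str]:
        executed = []
        for flow in self.data_flows:
            flow.run()
            executed.append(f"{self.layer}.{flow.name}")
        return executed


def run_pipeline(control_flows: List[ControlFlow]) -> List[str]:
    """Scheduler: execute control flows layer by layer, regardless of input order."""
    ordered = sorted(control_flows, key=lambda cf: LAYER_ORDER.index(cf.layer))
    log = []
    for cf in ordered:
        log.extend(cf.execute())
    return log


def noop() -> None:
    """Placeholder for real extract/transform/load work."""


if __name__ == "__main__":
    flows = [
        ControlFlow("DW", [DataFlow("fact_sales", noop)]),
        ControlFlow("ODS", [DataFlow("orders", noop), DataFlow("customers", noop)]),
        ControlFlow("DM", [DataFlow("sales_mart", noop)]),
        ControlFlow("DIM", [DataFlow("dim_customer", noop)]),
    ]
    print(run_pipeline(flows))
```

The scheduler sorts by layer before running, so the staging data is always in place before the dimension and fact layers that depend on it.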

A data flow is a specific data transformation process from source data to a target data table, which is why some ETL tools call a data flow a transformation. Designing a data flow involves three main steps. Connecting to the source data and connecting to the target data table are two of them, and both can be handled directly by configuring ETL components. The third is the transformation in between, where there are several options: calling SQL statements, calling stored procedures, or using ETL components.

Some projects are used to implementing the transformations in a data flow with ETL components. Others require calling stored procedures rather than using the standard transformation components. Some data warehouses do not support stored procedures at all, so the transformations can only be implemented in standard SQL.
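
To make the choice concrete, here is the same small transformation written two ways: once as a SQL statement executed inside the database, and once in the program layer, which stands in for an ETL tool's transformation component. SQLite and the table names `src`, `tgt_sql`, and `tgt_comp` are assumptions for illustration; stored procedures are left out because SQLite does not have them.

```python
import sqlite3


def make_source(conn):
    """Create a small hypothetical source table."""
    conn.execute("CREATE TABLE src (id INTEGER, name TEXT)")
    conn.executemany(
        "INSERT INTO src VALUES (?, ?)", [(1, " alice "), (2, "Bob"), (3, None)]
    )


def transform_with_sql(conn):
    """Option 1: push the transformation into the database as one SQL statement."""
    conn.execute("CREATE TABLE tgt_sql (id INTEGER, name TEXT)")
    conn.execute(
        "INSERT INTO tgt_sql "
        "SELECT id, UPPER(TRIM(name)) FROM src WHERE name IS NOT NULL"
    )


def transform_with_component(conn):
    """Option 2: pull rows out and transform them outside the database."""
    conn.execute("CREATE TABLE tgt_comp (id INTEGER, name TEXT)")
    rows = conn.execute("SELECT id, name FROM src").fetchall()
    cleaned = [(row_id, name.strip().upper()) for row_id, name in rows if name is not None]
    conn.executemany("INSERT INTO tgt_comp VALUES (?, ?)", cleaned)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    make_source(conn)
    transform_with_sql(conn)
    transform_with_component(conn)
    print(conn.execute("SELECT * FROM tgt_sql ORDER BY id").fetchall())
    print(conn.execute("SELECT * FROM tgt_comp ORDER BY id").fetchall())
```

Both routes produce the same target rows here. The SQL route keeps the data inside the database, while the component route moves rows through the tool or program, which is one reason projects settle on different conventions.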

What is ETL architecture?

When we talk about a BI data architect, what we really mean is ETL architecture design. This is a core technical layer of the whole BI project: data processing, data cleaning, and modeling are all implemented in ETL.

A good ETL architecture can support hundreds of packages (that is, control flows) at the same time, and each control flow may contain hundreds of data flows. I wrote a technical article on this before; you can find it online by searching for the keywords BIWORK ETL.

This framework design is more than just an ETL framework. It also embodies deeper ideas about ETL project management and standards control, and later operations and maintenance, BI analysis of the BI system itself, and ETL performance tuning are all reflected in it. Because a large BI project may need dozens of people developing ETL at the same time, the top-level design of the framework is very important.

Original site

Copyright notice
This article was written by [Bi visualization of Parker data]. Please include a link to the original when reposting. Thanks.
https://yzsam.com/2022/175/202206240622241440.html