One article is enough to understand ETL: three minutes to learn what ETL is
2022-06-24 08:44:00 【Bi visualization of Parker data】
Today, let's talk about a technical topic: ETL development. Anyone who has done business intelligence (BI) development will be familiar with ETL. Any development work that involves extracting data from data sources, computing on it, and processing it is ETL.
What is ETL
ETL has three stages: Extraction, Transformation, and Loading. Pulling data from different data sources is extraction. Processing the data and converting its format according to defined rules is transformation. Writing the final output to a target table, or possibly a file, is loading.

ETL - Parker data business intelligence BI Visual analysis platform
Put more plainly, ETL works like everyday cooking: you buy ingredients at the market stalls, sort through them, wash them, chop them, and finally stir-fry them in the pan and bring the dish to the table. Each market stall is a data source, the finished dish is the final output, and the steps in between (sorting, washing, chopping, and cooking) are the transformation.
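The three stages can be sketched in a few lines of Python. This is a minimal illustration, not a production pattern: the CSV content, the `orders` table, and the cleaning rules are all assumptions made up for the example, and an in-memory SQLite database stands in for the target.

```python
import csv
import io
import sqlite3

# Hypothetical source data; a real project would read a file or a source database.
SOURCE_CSV = """order_id,customer,amount
1, alice ,10.50
2,BOB,20.00
3,carol,
"""


def extract(text):
    """Extract: read raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: tidy names, convert types, drop rows that have no amount."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # assumed rule: skip incomplete rows
        cleaned.append(
            (int(row["order_id"]), row["customer"].strip().title(), float(amount))
        )
    return cleaned


def load(rows, conn):
    """Load: write the processed rows to the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(SOURCE_CSV)), conn)
result = conn.execute(
    "SELECT order_id, customer, amount FROM orders ORDER BY order_id"
).fetchall()
print(result)
```

Keeping each stage in its own function mirrors the cooking analogy: the source can change without touching the cleaning rules, and the target can change without touching either.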
How ETL is implemented
In practice, ETL is most often implemented with ETL tools, such as Kettle, Pentaho, IBM DataStage, Informatica, or SSIS in Microsoft SQL Server, combined with basic SQL to implement the whole ETL process.
Other teams write their own programs that control data-processing scripts and run them in batches. This is essentially program code plus SQL.
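The "program plus SQL" approach might look like the following sketch: a small Python runner that executes an ordered batch of SQL steps. The step names, tables, and data are hypothetical, and SQLite is used only so the example is self-contained.

```python
import sqlite3

# Hypothetical batch: ordered (step name, SQL) pairs.
# In a real project these are often separate .sql files.
BATCH = [
    ("create_staging", "CREATE TABLE stg_sales (region TEXT, amount REAL)"),
    (
        "load_staging",
        "INSERT INTO stg_sales VALUES "
        "('north', 100.0), ('north', 50.0), ('south', 70.0)",
    ),
    ("create_summary", "CREATE TABLE sales_summary (region TEXT, total REAL)"),
    (
        "build_summary",
        "INSERT INTO sales_summary "
        "SELECT region, SUM(amount) FROM stg_sales GROUP BY region",
    ),
]


def run_batch(conn, batch):
    """Run each step in order; stop at the first failure and roll back any open transaction."""
    completed = []
    try:
        for name, sql in batch:
            conn.execute(sql)
            completed.append(name)
        conn.commit()
    except sqlite3.Error as exc:
        conn.rollback()
        raise RuntimeError(f"batch failed after steps {completed}") from exc
    return completed


conn = sqlite3.connect(":memory:")
steps = run_batch(conn, BATCH)
summary = conn.execute(
    "SELECT region, total FROM sales_summary ORDER BY region"
).fetchall()
print(steps)
print(summary)
```

The program owns ordering and error handling, while SQL does the actual data work. That split is what makes this style flexible, and also what makes it depend on the people who maintain the runner.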

Which approach is better depends on the usage scenario and on which method the developers are more proficient with. In my view, most people who come from a software development background prefer to control batch runs with programs when they work on data projects, which is a natural continuation of a programmer's way of thinking. Pure BI developers mostly choose mature ETL tools. Of course, some BI developers write program scripts right from the start; these experts are usually former programmers.
The advantage of writing programs is adaptability and high extensibility: the logic can be integrated into, or split out into, any processing flow, and development is sometimes more efficient. The drawback is that it places technical demands on maintenance staff, and the experience is hard to hand over and replicate.

ETL tools have several benefits. First, the whole ETL development process is visual, and in particular the layered design of the data-processing flow can be managed clearly. Second, connection protocols for various data sources and databases are built in, so connections can be configured directly without writing a program. Third, transformation components can be used by dragging and dropping, which replaces part of the SQL development without writing code. Fourth, all kinds of ETL scheduling rules can be designed through configuration, again without writing code.
So in most typical projects, ETL is developed with the standard components of an ETL tool.
What is the design concept of ETL
Logically, ETL can generally be divided into two layers, control flow and data flow. Many ETL tools are designed around this idea, although different tools may use different names.
A control flow controls the order in which data flows are executed, and one control flow can contain multiple data flows. For example, in data warehouse development, the first layer to process is the ODS or staging layer, the second is the dimension layer, and the following layers are the DW fact layer and the DM data mart layer. ETL scheduling connects these layers into one complete data-processing pipeline.
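The relationship between a control flow and its data flows can be sketched as follows. This is only an illustration under assumptions: the layer names come from the text, but the data-flow functions, the in-memory `store`, and the sample rows are invented for the example, and real ETL tools model this with their own constructs.

```python
from typing import Any, Callable, Dict, List, Tuple

# A data flow here is just a function that reads and writes a shared store.
DataFlow = Callable[[Dict[str, Any]], None]


def load_ods_orders(store):
    """ODS/staging layer: land the raw source rows (hypothetical data)."""
    store["ods_orders"] = [
        {"customer": "alice", "amount": 10.0},
        {"customer": "bob", "amount": 5.0},
        {"customer": "alice", "amount": 2.5},
    ]


def build_dim_customer(store):
    """Dimension layer: one entry per distinct customer."""
    store["dim_customer"] = sorted({r["customer"] for r in store["ods_orders"]})


def build_fact_sales(store):
    """DW fact layer: replace customer names with dimension keys."""
    keys = {name: i + 1 for i, name in enumerate(store["dim_customer"])}
    store["fact_sales"] = [
        (keys[r["customer"]], r["amount"]) for r in store["ods_orders"]
    ]


def build_dm_sales_by_customer(store):
    """DM data mart layer: aggregate the facts for reporting."""
    totals: Dict[int, float] = {}
    for key, amount in store["fact_sales"]:
        totals[key] = totals.get(key, 0.0) + amount
    store["dm_sales_by_customer"] = totals


# The control flow fixes the order of the layers; each layer may hold many data flows.
CONTROL_FLOW: List[Tuple[str, List[DataFlow]]] = [
    ("ODS", [load_ods_orders]),
    ("DIMENSION", [build_dim_customer]),
    ("DW", [build_fact_sales]),
    ("DM", [build_dm_sales_by_customer]),
]


def run_control_flow(control_flow, store):
    """Execute every data flow, layer by layer, and record the execution order."""
    executed = []
    for layer, flows in control_flow:
        for flow in flows:
            flow(store)
            executed.append(f"{layer}:{flow.__name__}")
    return executed


store: Dict[str, Any] = {}
executed = run_control_flow(CONTROL_FLOW, store)
print(executed)
print(store["dm_sales_by_customer"])
```

Each later layer reads what an earlier layer produced, which is why the ordering belongs in the control flow rather than inside the individual data flows.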

A data flow is a specific transformation process from source data to a target table, which is why some ETL tools call data flows transformations. Designing a data flow involves three main parts: the connection to the source, the connection to the target table, and the transformation in between. The two connections can be configured directly through ETL components. The transformation in between offers more choices: it can call SQL statements or stored procedures, or it can be implemented with ETL components.
Some projects habitually implement the transformations in a data flow with ETL components. Others require calling stored procedures instead of using standard transformation components. Some data warehouses do not support stored procedures, so the transformation can only be implemented in standard SQL.
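To make the choice concrete, the sketch below implements one transformation twice: once in program code, standing in for what an ETL component would do, and once as a single standard SQL statement. The table names, columns, and the "keep paid orders, convert cents to currency units" rule are assumptions for illustration. SQLite is used because it is easy to run anywhere; like the warehouses mentioned above, it has no stored procedures.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Source table with hypothetical rows.
conn.execute(
    "CREATE TABLE src_orders (order_id INTEGER, amount_cents INTEGER, status TEXT)"
)
conn.executemany(
    "INSERT INTO src_orders VALUES (?, ?, ?)",
    [(1, 1050, "paid"), (2, 2000, "cancelled"), (3, 400, "paid")],
)

# Two identical target tables, one per implementation style.
for target in ("tgt_by_code", "tgt_by_sql"):
    conn.execute(f"CREATE TABLE {target} (order_id INTEGER, amount REAL)")

# Option 1: transform row by row in program code.
rows = conn.execute(
    "SELECT order_id, amount_cents, status FROM src_orders"
).fetchall()
converted = [
    (order_id, cents / 100.0) for order_id, cents, status in rows if status == "paid"
]
conn.executemany("INSERT INTO tgt_by_code VALUES (?, ?)", converted)

# Option 2: express the same rule as one standard SQL statement.
conn.execute(
    "INSERT INTO tgt_by_sql "
    "SELECT order_id, amount_cents / 100.0 FROM src_orders WHERE status = 'paid'"
)
conn.commit()

by_code = conn.execute("SELECT * FROM tgt_by_code ORDER BY order_id").fetchall()
by_sql = conn.execute("SELECT * FROM tgt_by_sql ORDER BY order_id").fetchall()
print(by_code == by_sql, by_sql)
```

Both targets end up with the same rows. The SQL version keeps the work inside the database, while the code version moves every row through the program, which is one reason project conventions differ on where transformations should live.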
What is ETL architecture
When we talk about a BI data architect, what we really mean is someone who does ETL architecture design. This is a core technical layer in any BI project: data processing, data cleansing, and modeling are all implemented in ETL.

A good ETL architecture can support hundreds of packages (that is, control flows) at the same time, and each control flow may contain hundreds of data flows. I wrote a technical article on this before; you can find it online by searching for the keywords BIWORK ETL.
This framework design covers more than the ETL framework itself. It also embodies ideas about ETL project management and standards control, including later operations and maintenance, BI analysis of the BI system itself, and ETL performance tuning. Because a large BI project may need dozens of people developing ETL at the same time, the top-level design of the framework is very important.