This article is all you need to read about ETL: understand what ETL is in three minutes
2022-06-24 08:44:00 【Bi visualization of Parker data】
Today, let's talk about a technical topic: ETL development. Anyone who has done business intelligence (BI) development will be familiar with ETL. Any work that involves extracting data from data sources, or developing data calculation and processing logic, counts as ETL.
What is ETL
ETL has three stages: Extraction, Transformation, and Loading. Extraction pulls data from different data sources. Transformation processes the data and converts its format according to defined data processing rules. Loading writes the final processed output to a target data table, or possibly to a file.

ETL - Parker data business intelligence BI Visual analysis platform
Put more plainly, ETL works the same way as everyday cooking. You buy ingredients from the stalls at the market, sort them, wash them, chop them, and finally stir-fry them in the pan and bring the dish to the table. Each market stall is a data source, the finished dish is the final output, and every step in between (sorting, washing, chopping, cooking) is transformation.
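The three stages described above can be sketched in a few lines of Python. This is only a minimal illustration, not how any particular ETL tool works: the CSV text, the `orders` table, and the function names `extract`, `transform`, `load`, and `run_etl` are all assumptions made up for the example, and an in-memory SQLite database stands in for the target.

```python
import csv
import io
import sqlite3

# Hypothetical source data; a real source could be a file, an API, or a database.
RAW_CSV = """order_id,customer,amount
1, alice ,10.50
2,BOB,20
3,carol,
"""

def extract(text):
    """Extract: read raw rows from the source (here, CSV text)."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: normalize names, convert types, drop rows with no amount."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # skip incomplete rows
        cleaned.append(
            (int(row["order_id"]), row["customer"].strip().lower(), float(amount))
        )
    return cleaned

def load(conn, records):
    """Load: write the processed records into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", records)
    conn.commit()

def run_etl(conn, text=RAW_CSV):
    """Run extract, transform, and load in order and return the loaded rows."""
    load(conn, transform(extract(text)))
    return conn.execute(
        "SELECT order_id, customer, amount FROM orders ORDER BY order_id"
    ).fetchall()
```

Calling `run_etl(sqlite3.connect(":memory:"))` runs the whole pipeline against a throwaway database. In the cooking analogy, `extract` is the shopping, `transform` is the washing and chopping, and `load` is bringing the dish to the table.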
How ETL is implemented
In practice, ETL is most often implemented with ETL tools, for example common ones such as Kettle, Pentaho, IBM DataStage, Informatica, or SSIS in Microsoft SQL Server, combined with basic SQL to implement the whole ETL process.
Other teams develop their own programs that control a set of data processing scripts and run them in batches. This is essentially program code plus SQL.
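As a rough sketch of the "program plus SQL" approach, the snippet below runs a list of SQL scripts in order and stops at the first failure. The step names, the tables `stg_sales` and `sales_summary`, and the `run_batch` helper are hypothetical, and SQLite is used only so the example is self-contained.

```python
import sqlite3

# Hypothetical batch: each step is a name plus a SQL script, run in order.
BATCH_STEPS = [
    ("create_staging", "CREATE TABLE stg_sales (region TEXT, amount REAL);"),
    ("load_staging", """
        INSERT INTO stg_sales VALUES ('north', 100.0);
        INSERT INTO stg_sales VALUES ('north', 50.0);
        INSERT INTO stg_sales VALUES ('south', 70.0);
    """),
    ("build_summary", """
        CREATE TABLE sales_summary AS
        SELECT region, SUM(amount) AS total FROM stg_sales GROUP BY region;
    """),
]

def run_batch(conn, steps):
    """Run each SQL script in sequence; raise on the first step that fails."""
    completed = []
    for name, script in steps:
        try:
            conn.executescript(script)
        except sqlite3.Error as exc:
            raise RuntimeError(f"batch step {name!r} failed: {exc}") from exc
        completed.append(name)
    return completed
```

A real batch runner would usually add logging, retries, and restart-from-failure, which is part of what the ETL tools mentioned above provide out of the box.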

Which approach is better depends on the use case and on which method the developers are more proficient with. In my view, most developers who come from a software development background prefer to control batch runs with programs when they meet a data project; this is a natural continuation of a programming mindset. Most pure BI developers naturally choose mature ETL tools. Of course, some people start writing program scripts right away; these expert BI developers are mostly former programmers.
The advantages of the programmatic approach are adaptability and high extensibility: it can be integrated into, or broken down into, any processing flow, and development is sometimes more efficient. The difficulty is that it places certain technical demands on maintenance staff, and experience is hard to transfer and replicate.

ETL tools have several benefits. First, the whole ETL development process is visual, and in particular the layered design of the data processing flow can be managed clearly. Second, connection protocols for various data sources and databases are built in, so connections can be configured directly without writing a program. Third, transformation components can be used by drag and drop, partly replacing SQL development, again without writing code. Fourth, all kinds of ETL scheduling rules can be designed with a high degree of configurability, also without writing code.
So in most ordinary projects, ETL is more often developed with standard components.
What is the design philosophy of ETL
Logically, ETL can generally be divided into two layers: control flow and data flow. This is the design idea behind many ETL tools, although different tools may use different names.
Control flow governs the order in which data flows are executed, and one control flow can contain multiple data flows. For example, in data warehouse development, the first layer is the ODS or staging layer, the second is the dimension layer, and the following layers are the DW fact layer and the DM data mart layer. ETL scheduling connects these layers into a complete data processing pipeline.
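The relationship between control flow and data flows can be illustrated with a small sketch. Here the control flow is just an ordered list of layers, each holding one or more data flow functions. The layer contents, the function names, and the in-memory `store` dictionary are assumptions for illustration; a real ETL tool would schedule packages or jobs rather than Python functions.

```python
def ods_orders(store):
    """ODS/staging layer: land raw source rows unchanged."""
    store["ods_orders"] = [("alice", 10.0), ("bob", 20.0), ("alice", 5.0)]

def dim_customer(store):
    """Dimension layer: build a sorted list of distinct customers."""
    store["dim_customer"] = sorted({name for name, _ in store["ods_orders"]})

def dw_fact_sales(store):
    """DW fact layer: replace customer names with dimension keys."""
    dim = store["dim_customer"]
    store["fact_sales"] = [(dim.index(name), amt) for name, amt in store["ods_orders"]]

def dm_sales_by_customer(store):
    """DM data mart layer: aggregate facts per customer key."""
    totals = {}
    for key, amt in store["fact_sales"]:
        totals[key] = totals.get(key, 0.0) + amt
    store["dm_sales_by_customer"] = totals

# The control flow: layers run in this order; each layer may hold several data flows.
CONTROL_FLOW = [
    ("ODS", [ods_orders]),
    ("DIMENSION", [dim_customer]),
    ("DW", [dw_fact_sales]),
    ("DM", [dm_sales_by_customer]),
]

def run_control_flow(control_flow):
    """Execute every data flow layer by layer and record the execution order."""
    store, log = {}, []
    for layer, data_flows in control_flow:
        for flow in data_flows:
            flow(store)
            log.append(f"{layer}:{flow.__name__}")
    return store, log
```

Because each layer reads what the previous layer wrote, the ordering in `CONTROL_FLOW` matters: running the DW layer before the dimension layer would fail with a missing key.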

Data flow is the concrete transformation process from source data to the target data table, which is why some ETL tools call data flows transformations. Designing a data flow involves three main steps: connecting to the source data, connecting to the target data table, and the transformation in between. The two connection steps can be configured directly through ETL components. For the intermediate transformation step there are many options: calling SQL statements, calling stored procedures, or using ETL components.
Some projects habitually use ETL components to implement the transformations in the data flow. Others require stored procedures to be called instead of the standard transformation components. Some data warehouses do not support stored procedures at all, so the transformation can only be implemented in standard SQL.
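To make the choice concrete, the sketch below implements the same data flow twice: once with the transformation pushed down to the database as a SQL statement, and once with the transformation done row by row in the program, as a loose stand-in for what a tool's transformation components do. The tables `src_orders`, `tgt_sql`, and `tgt_code` and all function names are hypothetical.

```python
import sqlite3

def make_source(conn):
    """Create a hypothetical source table and two empty target tables."""
    conn.executescript("""
        CREATE TABLE src_orders (customer TEXT, amount REAL);
        INSERT INTO src_orders VALUES ('alice', 10.0), ('bob', 20.0), ('alice', 5.0);
        CREATE TABLE tgt_sql (customer TEXT, total REAL);
        CREATE TABLE tgt_code (customer TEXT, total REAL);
    """)

def data_flow_sql(conn):
    """Transformation expressed as one SQL statement executed by the database."""
    conn.execute(
        "INSERT INTO tgt_sql "
        "SELECT customer, SUM(amount) FROM src_orders GROUP BY customer"
    )

def data_flow_code(conn):
    """Same transformation performed row by row in the program."""
    totals = {}
    for customer, amount in conn.execute("SELECT customer, amount FROM src_orders"):
        totals[customer] = totals.get(customer, 0.0) + amount
    conn.executemany("INSERT INTO tgt_code VALUES (?, ?)", list(totals.items()))

def read_target(conn, table):
    """Return the target rows in a stable order (table is a trusted name here)."""
    return conn.execute(
        f"SELECT customer, total FROM {table} ORDER BY customer"
    ).fetchall()
```

Both variants produce the same target rows. The SQL version keeps the data inside the database, while the in-program version moves every row through the client, which is one reason teams weigh these options differently.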
What is ETL architecture
When we talk about a BI data architect, what we really mean is ETL architecture design. This is a core layer of the technical implementation in a BI project: data processing, data cleansing, and modeling are all done in ETL.

business intelligence BI - Parker data business intelligence BI Visual analysis platform
A good ETL architecture can support hundreds of packages (that is, control flows) at the same time, and each control flow may contain hundreds of data flows. I wrote a technical article on this before; you can find it online by searching for the keywords BIWORK ETL.
This framework design is more than just the design of the ETL framework itself. It also embodies deep ideas about ETL project management and standards control, and later operations and maintenance, BI analysis of the BI system itself, and ETL performance tuning are all reflected in it. Because a large BI project may require dozens of people to develop ETL at the same time, the top-level design of the framework is very important.