This article is all you need on ETL: understand what ETL is in three minutes
2022-06-24 08:44:00 【Bi visualization of Parker data】
Today, let's talk about a technical topic: ETL development. For anyone who has done business intelligence (BI) development, ETL is nothing new. Any work that involves extracting data from data sources, or developing the calculation and processing steps applied to that data, is ETL.
What is ETL?
ETL has three stages: Extraction, Transformation, and Loading. Pulling data from different data sources is extraction. Processing the data and converting its format according to defined rules is transformation. Writing the final result to a target data table (or to a file, etc.) is loading.
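The three stages can be sketched in a few lines of Python. This is only a minimal illustration, not how any particular ETL tool works: the sample CSV text, the `orders` table, and the function names `extract`, `transform`, `load`, and `run_etl` are all assumptions made up for the example, and an in-memory SQLite database stands in for the target.

```python
import csv
import io
import sqlite3

# Hypothetical source data; a real source would be a file, an API, or a database.
RAW_CSV = """order_id,customer,amount
1, alice ,10.50
2,BOB,20
3,carol,
"""


def extract(text):
    """Extraction: read the raw rows from the source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transformation: tidy names, convert types, skip rows with no amount."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # processing rule assumed for this example: drop incomplete rows
        cleaned.append(
            (int(row["order_id"]), row["customer"].strip().title(), float(amount))
        )
    return cleaned


def load(conn, rows):
    """Loading: write the processed rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl(conn, text=RAW_CSV):
    """Run the three stages in order."""
    load(conn, transform(extract(text)))


conn = sqlite3.connect(":memory:")
run_etl(conn)
result = conn.execute(
    "SELECT order_id, customer, amount FROM orders ORDER BY order_id"
).fetchall()
print(result)
```

Each stage is a separate function, so the source or the target can be swapped without touching the transformation rules.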

ETL - Parker data business intelligence BI Visual analysis platform
Put more simply, ETL is like everyday cooking. You buy ingredients at the market stalls, then pick over the vegetables, wash them, chop them, and finally stir-fry them and bring the dish to the table. Each stall in the market is a data source, the finished dish is the final output, and every step in between (picking, washing, chopping, cooking) is transformation.
How ETL is implemented
In practice, ETL is usually implemented with ETL tools, for example common ones such as Kettle, Pentaho, IBM DataStage, Informatica, or SSIS in Microsoft SQL Server, combined with basic SQL to implement the whole ETL process.
Other teams develop their own programs and use them to run data processing scripts in batches. This approach is basically program plus SQL.
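A rough sketch of the "program plus SQL" approach: a small Python program runs a list of SQL steps in order and stops at the first failure. The step names, the tables `stg_sales` and `sales_by_region`, and the function `run_batch` are hypothetical, and SQLite is used only so the example is self-contained.

```python
import sqlite3

# Hypothetical batch: each step is a name plus one SQL statement, run in order.
BATCH_STEPS = [
    ("create_staging", "CREATE TABLE stg_sales (region TEXT, amount REAL)"),
    ("load_staging",
     "INSERT INTO stg_sales VALUES ('north', 100), ('north', 50), ('south', 70)"),
    ("create_target",
     "CREATE TABLE sales_by_region (region TEXT PRIMARY KEY, total REAL)"),
    ("aggregate",
     "INSERT INTO sales_by_region "
     "SELECT region, SUM(amount) FROM stg_sales GROUP BY region"),
]


def run_batch(conn, steps):
    """Run the SQL steps in order; on failure, roll back and name the failed step."""
    completed = []
    for name, sql in steps:
        try:
            conn.execute(sql)
        except sqlite3.Error as exc:
            conn.rollback()  # discard changes that have not been committed yet
            raise RuntimeError(f"batch step {name!r} failed") from exc
        completed.append(name)
    conn.commit()
    return completed


conn = sqlite3.connect(":memory:")
completed = run_batch(conn, BATCH_STEPS)
totals = dict(conn.execute("SELECT region, total FROM sales_by_region"))
print(completed, totals)
```

The program only controls ordering and error handling; the data processing itself stays in SQL, which is what makes this style easy to bolt onto an existing application.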

Which approach is better depends on the usage scenario and on which approach the developers are more skilled with. In my view, people who come from a software development background tend to control batch runs with programs when they take on data projects, which is a natural continuation of a programmer's way of thinking. Developers who have only done BI mostly choose a mature ETL tool. Of course, some BI developers start writing program scripts right away; these experts have mostly moved over from programming.
The advantage of using programs is adaptability and extensibility: the processing can be integrated into, or split out of, any program flow, and sometimes program development is more efficient. The difficulty is that it places technical demands on maintenance staff, and experience is harder to hand over and replicate.

ETL tools have several benefits. First, the whole ETL development process is visual, which makes the layered design of the data processing flow especially easy to manage. Second, the connection protocols for various data sources and databases are built in, so connections can be configured directly without writing a program. Third, transformation components can be used by drag and drop, which replaces some SQL development and needs no code. Fourth, all kinds of ETL scheduling rules can be designed through configuration, again without writing code.
So in most ordinary projects, development is done mainly with the standard components of an ETL tool.
What is the design concept of ETL
Logically, ETL can generally be divided into two layers: control flow and data flow. Many ETL tools are designed around this idea, although different tools may use different names.
Control flow controls the order in which data flows and processing steps run; one control flow can contain multiple data flows. For example, in data warehouse development, the first layer of processing is the ODS or staging layer, the second is the dimension layer, and the following layers are the DW fact layer and the DM data mart layer. ETL scheduling links these layers together into a complete data processing pipeline.
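The relationship between control flow and data flow can be sketched as follows. This is an illustrative assumption, not the design of any specific tool: the control flow is an ordered list of layers, each holding one or more data flows, and a plain dictionary called `store` stands in for the warehouse tables. All function and table names are hypothetical.

```python
def stage_orders(store):
    # ODS / staging layer: land the source rows unchanged.
    store["ods_orders"] = [
        ("alice", "north", 10.0),
        ("bob", "south", 20.0),
        ("alice", "north", 5.0),
    ]


def build_customer_dim(store):
    # Dimension layer: one row per distinct customer, with a surrogate key.
    names = sorted({row[0] for row in store["ods_orders"]})
    store["dim_customer"] = [(key, name) for key, name in enumerate(names, start=1)]


def build_sales_fact(store):
    # DW fact layer: replace customer names with the dimension's surrogate keys.
    keys = {name: key for key, name in store["dim_customer"]}
    store["fact_sales"] = [
        (keys[name], amount) for name, _region, amount in store["ods_orders"]
    ]


def build_sales_mart(store):
    # DM data mart layer: total sales per customer key.
    totals = {}
    for key, amount in store["fact_sales"]:
        totals[key] = totals.get(key, 0.0) + amount
    store["dm_sales_by_customer"] = sorted(totals.items())


# The control flow: layers in execution order, each with its data flows.
CONTROL_FLOW = [
    ("ods", [stage_orders]),
    ("dimension", [build_customer_dim]),
    ("dw", [build_sales_fact]),
    ("dm", [build_sales_mart]),
]


def run_control_flow(control_flow):
    """Run every data flow, layer by layer, and record the execution order."""
    store = {}
    executed = []
    for layer, data_flows in control_flow:
        for flow in data_flows:
            flow(store)
            executed.append(f"{layer}.{flow.__name__}")
    return store, executed


store, executed = run_control_flow(CONTROL_FLOW)
print(executed)
print(store["dm_sales_by_customer"])
```

Because each layer only reads what earlier layers produced, reordering `CONTROL_FLOW` would break the run, which is exactly the dependency the control flow exists to manage.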

A data flow is one specific conversion process from source data to a target data table, which is why some ETL tools call data flows transformations. Designing a data flow involves three main links. The connection to the source data and the connection to the target data table can both be configured directly with ETL components. For the transformation link in between there are several options: calling SQL statements, calling stored procedures, or using ETL components.
Some projects habitually use ETL components to implement the transformation inside the data flow. Other projects require calling stored procedures instead of using the standard transformation components. And some data warehouses do not support stored procedures, so there the transformation can only be implemented in standard SQL.
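For the SQL-only case, a data flow can be reduced to a single `INSERT ... SELECT` statement between a source table and a target table. The sketch below assumes SQLite (which has no stored procedures) as a stand-in for such a warehouse; the tables `src_customer` and `dw_customer`, the cleaning rules, and the function `run_data_flow` are hypothetical.

```python
import sqlite3

# The transformation link, written as plain SQL with no stored procedure.
TRANSFORM_SQL = """
INSERT INTO dw_customer (customer_id, full_name, country)
SELECT id,
       TRIM(first_name) || ' ' || TRIM(last_name),
       UPPER(COALESCE(country, 'unknown'))
FROM src_customer
WHERE id IS NOT NULL
"""


def run_data_flow(conn):
    """Run the source -> transform -> target data flow; return the target row count."""
    conn.execute(TRANSFORM_SQL)
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM dw_customer").fetchone()[0]


conn = sqlite3.connect(":memory:")

# Source link: a table with untidy rows (sample data made up for the example).
conn.execute(
    "CREATE TABLE src_customer "
    "(id INTEGER, first_name TEXT, last_name TEXT, country TEXT)"
)
conn.executemany(
    "INSERT INTO src_customer VALUES (?, ?, ?, ?)",
    [(1, " Ada ", "Lovelace", "uk"), (2, "Alan", " Turing", None), (None, "x", "y", "us")],
)

# Target link: the table the data flow writes to.
conn.execute(
    "CREATE TABLE dw_customer "
    "(customer_id INTEGER PRIMARY KEY, full_name TEXT, country TEXT)"
)

loaded = run_data_flow(conn)
rows = conn.execute("SELECT * FROM dw_customer ORDER BY customer_id").fetchall()
print(loaded, rows)
```

In an ETL tool the same statement would typically sit inside a SQL task, with the source and target connections configured in the tool rather than in code.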
What is ETL architecture
When we talk about the work of a BI data architect, what we usually mean is ETL architecture design. This is a core technical layer of any BI project: data processing, data cleaning, and modeling are all implemented in ETL.

A good ETL architecture can support hundreds of packages (that is, control flows) at the same time, and each control flow may contain hundreds of data flows. I wrote a technical article on this before; you can find it online by searching for the keywords BIWORK ETL.
Such a framework is more than just the design of the ETL structure. It also embodies well-developed ideas about ETL project management and standards control. Later operation and maintenance, BI analysis of the BI system's own ETL runs, and ETL performance tuning are all reflected in the framework. Because a large BI project may need dozens of people developing ETL at the same time, the top-level design of the framework is very important.