Introduction to data platform
2022-06-24 08:40:00 【An unreliable programmer】
Goals
- Provide stable, reliable data to the various business platforms
- Provide a general-purpose data processing workflow
- Produce subject-oriented, integrated, time-variant, yet relatively stable data sets
- Integrate historical data from multiple data sources for fine-grained processing and multidimensional analysis
- Put simply, the flow is: read data -> produce data -> deliver data
Key concepts
ETL
ETL stands for Extraction-Transformation-Loading, that is, data extraction, transformation, and loading. ETL extracts data from distributed, heterogeneous sources, such as relational databases and flat files, into a temporary staging layer, where it is cleaned, transformed, and integrated. The result is finally loaded into a data warehouse or data mart, where it becomes the basis for online analytical processing and data mining. ETL is the most important part of a BI project: it typically takes about one third of the project's total time, and the quality of the ETL design directly affects whether the BI project succeeds. ETL is also a long-term process. Only by continually finding and fixing problems can it run more efficiently and supply accurate data for the later stages of the project.
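The three stages described above can be sketched in a few lines of Python. This is only a minimal illustration using the standard library: the CSV content, the `orders` table, and the function names are all assumptions made for the example, and an in-memory SQLite database stands in for the warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from a source system; the columns are illustrative.
RAW_CSV = """order_id,customer,amount
1, alice ,10.50
2,BOB,
3,alice,4.50
"""


def extract(text):
    """Extract: read rows from a flat-file source into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: normalise customer names and drop rows with no amount."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # incomplete record, rejected during cleaning
        customer = row["customer"].strip().lower()
        cleaned.append((int(row["order_id"]), customer, float(amount)))
    return cleaned


def load(conn, records):
    """Load: write the cleaned records into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", records)
    conn.commit()


# SQLite in memory stands in for the data warehouse in this sketch.
conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW_CSV)))

# A simple analytical query over the loaded data.
totals = conn.execute(
    "SELECT customer, SUM(amount) FROM orders GROUP BY customer"
).fetchall()
print(totals)
```

Keeping each stage as a separate function makes it easy to test the cleaning rules on their own and to swap the source or target later without touching the rest.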
Data warehouse
A data warehouse (abbreviated DW or DWH) is a strategic collection of data that supports decision-making at every level of the enterprise, across all types of data. It is a single data store created for analytical reporting and decision support. For businesses that need business intelligence, it guides business process improvement and the monitoring and control of time, cost, and quality.
Problems to be solved at present
- A task scheduling and monitoring platform is needed to manage, schedule, and monitor the scripts that read, produce, and deliver data.
- An API platform is needed to serve ad hoc queries on some of the data.
- A data synchronization platform is needed to sync the produced data to each business system.
- A data inspection platform is needed to control the quality of the delivered data.
- A BI dashboard platform is needed to clearly display the multidimensional data that different roles care about.
Solution
- Use Airflow to build the ETL system, that is, to orchestrate and schedule the data collection scripts, cleaning scripts, data summarization, aggregation, and precomputation of multidimensional metrics. Airflow also provides task monitoring and a web UI that visualizes task dependencies.
- Use DataX for data synchronization.
- Use Lumen for the API platform.
- The data inspection platform and the BI dashboard are out of scope for the first phase.
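The core idea behind scheduling a chain of dependent scripts can be illustrated without any scheduler installed. The sketch below is not Airflow's API; it uses Python's standard `graphlib` module (Python 3.9+) to run tasks in dependency order, and the task names are hypothetical stand-ins for the collection, cleaning, summarization, aggregation, and precomputation steps listed above.

```python
from graphlib import TopologicalSorter

# Hypothetical task names; each task maps to the set of tasks it depends on.
DEPENDENCIES = {
    "collect": set(),
    "clean": {"collect"},
    "summarize": {"clean"},
    "aggregate": {"summarize"},
    "precompute_metrics": {"aggregate"},
}


def run_pipeline(dependencies, actions):
    """Run each task only after all of its upstream tasks have run.

    Returns the order in which the tasks were executed.
    """
    order = []
    for name in TopologicalSorter(dependencies).static_order():
        actions[name]()
        order.append(name)
    return order


# Placeholder actions that only record that they ran; real tasks would
# invoke the corresponding scripts.
log = []
actions = {name: (lambda n=name: log.append(n)) for name in DEPENDENCIES}

order = run_pipeline(DEPENDENCIES, actions)
print(order)
```

A real scheduler adds what this sketch leaves out: time-based triggers, retries, run history, and the dependency view in the web UI.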
Technology stack
Airflow (Python), Lumen, PostgreSQL, DataX, Elasticsearch
Later, depending on data volume, we plan to add offline computation on a distributed Spark cluster, HDFS storage, stream processing, Hive, and so on.
Ideal state
Later, log analysis can be hooked into the ETL system to analyze user behavior, build user profiles, and improve system security.
Daily, weekly, and annual performance reports, along with other data displays and summaries, can be delivered with lower latency while reducing the load on the business systems.
ERP data is collected and analyzed to inform management decisions.
APP logs are aggregated and analyzed to give product design and operations factual data to work from.
At the same time, big data analysis lets the platform cope comfortably with rapid data growth.
"Rome was not built in a day."