Data warehouse (1): What is a data warehouse, and what are its characteristics?
2022-06-26 08:55:00 【Zhang Fei's pig】
Original article: What is a data warehouse, and what are its characteristics
A data warehouse (English: Data Warehouse, abbreviated DW or DWH) is a strategic collection of data that supports decision-making at every level of an enterprise, across all types of data. It is a single data store created for analytical reporting and decision support. For enterprises that need business intelligence, it provides guidance on improving business processes and on monitoring and controlling time, cost, and quality. This article introduces the data development techniques involved in a data warehouse, what a data warehouse is for, and its characteristics.
Here is a simple metaphor: think of a data warehouse as an ordinary warehouse. The data is the goods, and the data warehouse developers are the warehouse keepers. A data warehouse is therefore a way of managing data well, so that data is stored in an orderly fashion and consumers such as BI and AI can use it more easily and get more value from it. Clearly, finding something among goods that are neatly arranged according to rules is more efficient than searching through an unsorted pile.
A data warehouse is a structured data environment that serves as the data source for decision support systems (DSS) and online analytical applications. It addresses the problem of obtaining information from databases. Its characteristics are that it is subject-oriented, integrated, stable (non-volatile), and time-variant.
The data warehouse was proposed in 1990 by Bill Inmon, the father of the data warehouse. Its main function is to take the large volume of data that an organization has accumulated over the years through the online transaction processing (OLTP) of its information systems and, using the data storage architecture particular to data warehouse theory, analyze and organize it systematically. This makes analytical methods such as online analytical processing (OLAP) and data mining feasible, which in turn supports the creation of decision support systems (DSS) and executive information systems (EIS). These help decision-makers quickly and effectively extract valuable information from large amounts of data, make decisions and respond rapidly to changes in the external environment, and build business intelligence (BI).
In the book “Building the Data Warehouse”, published in 1991, Bill Inmon gave a definition that is now widely accepted: a data warehouse is a subject-oriented, integrated, non-volatile (relatively stable), and time-variant (reflecting historical change) collection of data used to support management decision-making.
Characteristics of a data warehouse:
- A data warehouse is subject-oriented. The data in an operational database is organized around transaction processing tasks, whereas the data in a data warehouse is organized by subject area. A subject is a key aspect that users care about when they use the data warehouse to make decisions, and one subject usually relates to several operational information systems.
- A data warehouse is integrated. Its data comes from scattered operational data: the required data is extracted from the original databases, cleaned, then systematically processed, summarized, and organized. Inconsistencies in the source data must be eliminated before the data enters the warehouse, so that the information in the warehouse is consistent, global information about the whole enterprise.
- A data warehouse is not updatable. Its data is used mainly for enterprise decision analysis, so the operations involved are mainly queries. Once data enters the warehouse it is generally retained for a long time. In other words, a data warehouse sees a large number of queries but few modifications or deletions, and usually only needs to be loaded and refreshed periodically.
- A data warehouse changes over time. Its data usually contains historical information, recording the enterprise from some point in the past (for example, when the data warehouse was first put into use) through each stage up to the present. With this information, the enterprise's development and future trends can be analyzed quantitatively and forecast. Traditional relational database systems, by contrast, are better suited to processing formatted data and meeting the needs of day-to-day business processing. Data already in the warehouse is kept in a stable, read-only form and is not altered afterwards.
- Summarized. Operational data is mapped into formats usable for decision-making.
- Large capacity. Time series data sets are usually very large.
- Denormalized. DW data can be, and often is, redundant.
- Metadata. The data that describes the data is stored as well.
- Data sources. Data comes from internal and external non-integrated operational systems.
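The integration, non-updatable, and time-variant characteristics above can be illustrated with a small sketch. It is not taken from the original article: the two source systems, their field names, the code mappings, and the helper functions are all hypothetical, and a real warehouse would use a database rather than an in-memory list.

```python
from datetime import date

# Hypothetical extracts from two operational systems that encode the same
# facts differently (all field names and codes are invented for illustration).
CRM_ROWS = [{"cust_id": "C001", "gender": "M"}]
BILLING_ROWS = [{"customer": 2, "sex": 0}]


def from_crm(row):
    """Map a CRM row to the warehouse's unified customer format."""
    return {
        "customer_key": row["cust_id"],
        "gender": {"M": "male", "F": "female"}[row["gender"]],
    }


def from_billing(row):
    """Map a billing row, which uses numeric codes, to the same format."""
    return {
        "customer_key": f"C{row['customer']:03d}",
        "gender": {1: "male", 0: "female"}[row["sex"]],
    }


def integrate(crm_rows, billing_rows):
    """Integration: return every row in one consistent format."""
    return [from_crm(r) for r in crm_rows] + [from_billing(r) for r in billing_rows]


def load_snapshot(warehouse, rows, load_date):
    """Append rows stamped with load_date.

    Existing rows are never modified or deleted, so history accumulates.
    """
    warehouse.extend({**row, "load_date": load_date} for row in rows)


def as_of(warehouse, day):
    """Return the rows of the most recent snapshot loaded on or before day."""
    dates = [r["load_date"] for r in warehouse if r["load_date"] <= day]
    if not dates:
        return []
    latest = max(dates)
    return [r for r in warehouse if r["load_date"] == latest]


warehouse = []
load_snapshot(warehouse, integrate(CRM_ROWS, BILLING_ROWS), date(2022, 6, 25))
load_snapshot(warehouse, integrate(CRM_ROWS, BILLING_ROWS), date(2022, 6, 26))
```

Calling `as_of(warehouse, date(2022, 6, 25))` returns only the first day's rows, which is the kind of historical query that append-only loading makes possible. The cost is that every load adds rows, which is one reason warehouse storage grows large.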
A data warehouse is built, once a large number of databases already exist, in order to further exploit data resources and support decision-making; it is not simply a “large database”. A data warehouse solution is built for front-end query and analysis, and because the data is highly redundant, it also requires a lot of storage.
In practice, to serve data applications better, that is, to support data analysis and efficient report development, a data warehouse usually needs the following characteristics:
- Efficient enough.
Analysis data in a data warehouse is generally organized by day, week, month, quarter, and year. Daily data places the highest demand on efficiency: customers must be able to see the analysis of yesterday's data within 24 hours, or even within 12. Some enterprises generate a large amount of data every day, and a poorly designed data warehouse often runs into trouble there, delivering data only after a delay of 1-3 days, which is clearly unacceptable.
- Data quality.
The information a data warehouse provides must be accurate. But the warehouse process is usually divided into several steps, including data cleaning, loading, querying, and presentation, and a complex architecture has even more layers. Dirty data in a source or careless code can therefore distort the data, and a customer who sees wrong information may make a wrong decision based on it, causing losses rather than benefits.
- Extensibility.
Some large data warehouse systems have a complex architecture because they are designed for scalability over the next 3-5 years, so that the system can keep running stably without money being spent on rebuilding it. This shows mainly in sound data modeling: extra intermediate layers in the warehouse give massive data flows enough buffering, so that the system does not stop working just because the data volume grows substantially.
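The data quality point can be made concrete with a minimal sketch of a validation step that runs before loading. The row layout (`order_id`, `amount`) and the two rules are assumptions chosen for illustration; real pipelines would define their own checks.

```python
def validate(row):
    """Return a list of problems found in one row (an empty list means clean)."""
    problems = []
    if not row.get("order_id"):
        problems.append("missing order_id")
    amount = row.get("amount")
    if not isinstance(amount, (int, float)) or amount < 0:
        problems.append("amount must be a non-negative number")
    return problems


def split_clean_and_rejected(rows):
    """Separate loadable rows from dirty rows, keeping the reasons for rejection."""
    clean, rejected = [], []
    for row in rows:
        problems = validate(row)
        if problems:
            rejected.append((row, problems))
        else:
            clean.append(row)
    return clean, rejected


# Hypothetical source rows: one clean, two dirty.
rows = [
    {"order_id": "A1", "amount": 19.9},
    {"order_id": "", "amount": 5},
    {"order_id": "A3", "amount": -2},
]
clean, rejected = split_clean_and_rejected(rows)
```

Keeping the rejected rows together with their reasons, rather than silently dropping them, makes it possible to trace distorted figures back to the source system or to the code that produced them.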
As the introduction above shows, data warehouse technology can bring to life the data an enterprise has accumulated over many years. It not only lets the enterprise manage this massive data but also mines its potential value, which is why it has become one of the highlights of operation and maintenance systems in telecom enterprises.
In a broad sense, a decision support system based on a data warehouse consists of three parts: data warehouse technology, online analytical processing technology, and data mining technology. Data warehouse technology is the core of the system. Later articles in this series will focus on it, introducing the main technologies of modern data warehouses and the main steps of data processing, and discussing how to use them in telecom operation and maintenance systems.
- Subject-oriented
The data in an operational database is organized around transaction processing tasks, and each business system is separate from the others. The data in a data warehouse is organized by subject area. A subject is the counterpart of the application orientation of a traditional database. It is an abstract concept: the data in the enterprise's information systems is integrated, categorized, and analyzed at a higher level of abstraction. Each subject corresponds to one macro-level analysis area. The data warehouse excludes data that is useless for decision-making and provides a concise view of a specific subject.
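A minimal sketch of what subject-oriented organization can look like, assuming two hypothetical application-oriented systems (an order system and a support system) whose records are regrouped around a single “customer” subject. All names and fields are invented for illustration.

```python
from collections import defaultdict

# Hypothetical records from two separate, application-oriented systems.
ORDER_SYSTEM = [
    {"customer": "C001", "order_total": 120.0},
    {"customer": "C001", "order_total": 80.0},
    {"customer": "C002", "order_total": 50.0},
]
SUPPORT_SYSTEM = [
    {"customer": "C001", "ticket": "T-9"},
]


def build_customer_subject(orders, tickets):
    """Combine both systems into one view organized around the customer."""
    subject = defaultdict(
        lambda: {"order_count": 0, "total_spent": 0.0, "ticket_count": 0}
    )
    for order in orders:
        entry = subject[order["customer"]]
        entry["order_count"] += 1
        entry["total_spent"] += order["order_total"]
    for ticket in tickets:
        subject[ticket["customer"]]["ticket_count"] += 1
    return dict(subject)


customer_subject = build_customer_subject(ORDER_SYSTEM, SUPPORT_SYSTEM)
```

The result keeps only what a decision-maker would ask about a customer (how many orders, how much spent, how many support tickets) and leaves out transaction-level detail such as individual ticket identifiers.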