Inaccurate data precision in the ETL process
2022-06-26 15:11:00 【RestCloud】
Recently, a user was using the RestCloud ETL product for data integration and found that, after the data had been transferred to the target database table, its precision was off.
The scenario: the source column in Oracle is defined as number(21,6), and the target column in MySQL is defined as float(21,6). After synchronization, a value that is 538121.47 in Oracle shows up as 538121.50 in MySQL. Seeing this, some users will inevitably assume it is a product problem. Let's analyze it.
First, we need to understand how a computer represents numbers. Inside the computer, there are two ways to express decimals: fixed-point numbers and floating-point numbers.
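1. Floating-point types (float and double). Floating-point types store approximate values in the database.
MySQL data type    Meaning
float(m,d)         Single-precision floating point, about 7 significant decimal digits (4 bytes); m is the total number of digits, d is the number of decimal places
double(m,d)        Double-precision floating point, about 15 to 16 significant decimal digits (8 bytes); m is the total number of digits, d is the number of decimal places
Suppose a field is defined as float(5,3). It holds five digits in total, three of them after the decimal point, so inserting 12.45678 stores the rounded value 12.457. A value such as 123.45678 needs three integer digits and is out of range for this column: MySQL rejects it in strict SQL mode and otherwise clips it to 99.999.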
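The scenario described earlier can be reproduced outside the database. The following is a minimal sketch in Python, assuming the MySQL float column stores an IEEE 754 single-precision value; the helper name `to_float32` is made up for this illustration.

```python
import struct


def to_float32(x: float) -> float:
    """Round a Python float (double precision) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


source_value = 538121.47                  # value read from the Oracle number(21,6) column
stored_single = to_float32(source_value)  # what a 4-byte float can hold
stored_double = source_value              # Python floats are already 8-byte doubles

print(f"{stored_single:.2f}")  # 538121.50
print(f"{stored_double:.2f}")  # 538121.47
```

The single-precision result matches the value seen in MySQL, which suggests the difference comes from the column type rather than from the transfer itself.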
2. Fixed-point types (decimal). Fixed-point types store exact values in the database.
Whereas floating-point types store approximate values, a fixed-point type stores the value exactly. In decimal(m,d), m is the total number of digits (at most 65) and d is the number of decimal places (at most 30, and no greater than m).
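As a rough illustration of exact fixed-point storage, the sketch below uses Python's `decimal` module as a stand-in for a decimal(21,6) column. This is an analogy under that assumption, not MySQL's actual storage code.

```python
from decimal import Decimal

# Six digits after the decimal point, as in decimal(21,6).
scale = Decimal("0.000001")

value = Decimal("538121.47").quantize(scale)
print(value)  # 538121.470000

# Repeated addition stays exact in fixed-point arithmetic...
exact_sum = sum([Decimal("0.1")] * 10)
print(exact_sum == Decimal("1"))  # True

# ...while the same sum in binary floating point drifts.
float_sum = sum([0.1] * 10)
print(float_sum == 1.0)  # False
```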
For the single-precision type float: as long as values with two decimal places lie within ±131072 (65536×2), float keeps them accurately. Beyond this range the stored value becomes unreliable, because the 24-bit significand can no longer resolve steps of 0.01. No parameter setting was found that changes this. Recommendation: change float to double or decimal. The difference between the two is that double is still a floating-point type, while decimal is a fixed-point type and gives exact results.
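The ±131072 boundary can be checked numerically. The sketch below assumes IEEE 754 single precision (24-bit significand, 23 stored fraction bits); the helper names are hypothetical and only cover normal, non-zero values.

```python
import math
import struct


def to_float32(x: float) -> float:
    """Round a double to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def float32_spacing(x: float) -> float:
    """Gap between adjacent single-precision values near x (normal range only)."""
    exponent = math.frexp(abs(x))[1] - 1  # floor(log2(|x|))
    return 2.0 ** (exponent - 23)         # 23 stored fraction bits


def keeps_two_decimals(x: float) -> bool:
    """True if x survives a round trip through single precision at two decimals."""
    return round(to_float32(x), 2) == x


for sample in (131071.99, 131072.01, 538121.47):
    print(sample, float32_spacing(sample), keeps_two_decimals(sample))
```

Below 131072 the spacing is at most 0.0078125, so the rounding error stays under 0.005 and two decimals are recoverable. From 131072 upward the spacing is 0.015625 or more, and at 538121.47 it is 0.0625.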
Let's analyze this with an example. First, create a test table:
CREATE TABLE customer (
  id int(11) NOT NULL AUTO_INCREMENT,
  name varchar(45) DEFAULT NULL,
  age int(11) DEFAULT NULL,
  jinqian float(5,2) DEFAULT NULL,
  PRIMARY KEY (id)
);
float(m,d)
m is the maximum total number of digits, and d is the number of digits after the decimal point.
For example, float(5,2) in the SQL above means that the number has at most 5 digits in total, 2 of which are decimal places. The storable range then depends on whether the column is defined as unsigned.
If unsigned, the minimum is 0.00 and the maximum is 999.99; if signed, the range is -999.99 to 999.99.
When only a precision is given, it is counted in bits: a precision of up to 24 gives a single-precision column, accurate to about 7 decimal digits (6 in testing). When the precision is greater than 24, the column is automatically converted to DOUBLE. When M and D are both specified, no automatic conversion takes place.
If a value has more decimal places than d, it is rounded when saved.
INSERT INTO customer (id, name, age, jinqian) VALUES (111111111, 'uu', 15, 90.012);
INSERT INTO customer (id, name, age, jinqian) VALUES (1111111111, 'uu', 15, 90.018);
The two rows above are saved with jinqian rounded to two decimal places, as 90.01 and 90.02 respectively.
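The rounding and range rules for a (m,d) column can be modelled in a few lines. This is a simplified sketch using decimal arithmetic, assuming round-half-up and ignoring the binary approximation that a real float column adds on top; `fit_to_column` is a hypothetical helper, not a MySQL function.

```python
from decimal import Decimal, ROUND_HALF_UP


def fit_to_column(value: str, m: int, d: int) -> Decimal:
    """Round value to d decimals and reject it if it needs more than m digits."""
    step = Decimal(1).scaleb(-d)           # 10**-d, e.g. 0.01 for d=2
    rounded = Decimal(value).quantize(step, rounding=ROUND_HALF_UP)
    limit = Decimal(10) ** (m - d) - step  # e.g. 999.99 for (5,2)
    if abs(rounded) > limit:
        raise ValueError(f"{value} is out of range for ({m},{d})")
    return rounded


print(fit_to_column("90.012", 5, 2))  # 90.01
print(fit_to_column("90.018", 5, 2))  # 90.02
print(fit_to_column("999.99", 5, 2))  # 999.99
```

Passing "1000" with (5,2) raises ValueError in this sketch, which mirrors the out-of-range case for a column that has only three integer digits.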
Summary
From the analysis above, we can draw the following conclusions:
1. Floating-point numbers carry a representation error;
2. Precision-sensitive data such as currency should be represented and stored as fixed-point numbers;
3. When programming with floating-point numbers, pay special attention to this error and avoid direct floating-point equality comparisons where possible;
4. Pay attention to how special floating-point values are handled.
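Points 3 and 4 can be illustrated with a short Python sketch. The tolerance passed to `math.isclose` is an arbitrary choice for this example; a suitable value depends on the data.

```python
import math

# Point 3: direct equality on floating-point results is fragile.
total = 0.1 + 0.2
print(total == 0.3)                            # False
print(math.isclose(total, 0.3, rel_tol=1e-9))  # True

# Point 4: special values do not follow ordinary comparison rules.
nan = float("nan")
inf = float("inf")
print(nan == nan)             # False, NaN never equals anything
print(math.isnan(nan))        # True, test for NaN explicitly
print(math.isinf(inf))        # True
print(math.isnan(inf - inf))  # True, infinity minus infinity is NaN
```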