Floating point number exploration
2022-07-25 09:21:00 【halazi100】
Floating-point numbers are used in computers to approximate arbitrary real numbers. Specifically, a real number is represented as an integer or fixed-point number (the mantissa) multiplied by an integer power of some base (usually 2 in computers).
How to convert decimal to binary
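Integer part
- Method 1: repeatedly divide the integer part by 2 and write the remainders in reverse order
59 / 2 = 29 remainder 1
29 / 2 = 14 remainder 1
14 / 2 = 7 remainder 0
7 / 2 = 3 remainder 1
3 / 2 = 1 remainder 1
1 / 2 = 0 remainder 1
0 / 2 = 0 remainder 0
0 / 2 = 0 remainder 0
Reading the remainders from bottom to top (the last two rows only pad the result to 8 bits), the binary representation of 59 is 0011 1011;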
- Method 2: binary decomposition
Write the decimal number as a sum of integer powers of 2, convert each power to binary, then merge the results;
54 = 2^5 + 2^4 + 2^2 + 2^1
= 0010 0000 + 0001 0000 + 0000 0100 + 0000 0010
= 0011 0110
The fractional part
Repeatedly multiply by 2 and take the integer part
For example, converting 0.25 to binary:
0.25*2=0.5 0
0.5*2 =1.0 1
So 0.25 in binary is 0.01
For example, converting 0.4 to binary:
0.4*2 =0.8 0
0.8*2 =1.6 1
0.6*2 =1.2 1
0.2*2 =0.4 0
...
So 0.4 in binary is 0.0110 0110 ..., a pattern that repeats forever. In other words, a binary representation of a decimal fraction is not always exact;
The representation of floating-point numbers
According to the international standard IEEE 754, any binary floating-point number V can be expressed as (-1)^S * M * 2^E
where
(-1)^S is the sign: when S=0, V is positive; when S=1, V is negative. M is the significand, with a value in the range [1,2). 2^E is the exponent part, with base 2.
For example, decimal 5.0 is 101.0 in binary, which in this form is (-1)^0 * 1.01 * 2^2,
where S=0, M=1.01, E=2.
As another example, decimal -5.5 is -101.1 in binary, which in this form is (-1)^1 * 1.011 * 2^2,
where S=1, M=1.011, E=2.
The representation of floating-point numbers in memory
The IEEE 754 standard specifies:
For a 32-bit floating-point number (float type), the highest bit is the sign bit S, the next 8 bits are the exponent E, and the remaining 23 bits are the significand M.
For a 64-bit floating-point number (double type), the highest bit is the sign bit S, the next 11 bits are the exponent E, and the remaining 52 bits are the significand M.
┌─────────────┬──────────────────┬────────────────────┬─────────────────────┐
│ type │ S(sign bit) │ E(Exponent area) │ M(Mantissa area) │
├─────────────┼──────────────────┼────────────────────┼─────────────────────┤
│ float │ 1 bit(31bit) │ 8 bits(23-30bit) │ 23 bits(0-22bit) │
├─────────────┼──────────────────┼────────────────────┼─────────────────────┤
│ double │ 1 bit(63bit) │ 11 bits(52-62bit) │ 52 bits(0-51bit) │
└─────────────┴──────────────────┴────────────────────┴─────────────────────┘
float and double data are represented in the same way inside the computer; because they occupy different amounts of storage, the range and precision of the values they can represent differ.
Sign bit S
The sign bit has only two values, 0 and 1, which mean positive and negative respectively.
Significand M
Because M lies in the range [1,2), its integer part is always 1. IEEE 754 therefore specifies that, when M is stored, the leading 1 is implied and can be dropped; only the fraction bits after it are saved.
For example, when storing 1.01, only the fraction 01 is saved and the integer part is dropped; the leading 1 is added back when the value is read.
The purpose of this is to gain 1 extra significant bit.
A 32-bit float leaves only 23 bits for M, but with the leading 1 dropped, 24 significant bits can be kept.
The number of bits in the significand M determines the precision of the data
- float:
2^23 = 8388608, which has 7 decimal digits, so a float can hold at most 7 significant decimal digits; the precision of float is 6-7 significant digits (6 are guaranteed);
- double:
2^52 = 4503599627370496, which has 16 decimal digits, so a double can hold at most 16 significant decimal digits; the precision of double is 15-16 significant digits;
Exponent E
- The exponent E is stored as an unsigned integer
- If E has 8 bits (float type), it can represent 0-255,
- If E has 11 bits (double type), it can represent 0-2047;
The true exponent can obviously be negative, but E, being stored as an unsigned integer, is non-negative.
IEEE 754 therefore specifies that, in memory, a bias is added to the true exponent (the bias is 127 for an 8-bit E and 1023 for an 11-bit E).
For example, for a float
with E=3, 127 is added to give 130, which is stored as binary 1000 0010.
- E is neither all 0s nor all 1s
For the floating-point number 5.0
S=0 is stored directly; M=1.01, so the integer 1 is dropped and the fraction 01 is stored, with the remaining low bits filled with 0; E=2, so 127 is added to give 129, which is stored in binary;
The final binary representation of 5.0 is
0-100 0000 1-010 0000 0000 0000 0000 0000
In hexadecimal this is 40 A0 00 00
For the floating-point number -5.5
S=1 is stored directly; M=1.011, so the integer 1 is dropped and the fraction 011 is stored, with the remaining low bits filled with 0; E=2, so 127 is added to give 129, which is stored in binary;
The final binary representation of -5.5 is
1-100 0000 1-011 0000 0000 0000 0000 0000
In hexadecimal this is C0 B0 00 00
┌─────────────┬──────────────────┬────────────────────┬─────────────────────┐
│ value │ S(sign bit) │ E(Exponent area) │ M(Mantissa area) │
├─────────────┼──────────────────┼────────────────────┼─────────────────────┤
│ 5.0 │ 0 │ 100 0000 1 │ 010 0000 ... │
├─────────────┼──────────────────┼────────────────────┼─────────────────────┤
│ -5.5 │ 1 │ 100 0000 1 │ 011 0000 ... │
└─────────────┴──────────────────┴────────────────────┴─────────────────────┘
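#include <stdio.h>
#include <string.h>
int main(void)
{
    float f1 = 5.0f;
    float f2 = -5.5f;
    unsigned int u1, u2; /* assumed to be 32 bits wide, like float */
    /* Copy the raw bytes; casting the pointer instead would violate strict aliasing. */
    memcpy(&u1, &f1, sizeof u1);
    memcpy(&u2, &f2, sizeof u2);
    printf("%f, 0x%x\n", f1, u1); // 5.000000, 0x40a00000
    printf("%f, 0x%x\n", f2, u2); // -5.500000, 0xc0b00000
    return 0;
}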
- E is all 0s
Take single-precision floating point as an example.
When the stored exponent bits are all 0, the true exponent is not 0-127=-127; IEEE 754 fixes it at 1-127=-126, so the exponent part is 2^(-126), a very small number. In this case the significand M no longer has the implied leading 1 and is instead a fraction of the form 0.xxxxxx.
This makes it possible to represent 0 and very small numbers close to 0.
The same applies to double-precision floating point.
#include <stdio.h>
#include <string.h>
/* Print a float, its bit pattern in hex, and its 32 bits in groups of 4. */
void show_binary(const float f) {
    unsigned int num; /* assumed to be 32 bits wide, like float */
    /* Copy the raw bytes; casting the pointer instead would violate strict aliasing. */
    memcpy(&num, &f, sizeof num);
    printf("%.6f, 0x%X: ", f, num);
    const size_t max_size = 8 * sizeof(float);
    int i = (int)max_size;
    while (0 <= --i) {
        printf("%c", (int)((num >> i) & 0x1) + '0');
        if (0 == (i%4)) {
            printf(" ");
        }
    }
    printf("\n");
}
int main(void)
{
    float f21 = 5.0f;
    float f22 = -5.5f;
    float f31 = 0.0f;
    float f32 = 0.000001f;
    show_binary(f21); // 5.000000, 0x40A00000: 0100 0000 1010 0000 0000 0000 0000 0000
    show_binary(f22); // -5.500000, 0xC0B00000: 1100 0000 1011 0000 0000 0000 0000 0000
    show_binary(f31); // 0.000000, 0x0: 0000 0000 0000 0000 0000 0000 0000 0000
    show_binary(f32); // 0.000001, 0x358637BD: 0011 0101 1000 0110 0011 0111 1011 1101
    return 0;
}
- E is all 1s
Take single-precision floating point as an example.
When the stored exponent bits are all 1 (255), subtracting the bias would give a true exponent of 128, that is, an exponent part of 2^128, which is larger than any finite float. This pattern is reserved: if the significand bits are all 0, the value is positive or negative infinity (the sign is determined by S); if they are not all 0, the value is NaN (not a number).
The same applies to double-precision floating point.
The number of bits in the exponent E determines the range of data that can be represented
The range of a 4-byte int: [-2^31,2^31-1];
The range of a 4-byte float: approximately [-3.4*10^38,3.4*10^38], that is, (-2^128,+2^128);
Why can float represent a much larger range than int when both occupy 4 bytes of memory?
The secret
- float can represent no more distinct values than int (both have at most 2^32 bit patterns)
- The numbers float can represent are not contiguous; there are gaps between neighbouring values
- float is only an approximate representation and cannot be treated as an exact number
- Because its memory representation is more complex, float arithmetic is generally slower than int arithmetic
Summary
- Floating-point types are represented in memory differently from integer types
- The memory representation of floating-point types is more complex
- Floating-point types can represent a wider range
- Floating-point types are inexact
- Floating-point operations are generally slower