
06. Revisiting the difference between is and ==, and encoding

2022-06-26 05:52:00 qq_ forty-two million four hundred and seventy-two thousand nin


I. The difference between is and ==

1. id()

With id() we can see the memory address of the value that a variable refers to.

s = 'alex'
print(id(s)) # 2490207085544
s = "alex"
print(id(s)) # 2490207085544

lst = [1, 2, 4]
print(id(lst)) # 2490209898248

lst1 = [1, 2, 4]
print(id(lst1)) # 2490209898056

# The two strings share the same address, while the two lists have different addresses

tup = (1, 2)
tup1 = (1, 2)
print(id(tup)) # 2490209897608
print(id(tup1)) # 2490209897672
# In this run the two tuples also have different addresses

print(id(" Ha ha ha ha ha ha ha ha ha ha ha ha ha ha ha ha ha ha ha ha ha ha ha ha ha ha ha ha ha ha ha ha ha ha ha ha ha ha ha ha ha ha ha ha ha ha ha ha ha ha ha ha ha ha ha ha ha ha ha ha ha ha ha ha ha ha ha "*1000)) # 2490208702208
print(id(" Ha ha ha ha ha ha ha ha ha ha ha ha ha ha ha ha ha ha ha ha ha ha ha ha ha ha ha ha ha ha ha ha ha ha ha ha ha ha ha ha ha ha ha ha ha ha ha ha ha ha ha ha ha ha ha ha ha ha ha ha ha ha ha ha ha ha ha "*1000)) # 2490208702208

s1 = "00 Ha ha ha "
s2 = "00 Ha ha ha "
print(id(s1)) # 1919663315544
print(id(s2)) # 1919663315544
print(s1 is s2) # True

a1=str("alexalexalex"+"abcdef")
print(id(a1)) # 2490209916080
a2=str("alexalexalex"+"abcdef")
print(id(a2)) # 2490209916080

a1=str("alexalexalex"+"abcdef"*20)
print(id(a1)) # 2251390928920
a2=str("alexalexalex"+"abcdef"*20)
print(id(a2)) # 2251390929104

s1 = "@1 2 "
s2 = "@1 2 "
print(id(s1)) # 2490208103928
print(id(s2)) # 2490208103928
# The addresses match here, but they differ in the interactive terminal. So in Python, code typed at the command line and code run from a .py file may behave differently

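Small data pool (constant pool)
Python stores values that have already been used in a small data pool so that other variables can reuse them. The small data pool is used for numbers and strings; other data types do not have one. These rules describe CPython behaviour and vary between versions.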

For numbers: integers from -5 to 256 are added to the small data pool, and the same object is used every time.
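
A minimal sketch of the small-integer cache, assuming CPython (this is an implementation detail, not a language guarantee). The integers are built at run time with int() so that the compiler cannot merge equal literals into a single constant:

```python
# Assumes CPython: integers from -5 to 256 are pre-allocated and shared.
small_a = int("256")
small_b = int("256")
big_a = int("257")
big_b = int("257")

print(small_a is small_b)  # same cached object on CPython
print(big_a is big_b)      # two separate objects on CPython
print(big_a == big_b)      # the values are still equal
```

Because the cache is an implementation detail, integers should be compared with ==, not with is.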

For strings:

  1. If it contains only plain text (letters and digits) and underscores, the object is added to the small data pool
  2. If it contains special characters, it is not added to the small data pool, and a new object is created each time
  3. For a single character multiplied by n, such as 'a'*20: up to 20 characters the result is pooled; beyond 20 characters it is not added to the small data pool

Note (in general): in a .py file, a string that is simply defined as a literal is usually added to the small data pool. We can think of it this way: when we use a string, Python caches it, and the next use points directly at the cached string, which can save a lot of memory.
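
When pooling matters, it can be requested explicitly rather than relied on. The sketch below uses the standard library function sys.intern(); the variable names are illustrative only:

```python
import sys

# Build two equal strings at run time so they start out as separate objects.
parts = ["hello", " ", "world!"]
a = "".join(parts)
b = "".join(parts)

print(a == b)  # True: same characters

# sys.intern() returns the single pooled copy of an equal string.
a_pooled = sys.intern(a)
b_pooled = sys.intern(b)
print(a_pooled is b_pooled)  # True: both names refer to the pooled object
```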

Don't get too hung up on this question. The official documentation does not give complete rules, so we can only explore by experiment. The following is excerpted from the official documentation's description of id():

Return the “identity” of an object. This is an integer which is guaranteed to be unique and constant for this object during its lifetime. Two objects with non-overlapping lifetimes may have the same id() value.
CPython implementation detail: This is the address of the object in memory.

After all that, what does id() have to do with is?

Note: is compares the results of id(). Because id() gives us the memory address of a piece of data (an object), is compares the memory addresses of the data (objects).

In the end, is lets us check whether two variables refer to the same object.
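
As a small illustration (the names first, alias and clone are made up for this sketch), two names bound to the same list share one identity, while an equal copy does not:

```python
first = [1, 2, 4]
alias = first         # a second name for the same list object
clone = list(first)   # a new list with equal contents

print(first is alias)          # True
print(id(first) == id(alias))  # True
print(first is clone)          # False

alias.append(8)       # mutating through one name is visible through the other
print(first)          # [1, 2, 4, 8]
print(clone)          # [1, 2, 4]
```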

== tests whether two things are equal. Note: == compares the actual values, not the memory addresses.

s1 = " ha-ha "
s2 = " ha-ha "
print(s1 == s2) # True
print(s1 is s2) # True, because the small data pool makes both variables point to the same object
l1 = [1, 2, 3]
l2 = [1, 2, 3]
print(l1 == l2) # True, the values are the same
print(l1 is l2) # False, they are two different objects

Summary:
  is compares addresses
  == compares values
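
To make the summary concrete, here is a sketch with a hypothetical Point class (not from the original article). == calls the class's __eq__ method, so the class decides what "same value" means, while is ignores that method and compares identity only:

```python
class Point:
    """Hypothetical class, used only to illustrate == versus is."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        # == calls this method, so we decide what "equal value" means.
        if not isinstance(other, Point):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)


p1 = Point(1, 2)
p2 = Point(1, 2)

print(p1 == p2)  # True: __eq__ compares the coordinates
print(p1 is p2)  # False: two separate objects

result = None
print(result is None)  # True: None is a singleton, so identity is the usual test
```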



II. More on encoding

  1. Python 2 uses ASCII by default, so Chinese is not supported. To change the source encoding in Python 2, put this line at the top of the file:
# -*- encoding:utf-8 -*-
  2. In Python 3, strings in memory use Unicode

1. Encoding review:

  1. ASCII: the earliest encoding. It contains uppercase English letters, lowercase letters, digits and some special characters. There is no Chinese. Each character is 8 binary digits: 8 bits, 1 byte
  2. GBK: the Chinese national standard encoding. It contains ASCII and the commonly used Chinese characters. A Chinese character takes 16 bits, 2 bytes
  3. UNICODE: the universal code. It covers the scripts of all countries in the world and includes ASCII. Stored at a fixed width (UTF-32), each character takes 32 bits, 4 bytes
  4. UTF-8: a variable-length universal encoding, one implementation of Unicode. The smallest characters take 8 bits
    • English: 8 bits, 1 byte
    • European scripts: 16 bits, 2 bytes
    • Chinese: 24 bits, 3 bytes
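
The byte counts above can be checked directly. This is a small sketch; the sample characters are chosen only for illustration, and "utf-32-be" is used so that no byte-order mark is added to the length:

```python
samples = ["a", "é", "中"]  # English letter, European letter, Chinese character

utf8_lengths = {ch: len(ch.encode("utf-8")) for ch in samples}
utf32_lengths = {ch: len(ch.encode("utf-32-be")) for ch in samples}
gbk_lengths = {ch: len(ch.encode("gbk")) for ch in ["a", "中"]}

print(utf8_lengths)   # {'a': 1, 'é': 2, '中': 3}
print(utf32_lengths)  # {'a': 4, 'é': 4, '中': 4}
print(gbk_lengths)    # {'a': 1, '中': 2}
```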

In short, apart from the ASCII range, text in one of these encodings cannot be read directly as another.
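
A short sketch of what "cannot be read directly" means in practice: GBK bytes for a Chinese character are not valid UTF-8, while ASCII-only text yields the same bytes in both encodings. The variable names are illustrative:

```python
gbk_bytes = "中".encode("gbk")  # b'\xd6\xd0'

try:
    gbk_bytes.decode("utf-8")   # wrong character set for these bytes
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False

print(decoded_ok)  # False

# ASCII-only text is byte-for-byte identical in GBK and UTF-8.
ascii_bytes = "alex".encode("gbk")
print(ascii_bytes.decode("utf-8"))  # alex
```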

In Python 3, strings in memory at run time use Unicode, because Unicode is the universal code and can represent any content. For transmission and storage, however, Unicode wastes space and resources, so the text needs to be converted to UTF-8 or GBK before it is stored. How do we convert it?

In Python, text can be encoded, and once encoded it can be transmitted. The encoded data is of type bytes. In fact it is still the original data; only its representation has changed.

2. How bytes are written:

  • English: b'alex' The bytes form of English text looks no different from the string
  • Chinese: b'\xe4\xb8\xad' This is the UTF-8 bytes form of one Chinese character
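
A bytes object is a sequence of integers from 0 to 255; the b'...' notation is only how it is printed. A quick sketch of looking inside one:

```python
data = "中".encode("utf-8")

print(type(data))   # <class 'bytes'>
print(len(data))    # 3
print(list(data))   # [228, 184, 173]
print(data.hex())   # e4b8ad
print(b"alex"[0])   # 97, the code of the letter 'a'
```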

When a string is transmitted it is converted to bytes, which is done with encode(character set)

s = "alex"
print(s.encode("utf-8")) # encode the string as UTF-8
print(s.encode("GBK")) # encode the string as GBK
Result:
b'alex'
b'alex'

s = "中"
print(s.encode("UTF-8")) # encode the Chinese character as UTF-8
print(s.encode("GBK")) # encode the Chinese character as GBK

Result:
b'\xe4\xb8\xad'
b'\xd6\xd0'

Remember: encoding English text gives bytes that look the same as the source string, while the result of encoding Chinese depends on the encoding, and different encodings give different results. As we can see, one Chinese character takes 3 bytes in UTF-8 and 2 bytes in GBK. The type after encoding is bytes. For network transmission and storage, Python sends and stores bytes, so what the other side receives is also bytes. We can use decode() to decode it and restore the bytes data to the familiar string:

s = "我叫李嘉诚"
print(s.encode("utf-8")) # b'\xe6\x88\x91\xe5\x8f\xab\xe6\x9d\x8e\xe5\x98\x89\xe8\xaf\x9a'
print(b'\xe6\x88\x91\xe5\x8f\xab\xe6\x9d\x8e\xe5\x98\x89\xe8\xaf\x9a'.decode("utf-8")) # decode
# 我叫李嘉诚 (My name is Li Jiacheng)

When encoding and decoding we need to specify the encoding format

s = "我是文字"
bs = s.encode("GBK") # we get the GBK bytes
print(bs)

# Convert GBK to UTF-8
# First convert GBK to Unicode, that is, decode
s = bs.decode("GBK") # decode
print(s)

# Then re-encode as UTF-8
bss = s.encode("UTF-8") # re-encode
print(bss) # b'\xe6\x88\x91\xe6\x98\xaf\xe6\x96\x87\xe5\xad\x97'
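
The decode-then-encode steps can be wrapped in a small function. The sketch below is one way to do it; transcode is a hypothetical helper name, not a built-in:

```python
def transcode(data: bytes, source: str, target: str) -> bytes:
    """Hypothetical helper: re-encode bytes from one character set to another."""
    text = data.decode(source)   # bytes -> str (Unicode)
    return text.encode(target)   # str -> bytes in the new encoding


gbk_data = "我是文字".encode("gbk")
utf8_data = transcode(gbk_data, "gbk", "utf-8")

print(len(gbk_data))   # 8: four characters, 2 bytes each
print(len(utf8_data))  # 12: four characters, 3 bytes each
print(utf8_data.decode("utf-8") == gbk_data.decode("gbk"))  # True
```

Passing the source encoding explicitly matters because bytes carry no record of the character set that produced them.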