Lesson 3 urllib
2022-06-25 20:43:00 【Osmanthus rice wine balls】
1. Wrapping a web page's source code in a response object
import urllib.request
# Send a GET request
response = urllib.request.urlopen("http://www.baidu.com")  # the result is wrapped in a response object
print(response.read().decode('utf-8'))  # decode('utf-8') turns the raw bytes into text so Chinese characters display correctly; this prints the page source
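Not every page is served as UTF-8, so hard-coding `decode('utf-8')` can still produce garbled Chinese text. The following is a minimal sketch, not the author's method, that reads the charset declared in the `Content-Type` response header and falls back to UTF-8. The helper names `decode_body` and `fetch_text` are hypothetical.

```python
import urllib.request
from email.message import Message


def decode_body(body: bytes, headers: Message, fallback: str = "utf-8") -> str:
    """Decode a response body with the charset named in Content-Type, if any."""
    charset = headers.get_content_charset() or fallback
    # errors="replace" substitutes a marker for bytes that do not fit the charset
    return body.decode(charset, errors="replace")


def fetch_text(url: str) -> str:
    """GET a page and return its source as text (this performs a network request)."""
    with urllib.request.urlopen(url) as response:
        # response.headers is an http.client.HTTPMessage, a Message subclass
        return decode_body(response.read(), response.headers)
```

Calling `fetch_text("http://www.baidu.com")` would then replace the `read().decode('utf-8')` pair above, assuming the server declares its charset correctly.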
# Send a POST request (useful for simulating a login with a username and password)
# httpbin.org is used as the test server
import urllib.parse  # parser for encoding key/value pairs
data = bytes(urllib.parse.urlencode({"hello": "world"}), encoding="utf-8")  # URL-encode the form fields, then convert them to bytes with encoding="utf-8"
response = urllib.request.urlopen("http://httpbin.org/post", data=data)
print(response.read().decode('utf-8'))
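To see what the `data` argument actually contains, the encoding step can be run on its own without any network access. This is a small sketch; the helper name `encode_form` and the second field `q` are assumptions added for illustration.

```python
import urllib.parse


def encode_form(fields: dict) -> bytes:
    """URL-encode key/value pairs and return them as the bytes urlopen expects."""
    query = urllib.parse.urlencode(fields)  # e.g. "hello=world"
    return query.encode("utf-8")


body = encode_form({"hello": "world", "q": "a b&c"})  # "q" is a made-up field
print(body)  # b'hello=world&q=a+b%26c'

# parse_qs reverses the encoding, which is roughly what the server does
print(urllib.parse.parse_qs(body.decode("utf-8")))  # {'hello': ['world'], 'q': ['a b&c']}
```

Spaces and `&` inside values are escaped, which is why the dictionary should be passed through `urlencode` rather than joined by hand.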
2. Handling timeouts
import urllib.error
try:
    response = urllib.request.urlopen("http://httpbin.org/get", timeout=0.01)  # give up if the request takes longer than 0.01 seconds
    print(response.read().decode('utf-8'))
except urllib.error.URLError as e:
    print("time out!")
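`URLError` covers many failures besides timeouts (DNS errors, refused connections, HTTP error statuses), so printing "time out!" for all of them can mislead. The sketch below, under the assumption that you want to tell these cases apart, checks whether the underlying cause is a timeout. The names `is_timeout` and `try_fetch` are hypothetical helpers.

```python
import socket
import urllib.error
import urllib.request


def is_timeout(exc: BaseException) -> bool:
    """Return True if exc is a timeout, raised directly or wrapped in URLError."""
    if isinstance(exc, socket.timeout):  # alias of TimeoutError on Python 3.10+
        return True
    return isinstance(exc, urllib.error.URLError) and isinstance(
        exc.reason, socket.timeout
    )


def try_fetch(url: str, timeout: float):
    """Return the decoded body, or None on failure (performs a network request)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read().decode("utf-8")
    except (urllib.error.URLError, socket.timeout) as e:
        # A timeout while reading the body is raised without the URLError wrapper
        print("timed out" if is_timeout(e) else f"request failed: {e}")
        return None
```

For example, `try_fetch("http://httpbin.org/get", timeout=0.01)` would most likely report a timeout, while a larger value should return the page.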
3. Request headers (pretending to be a browser)
url = "https://httpbin.org/post"
headers = {"User-Agent": "……"}
data = bytes(urllib.parse.urlencode({"hello": "world"}), encoding="utf-8")
req = urllib.request.Request(url=url, data=data, headers=headers, method='POST')  # build a request object that mimics a real browser
response = urllib.request.urlopen(req)  # send the request and wrap the reply in a response object
print(response.read().decode("utf-8"))
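How to find the User-Agent (a key/value pair in the request headers):
Open the browser's developer tools, switch to the Network panel, select any request, and read the User-Agent value listed under its request headers.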
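Before sending anything, a `Request` object can be inspected to confirm that the method, headers, and body are what you intended. The following is an illustrative sketch; `build_post` is a hypothetical helper and `USER_AGENT` is a placeholder string, not a real browser value.

```python
import urllib.parse
import urllib.request

# Placeholder only: substitute the User-Agent copied from your own browser
USER_AGENT = "Mozilla/5.0 (example placeholder)"


def build_post(url: str, fields: dict, user_agent: str = USER_AGENT):
    """Build, but do not send, a POST request carrying a browser-like header."""
    data = urllib.parse.urlencode(fields).encode("utf-8")
    return urllib.request.Request(
        url, data=data, headers={"User-Agent": user_agent}, method="POST"
    )


req = build_post("https://httpbin.org/post", {"hello": "world"})
print(req.get_method())              # POST
print(req.get_header("User-agent"))  # urllib stores header names capitalized like this
print(req.data)                      # b'hello=world'
```

Passing `req` to `urllib.request.urlopen` then sends it; httpbin echoes the headers back, which makes it easy to check that the User-Agent arrived.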
4. Getting the data
# Crawl the web pages
def getData(baseurl):
    datalist = []
    for i in range(0, 10):  # fetch page information 10 times, once per page
        url = baseurl + str(i * 25)  # each page starts 25 items after the previous one
        html = askURL(url)  # save the page source (it is not yet parsed into datalist)
    return datalist
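The loop above builds one URL per page by appending an offset to `baseurl`. As a quick check of that arithmetic, the URLs can be generated without fetching anything. This is a sketch; `page_urls` is a hypothetical helper and the base URL is made up for illustration.

```python
def page_urls(baseurl: str, pages: int = 10, page_size: int = 25) -> list:
    """Return the URLs visited by the loop: offsets 0, 25, 50, ... appended to baseurl."""
    return [baseurl + str(i * page_size) for i in range(pages)]


urls = page_urls("https://example.com/list?start=")  # hypothetical base URL
print(len(urls))  # 10
print(urls[0])    # https://example.com/list?start=0
print(urls[-1])   # https://example.com/list?start=225
```

Ten pages of 25 items cover offsets 0 through 225, that is, 250 items in total.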
# Get the content of the page at the given URL
def askURL(url):
    head = {
        "User-Agent": "……"
    }  # disguise the request by simulating browser header information
    request = urllib.request.Request(url, headers=head)  # attach the headers when requesting the URL
    html = ""  # returned as-is if the request fails
    try:
        response = urllib.request.urlopen(request)  # fetch the entire page
        html = response.read().decode("utf-8")  # read the page source as text
    except urllib.error.URLError as e:  # catch the error
        if hasattr(e, "code"):
            print(e.code)  # print the HTTP status code to see what went wrong
        if hasattr(e, "reason"):
            print(e.reason)  # print the reason for the failure
    return html
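The `hasattr` checks work because only `HTTPError`, a subclass of `URLError`, carries a `code`, while both carry a `reason`. The sketch below shows this with exceptions constructed by hand, so it runs without a network. `describe_error` is a hypothetical helper, and the URL and messages are made up.

```python
import io
import urllib.error


def describe_error(e: urllib.error.URLError) -> str:
    """Summarize an error the same way askURL inspects it."""
    parts = []
    if hasattr(e, "code"):  # present only on HTTPError (the server answered with an error status)
        parts.append(f"HTTP {e.code}")
    if hasattr(e, "reason"):
        parts.append(str(e.reason))
    return ": ".join(parts)


# Hand-built errors for illustration; urlopen would normally raise these
http_err = urllib.error.HTTPError(
    "http://example.com", 404, "Not Found", None, io.BytesIO()
)
url_err = urllib.error.URLError("connection refused")

print(describe_error(http_err))  # HTTP 404: Not Found
print(describe_error(url_err))   # connection refused
```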