Crawler lesson 1
2022-06-23 01:59:00 【A beginner in Python】
Introduction to crawlers
Starting today, let's learn about web crawlers! This is also my first blog post on this platform, written to record my learning process. If you spot any mistakes, please let me know. Now let's get started!
(1) What is a crawler
1.1 The concept of a crawler
A crawler is a program used to crawl data, also called a data acquisition program.
A crawler's data comes from the network, and that data can live on web servers, database servers, cloud storage, and so on.
Note: crawlers must of course be used legally. For example, the data you crawl must be public and used for non-profit purposes.
1.2 Crawlers in Python
A crawler script (program) written in Python can crawl data on a schedule, in set quantities, and from designated targets (websites). It mainly relies on multi-threading or multi-processing (or a single thread/process), network request libraries, data parsing, data storage, task scheduling, and other related techniques.
A Python crawler engineer can also carry out interface testing, functional testing, and integration testing.
(2) Crawlers and web back-end services
A crawler uses a network request library and so acts as the client making the request; the web back-end server responds with data according to that request.
The crawler sends an HTTP request to the web server, receives the response data, and then parses and saves the data according to its type (Content-Type).
Before sending the request, the crawler needs to disguise itself as a browser by setting the User-Agent request header, and then makes the request to the server.
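The request/response flow described above can be sketched with the standard-library urllib. This is only a minimal illustration: the `USER_AGENT` string and the helper names `build_request`, `parse_body`, and `fetch` are assumptions made for this example, and the body is assumed to be UTF-8 encoded.

```python
import json
import urllib.request

# Hypothetical browser-like User-Agent string, chosen for illustration only.
USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"


def build_request(url):
    """Create a request whose User-Agent header mimics a browser."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


def parse_body(content_type, body):
    """Decode the response bytes according to the Content-Type header."""
    media_type = content_type.split(";")[0].strip().lower()
    if media_type == "application/json":
        # Assumption: the JSON payload is UTF-8 encoded.
        return json.loads(body.decode("utf-8"))
    if media_type.startswith("text/"):
        return body.decode("utf-8", errors="replace")
    return body  # images, archives, etc. are kept as raw bytes


def fetch(url):
    """Send the request and return the parsed response data."""
    with urllib.request.urlopen(build_request(url), timeout=10) as resp:
        content_type = resp.headers.get("Content-Type", "")
        return parse_body(content_type, resp.read())
```

Calling `fetch("https://example.com/")` would perform a real network request, so it is left out of the block. A real crawler would also read the charset from the Content-Type header rather than assuming UTF-8.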
(3) Libraries used in Python crawlers
Network requests:
- urllib
- requests
- selenium (UI automation testing, dynamic JS rendering)
- appium (mobile app crawling or UI testing)
Data parsing:
- re (regular expressions)
- xpath
- bs4
- json
Data storage:
- pymysql
- mongodb
- elasticsearch
Multitasking libraries:
- Multithreading (threading) / thread queue (queue)
- Coroutines (asyncio, gevent/eventlet)
Crawler frameworks:
- scrapy
- scrapy-redis (distributed, multi-machine crawling)
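As a small illustration of how some of the libraries listed above fit together, the sketch below parses two sample responses with re and json, using worker threads fed from a queue. The sample data in `PAGES` and the function names are hypothetical, and the "pages" are inline strings rather than real downloads.

```python
import json
import queue
import re
import threading

# Hypothetical sample responses standing in for downloaded pages.
PAGES = {
    "page.html": '<a href="/a">A</a> <a href="/b">B</a>',
    "data.json": '{"items": [1, 2, 3]}',
}


def parse(name, text):
    """Pick a parser by file type: json for JSON, re for HTML links."""
    if name.endswith(".json"):
        return json.loads(text)
    return re.findall(r'href="([^"]+)"', text)


def worker(tasks, results, lock):
    """Take page names off the queue until it is empty."""
    while True:
        try:
            name = tasks.get_nowait()
        except queue.Empty:
            return
        parsed = parse(name, PAGES[name])
        with lock:  # protect the shared results dict
            results[name] = parsed


def run(num_threads=2):
    """Parse every sample page using a small pool of threads."""
    tasks = queue.Queue()
    for name in PAGES:
        tasks.put(name)
    results, lock = {}, threading.Lock()
    threads = [
        threading.Thread(target=worker, args=(tasks, results, lock))
        for _ in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

A regular expression is enough for this toy HTML. For real pages, a proper parser such as bs4 or an XPath library is usually the safer choice.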
(4) Common anti-crawler strategies
- UA (User-Agent) checks
- Login restrictions (cookies)
- Request-frequency limits (countered with IP proxies)
- CAPTCHAs (images solved through a cloud CAPTCHA-solving service, text or object selection, sliders, etc.)
- Dynamic JS (handled with selenium, splash, or the site's API interfaces)
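The first three strategies above are usually met with a browser-like User-Agent, cookie handling, and a proxy. Here is a minimal sketch using only the standard library; the `USER_AGENT` string, the `make_opener` helper, and the proxy address in the usage note are assumptions for illustration, not values from this post.

```python
import http.cookiejar
import urllib.request

# Hypothetical browser-like User-Agent string, for illustration only.
USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36"


def make_opener(proxy=None):
    """Build a urllib opener that sends a browser-like User-Agent,
    keeps cookies between requests, and optionally routes through a proxy."""
    jar = http.cookiejar.CookieJar()
    handlers = [urllib.request.HTTPCookieProcessor(jar)]
    if proxy:
        # Use the same proxy address for both HTTP and HTTPS traffic.
        handlers.append(
            urllib.request.ProxyHandler({"http": proxy, "https": proxy})
        )
    opener = urllib.request.build_opener(*handlers)
    opener.addheaders = [("User-Agent", USER_AGENT)]
    return opener, jar
```

For example, `opener, jar = make_opener(proxy="http://127.0.0.1:8080")` followed by `opener.open(url)` would send the request through that (hypothetical) local proxy, and any cookies the server sets would accumulate in `jar`.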
(5) The crawler library urllib
You can visit the site below for a detailed guide to using the urllib library. I will cover its specific usage in a later post.
https://www.runoob.com/python3/python-urllib.html
That's all for today's lesson. See you next time!