For beginners who have just learned to write crawlers: read this and it will make sense
2022-06-23 01:57:00 【XUchenmp】
Preface
I wrote a library seat-reservation script and found that it attracted a lot of beginners. It turned out that they had learned to write crawlers without really understanding what a crawler is or how it works. You may feel that you understood a video or tutorial, then switch to another website and discover that you can only crawl the one from the tutorial. The hard part of writing a crawler is not the code. How hard can a small crawler script be? If you get stuck, just search Baidu. The main difficulty is packet analysis. After reading this article you should be able to crawl most small websites.
The essence of crawlers
Actions we perform in the browser, such as clicking and typing, can be simplified to the following figure.
The following is a simplified login flowchart.
This is a simplified diagram, but it shows that whatever the user does in the browser comes down to the browser sending packets to the server and receiving packets back.
Whether it is a crawler (for example, crawling novels) or an automation script (for example, grabbing seats in the library), the essence is the same: send packets and receive packets, as pictured:
As you can see, to log in we only need to send the one key packet; the many other packets are not needed. Of course this is a simplified version. Some websites send more than one key packet at login, and you have to capture the packets, analyze them, and test by sending them yourself.
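As a rough illustration of "sending the one key packet", here is a minimal sketch using only the Python standard library. The URL `https://example.com/login` and the form field names `username` and `password` are assumptions made up for this example; a real site will use whatever path and fields you see in the packet you captured.

```python
import urllib.parse
import urllib.request


def build_login_request(url, username, password):
    """Build the single POST request that carries the login form.

    The field names "username" and "password" are hypothetical;
    copy the real ones from the captured login packet.
    """
    form = urllib.parse.urlencode(
        {"username": username, "password": password}
    ).encode("utf-8")
    headers = {
        # Many sites reject requests without a browser-like User-Agent.
        "User-Agent": "Mozilla/5.0",
        "Content-Type": "application/x-www-form-urlencoded",
    }
    return urllib.request.Request(url, data=form, headers=headers, method="POST")


def send(request, timeout=10):
    """Send the packet and return (status code, raw response body)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status, response.read()


if __name__ == "__main__":
    # Only builds and prints the packet; call send(req) against a real site.
    req = build_login_request("https://example.com/login", "alice", "secret")
    print(req.get_method(), req.full_url)
    print(req.data)
```

Building the request and sending it are kept separate so you can compare the packet you built with the one the browser sent before putting anything on the wire.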
At this point some of you may be itching to find a website and try it out. Don't rush, or it will be a bit awkward when you have to come back here later.
Tips for analyzing packets
If you think that opening a web page or performing one action sends just one packet, you are being naive. Don't believe it? Press F12 and then open Baidu. The number of packets will make your scalp tingle.
52 requests. Just, well, bewildering.
But that is a big website, so this many is normal; small websites generally have far fewer. So how do you find the one you want among many packets?
1. By request path, name, and so on
For example, to find the key packet for login, look for login-related keywords such as login or logincheck. Some less polished websites may even use pinyin such as denglu. This depends on personal experience; once you have crawled enough sites, you will have seen it all.
2. By type
Take Baidu as an example. If you are looking for the login packet (I only opened the home page, so this is just an illustration), requests of type js, gif, css, png, and that plain type I don't recognize are basically useless.
After all, when you are logging in, why would you care about those images and GIFs? If there is an image CAPTCHA, just use Inspect Element to find it.
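The two tips above can be sketched as a small filter over a list of captured requests. The shape of the captured data (a list of dicts with `method` and `url` keys), the keyword list, and the suffix list are all assumptions for illustration; adjust them to whatever your capture tool exports.

```python
from urllib.parse import urlparse

# Hypothetical keyword and suffix lists; extend them as you gain experience.
LOGIN_KEYWORDS = ("login", "denglu")
STATIC_SUFFIXES = (".js", ".css", ".gif", ".png", ".jpg", ".ico")


def looks_like_login(url):
    """Tip 1: does the request path contain a login-related keyword?"""
    path = urlparse(url).path.lower()
    return any(keyword in path for keyword in LOGIN_KEYWORDS)


def is_static(url):
    """Tip 2: is this a script, stylesheet, or image we can ignore?"""
    return urlparse(url).path.lower().endswith(STATIC_SUFFIXES)


def candidate_requests(captured):
    """Drop static resources, then list login-looking requests first."""
    dynamic = [r for r in captured if not is_static(r["url"])]
    # sorted() is stable, so the original order is kept within each group.
    return sorted(dynamic, key=lambda r: not looks_like_login(r["url"]))


if __name__ == "__main__":
    captured = [
        {"method": "GET", "url": "https://example.com/static/app.js"},
        {"method": "GET", "url": "https://example.com/img/logo.png?v=2"},
        {"method": "GET", "url": "https://example.com/index"},
        {"method": "POST", "url": "https://example.com/user/logincheck"},
    ]
    for request in candidate_requests(captured):
        print(request["method"], request["url"])
```

This only narrows the list down. You still have to open the remaining packets and check their parameters by hand.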
Conclusion
By now this should have helped beginners understand crawlers a bit better, and analyzing simple packets should be easy. Don't tell me that after analyzing the packets you can't write a program to send them, or see if I don't smack you.
If you are crawling data, you will need to parse the response with XPath or something similar. If the packets are too complex to analyze, you can use selenium. Search Baidu for these yourself; there is plenty about them online and I am too lazy to write it up.
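For a first taste of XPath-style parsing, here is a minimal sketch using the standard library's `xml.etree.ElementTree`, which supports only a limited XPath subset and only well-formed markup. The sample page and the `chapter` class name are invented for this example. Real HTML is often not well-formed, so in practice a dedicated HTML parser with full XPath support is the more common choice.

```python
import xml.etree.ElementTree as ET

# A made-up, well-formed page standing in for a response body.
PAGE = """<html><body>
<div class="chapter"><a href="/book/1.html">Chapter 1</a></div>
<div class="chapter"><a href="/book/2.html">Chapter 2</a></div>
<div class="ad"><a href="/ad.html">Ad</a></div>
</body></html>"""


def extract_chapters(markup):
    """Return (title, link) pairs for every link inside a chapter div."""
    root = ET.fromstring(markup)
    links = root.findall(".//div[@class='chapter']/a")
    return [(a.text, a.get("href")) for a in links]


if __name__ == "__main__":
    for title, href in extract_chapters(PAGE):
        print(title, href)
```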
If you found this useful, give it a like.