Basic knowledge of the Scrapy crawler framework
2022-07-25 04:43:00 【HHYZBC】
Contents
The role of the Scrapy framework
Scrapy's three built-in objects
The specific function of each Scrapy module
The concept of Scrapy
Scrapy is an open-source web crawling framework written in Python. It is designed for crawling websites and extracting structured data from them.
The role of the Scrapy framework
With only a small amount of code, you can crawl data quickly.
The Scrapy workflow

The process can be described as follows:
- The spider builds a request object from its start URL --> spider middleware --> engine --> scheduler
- The scheduler hands the request back --> engine --> downloader middleware --> downloader
- The downloader sends the request and obtains a response --> downloader middleware --> engine --> spider middleware --> spider
- The spider extracts URLs and assembles them into request objects --> spider middleware --> engine --> scheduler, and the cycle repeats from the second step
- The spider extracts data --> engine --> item pipeline, which processes and stores the data
Note:
- The green lines in the figure represent the flow of data
- Note where each middleware sits in the figure; its position determines what it can do
- Note the position of the engine: all modules are independent of one another and interact only with the engine
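The five steps above can be sketched as a toy simulation in plain Python. This is only an illustration of the data flow, not Scrapy's actual implementation: the `Request`, `Response`, `Scheduler`, `download`, `parse`, and `run_engine` names and the `FAKE_SITE` data are all hypothetical stand-ins, and both middleware layers are left out to keep the loop short.

```python
from collections import deque
from dataclasses import dataclass


# Toy stand-ins for illustration only; these are NOT Scrapy's real classes.
@dataclass
class Request:
    url: str


@dataclass
class Response:
    url: str
    body: str


# Hypothetical "website": URL -> (page text, links found on that page).
FAKE_SITE = {
    "page1": ("first page", ["page2"]),
    "page2": ("second page", []),
}


class Scheduler:
    """Queues requests and skips URLs it has already seen."""

    def __init__(self):
        self.queue = deque()
        self.seen = set()

    def enqueue(self, request):
        if request.url not in self.seen:
            self.seen.add(request.url)
            self.queue.append(request)

    def next_request(self):
        return self.queue.popleft() if self.queue else None


def download(request):
    """Downloader: turns a request into a response (here, a dict lookup)."""
    body, _links = FAKE_SITE[request.url]
    return Response(url=request.url, body=body)


def parse(response):
    """Spider: yields extracted data (dicts) and follow-up requests."""
    yield {"url": response.url, "text": response.body}
    for link in FAKE_SITE[response.url][1]:
        yield Request(url=link)


def run_engine(start_urls):
    """Engine: the only component that talks to all the others."""
    scheduler = Scheduler()
    stored = []  # stands in for the item pipeline's storage
    for url in start_urls:
        scheduler.enqueue(Request(url=url))        # step 1
    while (request := scheduler.next_request()) is not None:  # step 2
        response = download(request)               # step 3
        for result in parse(response):
            if isinstance(result, Request):
                scheduler.enqueue(result)          # step 4
            else:
                stored.append(result)              # step 5
    return stored


items = run_engine(["page1"])
```

Note that the scheduler, downloader, and spider never call one another here; every hand-off goes through `run_engine`, which mirrors the point that modules interact only with the engine.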
Scrapy's three built-in objects
- request (request object): made up of url, method, post_data, headers, etc.
- response (response object): made up of url, body, status, headers, etc.
- item (data object): essentially a dictionary
The specific function of each Scrapy module

Note:
- The spider middleware and the downloader middleware differ only in where they sit in the processing flow; their functions overlap, for example replacing the User-Agent (UA)
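Replacing the UA is commonly done in a downloader middleware. The following is a minimal sketch, assuming the long-documented `process_request(request, spider)` hook; the class name `RandomUserAgentMiddleware` and the `USER_AGENTS` strings are made up for illustration.

```python
import random

# Hypothetical pool of User-Agent strings to rotate through.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) demo-browser/1.0",
    "Mozilla/5.0 (X11; Linux x86_64) demo-browser/2.0",
]


class RandomUserAgentMiddleware:
    """Downloader middleware that sets a random User-Agent on each request."""

    def process_request(self, request, spider):
        # Overwrite the header before the request reaches the downloader.
        request.headers["User-Agent"] = random.choice(USER_AGENTS)
        # Returning None lets the request continue through the remaining
        # middlewares and on to the downloader.
        return None
```

A middleware like this takes effect only after it is registered in the project's `DOWNLOADER_MIDDLEWARES` setting, with the module path depending on your own project layout.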