Getting started with scrapy
2022-06-24 12:25:00 【HLee】
Introduction
Scrapy is an application framework for crawling websites and extracting structured data. It can be used in a range of programs, including data mining, information processing, and archiving historical data. It was originally designed for page scraping (more precisely, web scraping), but it can also be used to retrieve data returned by APIs (for example, Amazon Associates Web Services) or as a general-purpose web crawler.
- Scrapy is a fast, high-level screen-scraping and web-crawling framework written in Python, used to crawl websites and extract structured data from their pages. It has a wide range of uses, including data mining, monitoring, and automated testing.
- Scrapy's appeal is that it is a framework, so anyone can easily adapt it to their own needs. It also provides base classes for several kinds of spiders, such as BaseSpider (named scrapy.Spider in current releases) and sitemap spiders, and the latest version offers support for Web 2.0 crawling.
Scrapy
Installing Scrapy
pip install Scrapy
Creating a new Scrapy project
scrapy startproject scrapyspider
Note: scrapyspider is the name of the new project.
This generates a Scrapy project named scrapyspider with the structure shown below. Two generated files need substantial changes (items.py and settings.py), and two files have to be added: the main spider program and itemcsvexporter.py.
scrapyspider/
    scrapy.cfg                          # generated with the project; the project's configuration file
    scrapyspider/
        __init__.py                     # generated with the project; no changes required
        items.py                        # generated with the project; defines the fields to crawl
        pipelines.py                    # generated with the project; no changes required if output is stored in files
        settings.py                     # generated with the project; sets the output order of the crawled fields
        middlewares.py                  # generated with the project; no changes required
        spiders/
            __init__.py                 # generated with the project; no changes required
            itemcsvexporter.py          # written by hand; the code is fixed
            (main spider program).py    # written by hand; the main program of the crawler
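The article relies on a hand-written itemcsvexporter.py to write the crawled fields in a fixed order, but its code is not shown here. As a hedged alternative, not the author's method, recent Scrapy releases can fix the CSV column order from settings.py alone through the built-in FEED_EXPORT_FIELDS setting, together with FEEDS (available in Scrapy 2.1 and later). The field names and the output file name below are hypothetical placeholders.

```python
# settings.py (fragment). Field names and file name are hypothetical placeholders.

# Columns are written to the CSV in exactly this order.
FEED_EXPORT_FIELDS = ["title", "url", "price"]

# Export scraped items to a CSV file.
FEEDS = {
    "output.csv": {"format": "csv", "encoding": "utf-8"},
}
```

With these two settings in place, running `scrapy crawl` for a spider in the project should produce output.csv with the columns in the listed order, assuming the fields defined in items.py use the same names.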