Scrapy framework (I): basic use
2022-06-27 15:34:00 【User 8336546】
Preface
This article briefly introduces the basic use of the Scrapy framework, along with some problems encountered along the way and their solutions.
Basic use of the Scrapy framework
Environment installation
1. Run the following command to install wheel:
pip install wheel
2. Download Twisted
Download link: http://www.lfd.uci.edu/~gohlke/pythonlibs/#twisted
Note: there are two things to pay attention to when downloading:
- Download the file that matches your own Python version: the cpXX in the filename is the version number. (For example, my Python version is 3.8.2, so I download the cp38 file.)
- Download the file that matches your operating system's bitness: a 32-bit system downloads win32; a 64-bit system downloads win_amd64.
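The two checks above can also be scripted. As a small standard-library sketch (not part of the original steps), the following prints which wheel tags match the interpreter you are running:

```python
import struct
import sys

# The cpXX tag in the wheel filename comes from the interpreter's
# major/minor version, e.g. Python 3.8 -> cp38.
cp_tag = f"cp{sys.version_info.major}{sys.version_info.minor}"

# 8 bytes per pointer means a 64-bit interpreter; the Windows wheel
# platform tags on that download page are win32 and win_amd64.
platform_tag = "win_amd64" if struct.calcsize("P") * 8 == 64 else "win32"

print(f"Download the Twisted wheel tagged {cp_tag} / {platform_tag}")
```

Run this with the same interpreter you will install the wheel into, since each virtual environment can have its own Python version.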
3. Install Twisted
In the directory containing the Twisted wheel downloaded in the previous step, run:
pip install Twisted-20.3.0-cp38-cp38-win_amd64.whl
4. Run the following command to install pywin32:
pip install pywin32
5. Run the following command to install Scrapy:
pip install scrapy
6. Test
Type the scrapy command in the terminal; if no error is reported, the installation succeeded.
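Besides running the scrapy command, the install can be checked from Python itself. A minimal sketch using only the standard library:

```python
import importlib.util

# find_spec returns None when the package cannot be located on sys.path,
# so this tells you whether "import scrapy" would succeed in this environment
spec = importlib.util.find_spec("scrapy")
print("scrapy is importable" if spec is not None else "scrapy was NOT found")
```

This is handy when juggling several virtual environments, since it checks the exact interpreter you run it with.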
Creating a Scrapy project
Here the Scrapy project is created in PyCharm.
1. Open the Terminal panel and run the following command to create a Scrapy project:
scrapy startproject ProjectName
ProjectName is the project name; choose it yourself.
2. The standard project directory is generated automatically: a scrapy.cfg file plus a package containing items.py, middlewares.py, pipelines.py, settings.py, and a spiders/ subdirectory.
3. Create a crawler file
First, enter the newly created project directory:
cd ProjectName
Then create a crawler file in the spiders subdirectory:
scrapy genspider spiderName www.xxx.com
spiderName is the name of the crawler file; choose it yourself.
4. Run the project
scrapy crawl spiderName
Modifying file parameters
To make the crawler project run better, some file parameters need to be modified.
1. spiderName.py
The contents of the crawler file are as follows:

import scrapy


class FirstSpider(scrapy.Spider):
    # Spider name: the unique identifier of this crawler source file
    name = 'spiderName'
    # Allowed domains: defines which URLs in the start_urls list may be requested
    allowed_domains = ['www.baidu.com']
    # Initial URL list: Scrapy automatically sends requests to the URLs stored here
    start_urls = ['http://www.baidu.com/', 'https://www.douban.com']

    # Data parsing: the response parameter is the response object returned for a successful request
    def parse(self, response):
        pass

Note: the allowed_domains list restricts which URLs can be requested. In general it is not needed; just comment it out.
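Inside parse(), data is usually extracted from the response with selectors such as response.xpath('//title/text()') or response.css('title::text'). As a dependency-free sketch of that kind of extraction logic (the class and sample HTML here are made up for illustration, not part of the Scrapy API), the same idea can be shown with the standard library:

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text of the <title> tag, mimicking what a spider's
    response.xpath('//title/text()') extraction would return."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


parser = TitleParser()
parser.feed("<html><head><title>Example Page</title></head></html>")
print(parser.title)  # -> Example Page
```

In a real spider you would put the equivalent selector call inside parse() and yield the extracted items instead of printing them.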
2. settings.py
1). ROBOTSTXT_OBEY
Find the ROBOTSTXT_OBEY keyword. Its default value is True (i.e., the project obeys the robots protocol by default). For practice purposes, it can temporarily be changed to False.

# Obey robots.txt rules
ROBOTSTXT_OBEY = False
2). USER_AGENT
Find the USER_AGENT keyword; it is commented out by default. Modify its contents to avoid UA-based anti-crawling.

# Crawl responsibly by identifying yourself (and your website) on the user-agent
USER_AGENT = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.96 Safari/537.36 Edg/88.0.705.50'
3). LOG_LEVEL
To view the project's results more clearly (by default a run prints a large amount of log information), the LOG_LEVEL keyword can be added manually.

# Display only log messages of the specified level
LOG_LEVEL = 'ERROR'  # show only error messages
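Putting the three changes together, the relevant part of settings.py ends up looking like this (the user-agent string is just the sample from above; substitute your own browser's):

```python
# settings.py -- only the modified entries are shown

# Obey robots.txt rules (default is True; set to False for practice runs)
ROBOTSTXT_OBEY = False

# Identify as a regular browser to avoid UA-based anti-crawling
USER_AGENT = ('Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
              'AppleWebKit/537.36 (KHTML, like Gecko) '
              'Chrome/88.0.4324.96 Safari/537.36 Edg/88.0.705.50')

# Print only error-level log messages
LOG_LEVEL = 'ERROR'
```

Settings.py is plain Python, so adjacent string literals are concatenated; the multi-line USER_AGENT above is a single string.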
Possible problems
1. Scrapy installed successfully, but after the crawler file is created, import scrapy is still flagged as an error.
My practice environment was a virtual environment created from Python 3.8, but while building the Scrapy project, pip install scrapy kept reporting errors.
At first I manually downloaded the scrapy library from the official website (https://scrapy.org/) and installed it into the virtual environment's site-packages directory. Sure enough, import scrapy then resolved normally and the program could run, but it still printed many error messages. I checked via PyCharm's Python Interpreter settings whether the Scrapy library was included.
I tried several solutions, to no avail...
Finally I discovered that Anaconda ships with the Scrapy library, so I created a virtual environment based on Anaconda, and everything ran perfectly.
Ending
Keep studying hard!