Notes on writing a scrapy-redis project
2022-07-24 11:37:00 【范之度】
Spider file:

from scrapy.spiders import Rule
from scrapy.linkextractors import LinkExtractor
from scrapy_redis.spiders import RedisCrawlSpider


class MyCrawler(RedisCrawlSpider):
    name = 'mycrawler_redis'
    # Redis key from which the spider reads its start URLs
    redis_key = 'mycrawler:start_urls'

    # Crawl rules
    rules = (
        # follow all links
        Rule(LinkExtractor(), callback='parse_page', follow=True),
    )

    # Key point: allowed_domains must end up as a list.
    # In Python 3, filter() returns an iterator, so wrap it in list().
    def __init__(self, *args, **kwargs):
        # Dynamically define the allowed domains list.
        domain = kwargs.pop('domain', '')
        self.allowed_domains = list(filter(None, domain.split(',')))
        super(MyCrawler, self).__init__(*args, **kwargs)

    def parse_page(self, response):
        return {
            'name': response.css('title::text').extract_first(),
            'url': response.url,
        }

Add the following to the settings file:
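# Redis connection URL
REDIS_URL = 'redis://root:@127.0.0.1:6379'
# Deduplicate requests through Redis so all spider instances share one filter
DUPEFILTER_CLASS = "scrapy_redis.dupefilter.RFPDupeFilter"
# Log every filtered duplicate request, not only the first one
DUPEFILTER_DEBUG = True
# Schedule requests through Redis
SCHEDULER = "scrapy_redis.scheduler.Scheduler"
# Keep the Redis queues when the spider closes, so a crawl can be paused and resumed
SCHEDULER_PERSIST = True
# Pop requests in priority order
SCHEDULER_QUEUE_CLASS = 'scrapy_redis.queue.SpiderPriorityQueue'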
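
With the spider and settings in place, the spider waits until URLs appear under its `redis_key`. The sketch below is one possible way to seed that key from Python, and it also isolates the `allowed_domains` expression from `__init__` to show what it yields. The helper names `parse_domains`, `seed_start_urls` and `seed_local_redis` are made up for illustration, and the redis-py package is assumed to be installed; none of this is part of the original project.

```python
from typing import Iterable, List, Protocol


class SupportsLpush(Protocol):
    """Minimal client interface used below; a redis-py client provides lpush."""

    def lpush(self, name: str, *values: str) -> int: ...


def parse_domains(domain: str) -> List[str]:
    """Same expression as in MyCrawler.__init__: split on commas, drop empty parts."""
    return list(filter(None, domain.split(',')))


def seed_start_urls(client: SupportsLpush, urls: Iterable[str],
                    key: str = 'mycrawler:start_urls') -> int:
    """Push each URL onto the Redis list named by `key`; return how many were pushed."""
    count = 0
    for url in urls:
        client.lpush(key, url)
        count += 1
    return count


def seed_local_redis() -> int:
    """Hypothetical helper: seed the Redis instance named in REDIS_URL above."""
    import redis  # redis-py, assumed installed

    client = redis.Redis.from_url('redis://root:@127.0.0.1:6379')
    return seed_start_urls(client, ['https://example.com/'])
```

Assuming the standard Scrapy command line, the `domain` argument would be passed with `-a`, for example `scrapy crawl mycrawler_redis -a domain=example.com,example.org`. Note that `parse_domains` does not strip whitespace, so the comma-separated value should contain no spaces.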