Scrapy-redis project notes
2022-07-24 11:42:00 【Fan zhidu】
Spider file:

```python
from scrapy.spiders import Rule
from scrapy.linkextractors import LinkExtractor
from scrapy_redis.spiders import RedisCrawlSpider


class MyCrawler(RedisCrawlSpider):
    name = 'mycrawler_redis'
    # Start URLs are read from this Redis list instead of a start_urls attribute
    redis_key = 'mycrawler:start_urls'

    # Crawl rules
    rules = (
        # follow all links
        Rule(LinkExtractor(), callback='parse_page', follow=True),
    )

    # Key point: in Python 3, filter() returns a lazy iterator, so wrap it in
    # list() to make allowed_domains a real list
    def __init__(self, *args, **kwargs):
        # Dynamically define the allowed domains list.
        domain = kwargs.pop('domain', '')
        self.allowed_domains = list(filter(None, domain.split(',')))
        super(MyCrawler, self).__init__(*args, **kwargs)

    def parse_page(self, response):
        return {
            'name': response.css('title::text').extract_first(),
            'url': response.url,
        }
```

Add the following to the project's settings.py file:
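```python
# Redis connection used by the scheduler and the dupefilter
# (format: redis://[user:password@]host:port)
REDIS_URL = 'redis://root:@127.0.0.1:6379'
# Keep request fingerprints in Redis so all crawler processes share one dedup set
DUPEFILTER_CLASS = "scrapy_redis.dupefilter.RFPDupeFilter"
# Log every filtered duplicate request, not only the first one
DUPEFILTER_DEBUG = True
# Use the Redis-backed scheduler so the request queue is shared
SCHEDULER = "scrapy_redis.scheduler.Scheduler"
# Keep the queue and dedup set in Redis when the spider closes (allows pause/resume)
SCHEDULER_PERSIST = True
# Priority queue backed by a Redis sorted set
SCHEDULER_QUEUE_CLASS = 'scrapy_redis.queue.SpiderPriorityQueue'
```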
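Taken together, the scheduler settings make every crawler process pull requests from one shared Redis queue and check them against one shared fingerprint set. The sketch below is a hypothetical, single-process, in-memory model of that flow, assuming a Python `set` in place of the Redis set used by `RFPDupeFilter` and a `heapq` heap in place of the Redis sorted set behind the priority queue. The class `InMemoryScheduler` and its simplified SHA1 fingerprint are illustrations only, not the scrapy-redis API or Scrapy's real fingerprint algorithm.

```python
import hashlib
import heapq
import itertools


class InMemoryScheduler:
    """Hypothetical toy model of a shared dedup set plus priority queue."""

    def __init__(self):
        self.seen = set()                 # stands in for the Redis dedup set
        self.heap = []                    # stands in for the Redis sorted set
        self.counter = itertools.count()  # tie-breaker for equal priorities

    @staticmethod
    def fingerprint(method, url, body=b""):
        # Simplified fingerprint (assumption): SHA1 over method, URL and body
        h = hashlib.sha1()
        h.update(method.encode())
        h.update(url.encode())
        h.update(body)
        return h.hexdigest()

    def enqueue(self, url, priority=0, method="GET"):
        fp = self.fingerprint(method, url)
        if fp in self.seen:
            return False  # duplicate request: dropped, never queued
        self.seen.add(fp)
        # Store -priority so the highest-priority request is popped first
        heapq.heappush(self.heap, (-priority, next(self.counter), url))
        return True

    def next_request(self):
        # Return the highest-priority URL, or None when the queue is empty
        if not self.heap:
            return None
        return heapq.heappop(self.heap)[2]


s = InMemoryScheduler()
s.enqueue("http://example.com/a")
s.enqueue("http://example.com/b", priority=10)
s.enqueue("http://example.com/a")  # returns False: already seen
```

Storing `-priority` mirrors how the scrapy-redis priority queue scores requests, so higher-priority requests come out first; the insertion counter that breaks ties is a choice of this sketch, not Redis behaviour.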
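The spider's `__init__` builds `allowed_domains` from a comma-separated `domain` argument, which Scrapy passes in through the `-a` option (for example `scrapy crawl mycrawler_redis -a domain=example.com,example.org`). The standalone sketch below isolates that parsing step so it can be tried without Scrapy; the helper name `parse_allowed_domains` is hypothetical. It also shows why the `list()` wrapper matters: a bare `filter()` object is a one-shot iterator that is empty after the first pass.

```python
def parse_allowed_domains(domain: str) -> list:
    # Hypothetical helper: split the comma-separated argument and drop
    # empty strings (e.g. from "a.com,,b.com" or an empty argument)
    return list(filter(None, domain.split(",")))


# Without list(), filter() returns a lazy iterator that can be consumed only once
lazy = filter(None, "example.com,example.org".split(","))
first_pass = list(lazy)   # consumes the iterator
second_pass = list(lazy)  # nothing left on the second pass
```

With an empty `domain`, the helper returns an empty list rather than `['']`, so no bogus empty domain ends up in `allowed_domains`.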