Getting started with Scrapy

2022-06-24 12:25:00 HLee

Brief introduction

Scrapy is an application framework for crawling websites and extracting structured data. It can be used in a range of programs, including data mining, information processing, and storing historical data. It was originally designed for page scraping (more precisely, web scraping), but it can also be used to fetch data returned by APIs (for example, Amazon Associates Web Services) or as a general-purpose web crawler.

  • Scrapy is a fast, high-level screen scraping and web crawling framework written in Python. It is used to crawl websites and extract structured data from their pages, and it has a wide range of uses, including data mining, monitoring, and automated testing.
  • Scrapy's appeal is that it is a framework, so anyone can adapt it to their own needs. It also provides base classes for many kinds of spiders, such as BaseSpider and sitemap spiders, and the latest version adds support for crawling Web 2.0 sites.

Scrapy

Installing Scrapy

pip install Scrapy
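
After the install finishes, one way to confirm that the package is visible to the current Python interpreter is to look up its installed version. This is a small sketch that uses only the standard library (Python 3.8 or later); the helper name installed_version is an assumption made for this example.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Prints the Scrapy version if the install succeeded, otherwise None.
    print(installed_version("Scrapy"))
```

If this prints None, the package was probably installed into a different Python environment than the one running the script.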

Creating a new Scrapy project

scrapy startproject scrapyspider

Note: the new project is named scrapyspider.

This generates a Scrapy project named scrapyspider with the structure shown below. Two generated files need substantial changes (items.py and settings.py), and two new files have to be added (the main crawler program and itemcsvexporter.py).

scrapyspider
 scrapy.cfg                         # Generated with the project; the project's configuration file
 scrapyspider/
    __init__.py                     # Generated with the project; no changes needed
    items.py                        # Generated with the project; defines the fields to crawl
    pipelines.py                    # Generated with the project; no changes needed when saving to a file
    settings.py                     # Generated with the project; edited so the crawled fields are output in order
    middlewares.py                  # Generated with the project; no changes needed
    spiders/
        __init__.py                 # Generated with the project; no changes needed
        itemcsvexporter.py          # Written by hand; the code is always the same
        <crawler main program>.py   # Written by hand; the main program of the crawler

Copyright notice
This article was written by [HLee]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2021/06/20210602165102681x.html