Beginner crawler notes (collecting data)
2022-08-04 15:39:00 【Sweat always outweighs talent】
import urllib.error
import urllib.request


def main():
    # 1. Crawl the pages (each page will be parsed inside getData)
    baseurl = 'https://movie.douban.com/top250?start='
    datalist = getData(baseurl)
    # 2. Save the data (not implemented yet)
    print()


# Crawl the pages
def getData(baseurl):
    # Fetch one page first, then loop to fetch every page
    datalist = []
    for i in range(0, 10):
        url = baseurl + str(i * 25)  # start = 0, 25, ..., 225
        html = askURL(url)  # parsing into datalist is still to be added
    return datalist


# Request a single web page and return its HTML
def askURL(url):
    header = {"User-Agent": "Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.5060.134 Mobile Safari/537.36 Edg/103.0.1264.77"}
    request = urllib.request.Request(url, headers=header)
    html = ""
    try:
        response = urllib.request.urlopen(request)
        html = response.read().decode()
        print(html)
    except urllib.error.URLError as e:
        # HTTPError carries a status code; a plain URLError only has a reason
        if hasattr(e, "code"):
            print(e.code)
        if hasattr(e, "reason"):
            print(e.reason)
    return html


if __name__ == '__main__':
    main()

This code only covers the data-collection step. It is not finished yet, and I will keep updating it. (The tutorial I followed is from Bilibili; if this post infringes on anything, please send me a private message and I will delete it.)
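As a quick way to check the paging logic in getData without sending any requests, the loop can be pulled out into a small helper. This is only a sketch: `build_page_urls`, `PAGE_SIZE`, and `PAGE_COUNT` are names made up here for illustration, and the values 25 and 10 are simply the ones used in the loop above.

```python
BASE_URL = "https://movie.douban.com/top250?start="
PAGE_SIZE = 25   # assumption: same step as i*25 in getData
PAGE_COUNT = 10  # assumption: same page count as range(0, 10) in getData


def build_page_urls(baseurl, pages=PAGE_COUNT, page_size=PAGE_SIZE):
    """Hypothetical helper: list every page URL without making a request."""
    return [baseurl + str(i * page_size) for i in range(pages)]


if __name__ == "__main__":
    # Print the URLs that the crawl loop would visit, in order
    for url in build_page_urls(BASE_URL):
        print(url)
```

Printing the list first makes it easy to confirm the start offsets before askURL is called on each one.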