2021 ShanghaiRanking (软科) university ranking crawler
2022-06-26 08:56:00 【ML_ python_ get√】
```python
# -*- coding: utf-8 -*-
# data.py
# @author ML_get
# @created 2021-04-26T16:00:54.397Z+08:00
# @modified 2021-04-26T22:12:42.172Z+08:00
# ShanghaiRanking (软科) university ranking crawler
import csv

import requests


class FindRank:
    def __init__(self, num):
        # Number of top-ranked universities to print
        self.num = num
        # Browser User-Agent header sent with every request
        self.headers = {
            'User-Agent':
            'Mozilla/5.0 (iPhone; CPU iPhone OS 13_2_3 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.0.3 Mobile/15E148 Safari/604.1 Edg/90.0.4430.85'
        }

    def parse(self, url):
        # Request the URL and return the decoded JSON as a dict, or None on failure
        try:
            response = requests.get(url, headers=self.headers, timeout=20)
            response.raise_for_status()
            return response.json()
        except (requests.RequestException, ValueError) as e:
            # Network/HTTP errors or a body that is not valid JSON
            print('Request failed:', e)
            return None

    def get_data(self, ulist):
        # Print rank, school name and score for the first self.num entries
        print("{:^10}\t{:^20}\t{:^10}".format('Rank', 'School', 'Score'))
        for u in ulist[:self.num]:  # slicing avoids IndexError if fewer entries are returned
            print("{:^10}\t{:^20}\t{:^10}".format(u['rankOverall'], u['univNameCn'], u['score']))

    def store_data(self, ulist):
        # Write every entry to rank.csv; the utf-8-sig BOM helps Excel show the Chinese names
        if not ulist:
            print('Nothing to write')
            return
        with open('rank.csv', 'w', newline='', encoding='utf-8-sig') as f:
            w = csv.DictWriter(f, fieldnames=list(ulist[0].keys()), extrasaction='ignore')
            w.writeheader()
            w.writerows(ulist)
        print('Saved to rank.csv')

    def run(self):
        # 1. Fetch the ranking data from the API
        url = 'https://www.shanghairanking.cn/api/pub/v1/bcur?bcur_type=11&year=2021'
        dict1 = self.parse(url)
        if not dict1:
            return
        # 2. Extract the rankings list and print it
        ulist = dict1['data']['rankings']
        self.get_data(ulist)
        # 3. Save it to a local CSV file
        self.store_data(ulist)


if __name__ == '__main__':
    rank = FindRank(300)
    rank.run()
```
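One practical snag with the printed table: `str.format` pads by character count, but most terminals draw Chinese characters two columns wide, so the `univNameCn` column tends to drift out of line. The sketch below is a hypothetical, network-free helper module, not part of the original script. It pads by display width using `unicodedata.east_asian_width` (treating only "W"/"F" characters as two columns, which is an assumption; "ambiguous" characters may render differently in some terminals), and it splits extraction and CSV writing into pure functions so they can be checked offline. All function names (`display_width`, `center`, `top_n`, `format_rows`, `to_csv`) and the sample rows are made up for illustration; the `data` → `rankings` layout and the `rankOverall`/`univNameCn`/`score` fields are taken from the script above.

```python
import csv
import io
import unicodedata


def display_width(text):
    """Approximate terminal width: Wide/Fullwidth (e.g. CJK) chars count as 2 columns."""
    return sum(2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1 for ch in text)


def center(text, width):
    """Center text by display width; extra padding goes on the right, like str.format's ^."""
    pad = max(width - display_width(text), 0)
    left = pad // 2
    return " " * left + text + " " * (pad - left)


def top_n(payload, n):
    """Return at most n entries from payload['data']['rankings'] (layout assumed from run())."""
    rankings = ((payload or {}).get("data") or {}).get("rankings") or []
    return rankings[:n]


def format_rows(rows):
    """Build the table lines with every column aligned by display width."""
    lines = [center("Rank", 10) + center("School", 24) + center("Score", 10)]
    for u in rows:
        lines.append(center(str(u["rankOverall"]), 10)
                     + center(str(u["univNameCn"]), 24)
                     + center(str(u["score"]), 10))
    return lines


def to_csv(rows, fields=("rankOverall", "univNameCn", "score")):
    """Serialize only the chosen fields to CSV text; extra keys in each row are ignored."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(fields), extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    # Made-up sample payload shaped like the API response; not real ranking data
    sample = {"data": {"rankings": [
        {"rankOverall": 1, "univNameCn": "示例大学A", "score": 88.5, "extra": "x"},
        {"rankOverall": 2, "univNameCn": "示例大学B", "score": 77.0},
    ]}}
    rows = top_n(sample, 300)
    print("\n".join(format_rows(rows)))
    print(to_csv(rows))
```

In `FindRank`, `get_data` could print `format_rows(top_n(dict1, self.num))` instead of the `{:^20}` format string. Writing only the three chosen fields also keeps nested values out of the CSV, at the cost of dropping the other columns the API returns.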