Crawler crawls Sina Weibo data
2022-06-25 03:54:00 【Blockchain research】
Tool: a cloud-based collection crawler
Goal: capture all the microblog posts of a single blogger
Analyzing the page structure:
The idea is to simulate a browser that visits each page automatically and crawls it.
Looking at the page structure, each Weibo list page needs three or four scroll-down loads before it is complete. Once a page-turning button appears at the bottom, we treat the page as fully loaded.
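The loading rule described above can be sketched as a small loop. This is only an illustration, not the cloud crawler's actual code: the `page` object and its `scroll_to_bottom()` and `has_pager()` methods are hypothetical stand-ins for whatever browser automation is used, and `FakePage` exists only so the sketch runs without a browser.

```python
import time


def load_full_list(page, max_scrolls=10, pause=0.0):
    """Scroll down until the page-turning button appears.

    Returns True if the button showed up within max_scrolls scrolls.
    `page` is a hypothetical wrapper around a real browser page.
    """
    for _ in range(max_scrolls):
        if page.has_pager():
            return True
        page.scroll_to_bottom()
        time.sleep(pause)  # give lazy-loaded posts time to arrive
    return page.has_pager()


class FakePage:
    """Stand-in for a browser page: the pager appears after N scrolls."""

    def __init__(self, scrolls_needed=3):
        self.scrolls = 0
        self.scrolls_needed = scrolls_needed

    def scroll_to_bottom(self):
        self.scrolls += 1

    def has_pager(self):
        return self.scrolls >= self.scrolls_needed


if __name__ == "__main__":
    page = FakePage(scrolls_needed=3)
    print(load_full_list(page), page.scrolls)
```

Capping the number of scrolls keeps the crawler from looping forever on a page where the button never appears.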

Login problem
Crawling requires being logged in, so how do we log in?
A normal login does not ask for a verification code (CAPTCHA); you are only asked for one after a failed attempt. So logging in poses no technical difficulty.
We can create a 【Login module】: log in once with a browser, and from then on every page is fetched with that browser's shared cookies.
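A minimal sketch of the shared-cookie idea, using only the Python standard library. It assumes the cookies have already been exported from the logged-in browser into a dict. The cookie names and the URL below are placeholders, not Weibo's real ones.

```python
from urllib.request import Request


def cookie_header(cookies):
    """Join name/value pairs into a single Cookie header value."""
    return "; ".join(f"{name}={value}" for name, value in cookies.items())


def build_request(url, cookies):
    """Build a request that carries the logged-in browser's cookies."""
    return Request(url, headers={"Cookie": cookie_header(cookies)})


if __name__ == "__main__":
    # Placeholder values: in practice these come from the login module.
    shared = {"session_id": "abc123", "user": "42"}
    req = build_request("https://example.com/u/42?page=1", shared)
    print(req.get_header("Cookie"))
```

Every list page is then requested through `build_request` with the same `shared` dict, so the login step happens only once.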

Flow chart design:

We don't need the detail page of each microblog, so the crawler has no detail-page step at all; all data is extracted from the list pages.
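Extracting posts straight from a list page could look roughly like the sketch below, which uses the standard library's `html.parser`. The class name `post-text` is an assumption made for illustration; the real Weibo markup differs and has to be inspected in the browser first.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class PostListParser(HTMLParser):
    """Collect the text of every element carrying a given CSS class."""

    def __init__(self, post_class="post-text"):  # hypothetical class name
        super().__init__()
        self.post_class = post_class
        self.posts = []
        self._depth = 0  # greater than 0 while inside a post element
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self._depth:
            self._depth += 1
        elif self.post_class in (dict(attrs).get("class") or "").split():
            self._depth = 1
            self._buf = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._depth:
            return
        self._depth -= 1
        if self._depth == 0:
            self.posts.append("".join(self._buf).strip())

    def handle_data(self, data):
        if self._depth:
            self._buf.append(data)


def extract_posts(html):
    parser = PostListParser()
    parser.feed(html)
    return parser.posts


if __name__ == "__main__":
    sample = ('<div class="post-text">Hello <b>world</b></div>'
              '<div class="meta">ignored</div>'
              '<div class="post-text">Second post</div>')
    print(extract_posts(sample))
```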
Crawling results:
The crawl took 5 minutes in total and fetched 10 pages, 400 microblogs in all, because I do not post very often.
The data are as follows:

Make a simple word cloud:
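A word cloud is drawn from word frequencies, so the counting step can be sketched as follows. This is an assumption about the approach, not the tool actually used here. The regex split only works for space-separated text; Chinese posts would first need a word segmenter, which is not shown.

```python
import re
from collections import Counter


def word_frequencies(posts, stopwords=frozenset(), top_n=50):
    """Count words across all posts and return the top_n most common."""
    counts = Counter()
    for text in posts:
        for word in re.findall(r"\w+", text.lower()):
            # Skip stopwords and single-character tokens.
            if word not in stopwords and len(word) > 1:
                counts[word] += 1
    return counts.most_common(top_n)


if __name__ == "__main__":
    posts = ["Crawler crawls data", "data data crawler"]
    print(word_frequencies(posts, top_n=2))
```

The resulting (word, count) pairs can be passed to any word-cloud renderer, with the count determining the font size.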
