Go crawler framework colly in practice (IV): crawling Zhihu answers (I)
2022-06-25 00:17:00 【You're like an ironclad treasure】
Original link: Hzy Blog
1. Preface
I haven't written for several days. Over the last two days I noticed that every time I write a crawler I have to copy and paste the cookie by hand, which is tedious. colly has a SetCookies method that I didn't know how to use before; now I do.
siteCookie := c.Cookies(url) // url is the page address as a string
c.SetCookies(url, siteCookie)
With this you can set the cookies sent when a URL is visited. They are usually the cookies from the previous request, and you can then decide whether to modify them depending on the situation.
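As far as I understand, colly's Cookies and SetCookies work per URL in the same way as Go's standard cookie jar, so the idea above can be tried without any network access. The following is only a minimal sketch using the standard library, not colly itself; the helper names seedJar and cookiesFor and the cookie name and value are made up for illustration.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/cookiejar"
	"net/url"
)

// seedJar creates a jar and stores one cookie for rawURL.
// (Hypothetical helper, only for this sketch.)
func seedJar(rawURL, name, value string) (http.CookieJar, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return nil, err
	}
	jar, err := cookiejar.New(nil)
	if err != nil {
		return nil, err
	}
	jar.SetCookies(u, []*http.Cookie{{Name: name, Value: value}})
	return jar, nil
}

// cookiesFor returns the cookies the jar would send to rawURL.
func cookiesFor(jar http.CookieJar, rawURL string) ([]*http.Cookie, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return nil, err
	}
	return jar.Cookies(u), nil
}

func main() {
	// "session" / "abc123" are placeholder values, not real Zhihu cookies.
	jar, err := seedJar("https://www.zhihu.com/", "session", "abc123")
	if err != nil {
		fmt.Println(err)
		return
	}
	cookies, err := cookiesFor(jar, "https://www.zhihu.com/api/v4/questions/319017029/answers")
	if err != nil {
		fmt.Println(err)
		return
	}
	for _, c := range cookies {
		fmt.Printf("%s=%s\n", c.Name, c.Value)
	}
}
```

A cookie stored for the site root is returned for any path on the same host, which is why a cookie from an earlier request can be reused for the API address.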
2. There is a question on Zhihu asking for recommendations of good anime series. I thought of crawling the answers and then counting which series are recommended. (I have watched almost all the good ones, so I'm running a bit short...)
Question: Are there any good anime series (Japanese TV animation, web animation, OVA/OAD serial works) to recommend?
Today, let's crawl all the answers under that question. Tomorrow I will clean the data and compute the statistics!
Because it's already twelve o'clock... and I don't want to go bald.
3. As before, we use the colly framework: send a simple request, write the result to a file, and you're done! Straight to the code.
Some considerations and the overall process:
- The limit parameter of each request seems to be capped at 20.
- So my idea is to make one request first and read totals from the response, which tells us how many answers there are.
- Then fetch the answers 20 at a time and write each batch to the file.
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"strings"

	"github.com/PuerkitoBio/goquery"
	"github.com/gocolly/colly"
	"github.com/gocolly/colly/extensions"
)

func main() {
	// Create the output file, truncating the result of any previous run.
	file, err := os.OpenFile("./answer.txt", os.O_RDWR|os.O_CREATE|os.O_TRUNC, 0766)
	if err != nil {
		fmt.Println(err)
		return
	}
	defer file.Close()

	total := 20 // placeholder until the first response reports the real total
	i := 0      // offset of the current page, i.e. number of answers requested so far

	c := colly.NewCollector(func(collector *colly.Collector) {
		extensions.RandomUserAgent(collector)
	})
	c.OnRequest(func(request *colly.Request) {
		fmt.Printf("fetch --->%s\n", request.URL.String())
	})
	c.OnResponse(func(response *colly.Response) {
		var f map[string]interface{}
		if err := json.Unmarshal(response.Body, &f); err != nil { // deserialize
			fmt.Println(err)
			return
		}
		// Read the total number of answers under the question.
		if paging, ok := f["paging"].(map[string]interface{}); ok {
			if totals, ok := paging["totals"].(float64); ok {
				total = int(totals)
			}
		}
		// Write out every answer returned for the current URL.
		data, _ := f["data"].([]interface{})
		for k, v := range data {
			answer, _ := v.(map[string]interface{})
			content, _ := answer["content"].(string)
			doc, err := goquery.NewDocumentFromReader(strings.NewReader(content))
			if err != nil {
				continue
			}
			fmt.Fprintf(file, "%d:%s\n", i+k, doc.Find("p").Text())
		}
	})

	questionID := "319017029"
	// Visit is synchronous by default, so total is already updated when the loop condition is checked again.
	for ; i < total; i += 20 {
		url := fmt.Sprintf("https://www.zhihu.com/api/v4/questions/%s/answers?include=data[*].is_normal,admin_closed_comment,reward_info,is_collapsed,annotation_action,annotation_detail,collapse_reason,is_sticky,collapsed_by,suggest_edit,comment_count,can_comment,content,editable_content,voteup_count,reshipment_settings,comment_permission,created_time,updated_time,review_info,relevant_info,question,excerpt,relationship.is_authorized,is_author,voting,is_thanked,is_nothelp,is_labeled,is_recognized,paid_info,paid_info_content;data[*].mark_infos[*].url;data[*].author.follower_count,badge[*].topics&offset=%d&limit=%d&sort_by=updated", questionID, i, 20)
		if err := c.Visit(url); err != nil {
			fmt.Println(err)
		}
	}
}
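The chained type assertions on interface{} in the response handler can also be replaced by decoding into structs. Below is a rough sketch of that alternative using only the standard library. The field names paging, totals, data and content are the ones the code above reads; the type answerPage, the function parseAnswerPage and the sample JSON are invented for illustration, and the real response contains many more fields.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// answerPage models only the fields this crawler needs.
// (Hypothetical type; the real response has many more fields.)
type answerPage struct {
	Paging struct {
		Totals int `json:"totals"`
	} `json:"paging"`
	Data []struct {
		Content string `json:"content"`
	} `json:"data"`
}

// parseAnswerPage extracts the answer total and the HTML content of each answer.
func parseAnswerPage(body []byte) (total int, contents []string, err error) {
	var page answerPage
	if err = json.Unmarshal(body, &page); err != nil {
		return 0, nil, err
	}
	for _, a := range page.Data {
		contents = append(contents, a.Content)
	}
	return page.Paging.Totals, contents, nil
}

func main() {
	// Made-up sample body with the same shape as the fields used above.
	sample := []byte(`{"paging":{"totals":2},"data":[{"content":"<p>first</p>"},{"content":"<p>second</p>"}]}`)
	total, contents, err := parseAnswerPage(sample)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(total, len(contents))
}
```

With this approach a missing or malformed field leaves a zero value or returns an error instead of panicking, so the handler can simply skip a bad page.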
4. Tomorrow I will take the captured data and do some visual analysis or statistics. Go should have a library for this as well; I'll look around tomorrow!