Go crawler framework colly in practice (II): crawling the Douban Top 250
2022-06-25 00:17:00 【You're like an ironclad treasure】
Original link: Hzy Blog
1. Today, let's try using colly to crawl the Douban Top 250! (Everyone likes to practice on it...)
Straight to the code; the comments explain what each part does.
package main

import (
	"fmt"
	"regexp"
	"strings"
	"sync"
	"time"

	"github.com/gocolly/colly"
	"github.com/gocolly/colly/extensions"
)

func main() {
	start := time.Now()

	// In async mode the callbacks may run concurrently, so guard the counter.
	var mu sync.Mutex
	number := 1

	// Create the collector.
	c := colly.NewCollector(
		func(c *colly.Collector) {
			extensions.RandomUserAgent(c) // send a random User-Agent header
			c.Async = true                // fetch pages concurrently
		},
		// URL filter: only visit URLs of the form
		// https://movie.douban.com/top250?start=N&filter=
		colly.URLFilters(
			regexp.MustCompile(`^https://movie\.douban\.com/top250\?start=[0-9]+&filter=`),
		),
	)

	// For HTML responses, follow every link found on the page.
	// Visit returns an error for URLs rejected by the filter or already
	// visited; that error is deliberately ignored here.
	c.OnHTML("a[href]", func(e *colly.HTMLElement) {
		link := e.Attr("href")
		// fmt.Printf("find link: %s\n", e.Request.AbsoluteURL(link))
		c.Visit(e.Request.AbsoluteURL(link))
	})

	// Extract the movie information; each div.info block describes one movie.
	c.OnHTML("div.info", func(e *colly.HTMLElement) {
		title := e.DOM.Find("span.title").First().Text()
		// Collapse the whitespace in the director/cast paragraph.
		director := strings.Join(strings.Fields(e.DOM.Find("div.bd p").First().Text()), " ")
		quote := e.DOM.Find("p.quote span.inq").Text()

		mu.Lock()
		defer mu.Unlock()
		// number counts movies in arrival order, which in async mode
		// need not match the ranking on the site.
		fmt.Printf("%d --> %s:%s %s\n", number, title, director, quote)
		number++
	})

	c.OnError(func(_ *colly.Response, err error) {
		fmt.Println(err)
	})

	if err := c.Visit("https://movie.douban.com/top250?start=0&filter="); err != nil {
		fmt.Println(err)
	}
	c.Wait() // block until all asynchronous requests have finished
	fmt.Printf("Time elapsed: %s\n", time.Since(start))
}
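
Two small pieces of the crawler can be tried without touching the network: the URL filter and the whitespace cleanup applied to the director line. The sketch below is only an illustration under a few assumptions. The helper names isListPage and squeezeSpaces are hypothetical (they are not part of colly), the pattern is a stricter variant of the one passed to colly.URLFilters, and the sample text is made up to resemble the page's layout.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// pageFilter accepts only paginated Top 250 list pages,
// e.g. https://movie.douban.com/top250?start=25&filter=
// (Assumption: a stricter, end-anchored variant of the crawler's pattern.)
var pageFilter = regexp.MustCompile(`^https://movie\.douban\.com/top250\?start=[0-9]+&filter=$`)

// isListPage reports whether url is one of the list pages.
// Hypothetical helper for illustration; colly applies its filters internally.
func isListPage(url string) bool {
	return pageFilter.MatchString(url)
}

// squeezeSpaces collapses every run of whitespace (spaces, tabs, newlines)
// into a single space and trims both ends.
func squeezeSpaces(s string) string {
	return strings.Join(strings.Fields(s), " ")
}

func main() {
	urls := []string{
		"https://movie.douban.com/top250?start=0&filter=",
		"https://movie.douban.com/top250?start=25&filter=",
		"https://movie.douban.com/subject/1292052/",
		"https://www.douban.com/",
	}
	for _, u := range urls {
		fmt.Printf("%-52s visit=%v\n", u, isListPage(u))
	}

	// Made-up sample resembling the multi-line text inside "div.bd p".
	raw := "\n    Director: Some Director   Starring: Some Actor\n    1994 / USA / Drama\n"
	fmt.Printf("%q\n", squeezeSpaces(raw))
}
```

Only the two list-page URLs pass the filter, which is why the link-following callback can safely call Visit on every href it finds: everything else is rejected before a request is made.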

GitHub address: GitHub address
I find this framework very convenient to use. Tomorrow I will try crawling some websites that require a login!