Go crawler framework colly in practice (I)
2022-06-25 00:17:00 【You're like an ironclad treasure】
Original link :Hzy Blog
1. A bit of grumbling
For the past few days I have wanted to write crawlers in Go. I used to write them in Python, but I had written so much Python that it was getting painful, and since I had just picked Go up again, I wanted to see how pleasant writing crawlers in Go could be.
Earlier, following other people's ideas, I wrote a simple concurrent crawler framework on GitHub and learned a little about Go concurrency along the way. Then I stumbled across colly, compared it with what I had written, and, well, sigh…
2. A brief introduction to using colly
github: https://github.com/gocolly/colly
Official website : http://go-colly.org/
2.1 Introduction to colly
colly is a crawler framework. With it we can quickly implement a concurrent crawler that is easy to understand and easy to extend.
The core of colly is the Collector: the Collector gathers the data from the pages it visits and stores it. (It is process oriented.)
2.1 Callbacks colly runs while fetching a page
- Before the collector sends a request: OnRequest()
- When a fetch fails: OnError()
- After the collector receives a response: OnResponse()
- When the collector receives HTML: OnHTML()
- When the collector receives XML: OnXML()
- The last callback, run after the collector finishes fetching: OnScraped()
With these callbacks we can write a crawler quickly. The official website also has many examples for reference; if those are not enough, read the source code.
2.2 Configuring the Collector in colly
- The official website documents the configuration in detail; only a few points are mentioned here.
- Restrict which domains are crawled, limit the maximum depth, and choose whether already-visited URLs may be crawled again, which avoids infinite loops.
- Enable asynchronous mode, set the number of concurrent requests, set a random delay, and so on.
- Choose whether HTTP keep-alive connections are used, limit the number of connections, and so on.
- Distributed crawling is also supported.
- Through extensions, we can also set a random User-Agent and the Referer header.
2.3 Storage in colly
- By default, storage is in memory.
- The official website recommends storing in Redis.
- Storage in SQLite3 or MongoDB is also possible; there are relevant examples on the official website.
- colly-sqlite3 storage
- colly-mongo storage
3. Wrapping up
- If you want to know more, take a look at this article on colly's design: "Go crawler framework colly: source code and software architecture analysis".
- "Colly source code analysis: examining the underlying implementation with examples" walks through the main functions in colly's source code.
Tomorrow I will write about crawling LeetCode problems with this framework.