Go crawler framework colly in practice (4) -- crawling Zhihu answers (2) -- visualizing a word cloud
2022-06-25 00:16:00 【You're like an ironclad treasure】
Original link: Hzy Blog
Today I wanted to do some simple processing on the data and then visualize it. So I decided to make rough statistics on the anime titles that appear in the answers and output a word cloud based on word frequency!
Let's look at the result first.

The code is on my GitHub, along with some other small projects I wrote while learning Go.
Continuing from yesterday: I crawled the answers from Zhihu and saved them to a file.
- The first step is to read the file line by line (each line is one answer).
- Once a sentence is read, it needs some simple segmentation, for example extracting only the anime titles inside book-title marks. (PS: other languages have libraries for this kind of analysis, such as jieba in Python, but I did not find a similar library for Go, so I wrote a simple splitter myself.)
- Once the anime titles are extracted and counted, they need to be visualized. On GitHub I found go-echarts.
2. go-echarts: a brief introduction
Install:
go get -u github.com/go-echarts/go-echarts
Documentation: https://go-echarts.github.io/go-echarts/
go-echarts is built on ECharts, the charting library open-sourced by Baidu, and provides a concise API.
3. Everything is ready, so it's time to write the code
3.1 First open the file, read it line by line, split each line to find the anime titles, and count them.
/*
Word count
*/
package wordCount

import (
	"bufio"
	"fmt"
	"io"
	"os"
	"sort"
	"strings"
)

// Pair and PairList implement sort.Interface, because a map cannot be sorted by value directly.
type Pair struct {
	Key   string
	Value int
}
type PairList []Pair

func (p PairList) Swap(i, j int)      { p[i], p[j] = p[j], p[i] }
func (p PairList) Len() int           { return len(p) }
func (p PairList) Less(i, j int) bool { return p[j].Value < p[i].Value } // descending order

// WordCount maps a word to its count; the values are always stored as int.
type WordCount map[string]interface{}
// SplitByMoreStr reports whether r is one of the delimiters used to split a sentence.
func SplitByMoreStr(r rune) bool {
	splitSymbol := []rune("《》<>")
	for _, v := range splitSymbol {
		if r == v {
			return true
		}
	}
	return false
}
// SplitAndStatistics splits one line on the delimiters and updates the counts.
// FieldsFunc returns every segment, so text outside the title marks is counted too.
func (wc WordCount) SplitAndStatistics(s string) {
	dist1 := strings.FieldsFunc(s, SplitByMoreStr)
	for _, v := range dist1 {
		v = strings.TrimSpace(strings.Replace(v, " ", "", -1))
		if v == "" {
			continue // an empty key would match every later segment
		}
		found := false
		for key := range wc {
			if strings.Contains(v, key) { // the segment contains a key already in the map: increment it
				wc[key] = wc[key].(int) + 1
				found = true
			}
		}
		if !found {
			wc[v] = 1 // first occurrence of this segment
		}
	}
}
// ReadFile reads the file line by line and counts each line.
func (wc WordCount) ReadFile(f *os.File) {
	rd := bufio.NewReader(f)
	for {
		line, err := rd.ReadString('\n') // read up to and including the next '\n'
		if line != "" {
			wc.SplitAndStatistics(line) // also handles a last line without a trailing '\n'
		}
		if err != nil {
			if err != io.EOF {
				fmt.Println("read error:", err)
			}
			break
		}
	}
}
// AnalysisResut sorts the counts and prints them. It is not used by the word cloud.
func (wc WordCount) AnalysisResut() {
	// Copy the map into a PairList, which implements sort.Interface, so it can be sorted by value.
	pl := make(PairList, len(wc))
	i := 0
	for k, v := range wc {
		pl[i] = Pair{k, v.(int)}
		i++
	}
	sort.Sort(pl)
	for _, pair := range pl {
		fmt.Println(pair.Value, pair.Key)
	}
}
3.2 After splitting and counting, the last step is to output the word cloud.
With the library above installed, this part is straightforward.
// handler is the route that renders the word cloud.
func handler(w http.ResponseWriter, _ *http.Request) {
	nwc := charts.NewWordCloud()
	nwc.SetGlobalOptions(charts.TitleOpts{Title: "Zhihu question:"})
	wc := make(wordCount.WordCount)
	f, err := os.Open(wordCount.Path + "answer.txt")
	if err != nil {
		// Report the failure to the client instead of panicking inside the handler.
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	defer f.Close()
	wc.ReadFile(f)
	nwc.Add("wordcloud", wc, charts.WordCloudOpts{SizeRange: []float32{14, 250}})
	nwc.Render(w)
}
// Exists reports whether os.Stat can find a file at path.
func Exists(path string) bool {
	_, err := os.Stat(path) // os.Stat returns the file information
	return err == nil
}
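func main() {
	// Crawl the answers first if they have not been saved yet.
	if !Exists(wordCount.Path + "answer.txt") {
		wordCount.QuestionAnswer()
	}
	http.HandleFunc("/", handler)
	if err := http.ListenAndServe(":8081", nil); err != nil {
		panic(err)
	}
}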
Summary: this was fun. Next time I'd like to try better, more accurate statistical methods. That is really a natural language processing problem, ha ha. I have studied it a little, but I haven't really played with it yet…