Go crawler framework colly in practice (4) -- crawling Zhihu answers (2) -- word cloud visualization

2022-06-25 00:16:00 You're like an ironclad treasure

Original link: Hzy Blog

Today I want to do some simple processing on the data and then visualize it. So I decided to compile rough statistics on the anime titles that appear in the answers, and then output a word cloud based on word frequency!

Let's look at the result first:

[Figure: Zhihu word cloud of good anime series]

The code is on my GitHub, along with some small projects I wrote while learning Go.

Continuing from yesterday: I crawled the answers on Zhihu and saved them to a file.

  • First, read the file line by line (each line is one answer).
  • After reading a line, do some simple segmentation, for example extracting only the anime titles inside book title marks 《》. (PS: other languages have libraries for this kind of analysis, such as jieba in Python, but I could not find a similar library for Go, so I just wrote a simple one myself.)
  • After extracting and counting the anime titles, visualize them. I found go-echarts on GitHub for this.
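The first two steps can be sketched with the standard library alone. This is a minimal sketch rather than the author's code: it assumes titles always sit inside 《》, and the names ExtractTitles and CountTitles, as well as the sample input, are made up for illustration. A regular expression keeps only the text between a matching pair of title marks, so surrounding prose is not counted.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"regexp"
	"strings"
)

// titleRe matches the text between one pair of book title marks, e.g. 《Title》.
var titleRe = regexp.MustCompile(`《([^《》]+)》`)

// ExtractTitles returns every non-empty title found inside 《》 in one line.
// (Hypothetical helper for this sketch.)
func ExtractTitles(line string) []string {
	var titles []string
	for _, m := range titleRe.FindAllStringSubmatch(line, -1) {
		if t := strings.TrimSpace(m[1]); t != "" {
			titles = append(titles, t)
		}
	}
	return titles
}

// CountTitles reads r line by line (one answer per line) and counts the titles.
func CountTitles(r io.Reader) (map[string]int, error) {
	counts := make(map[string]int)
	sc := bufio.NewScanner(r)
	// Answers can be long, so raise the per-line limit to 1 MiB (assumed to be enough).
	sc.Buffer(make([]byte, 0, 64*1024), 1024*1024)
	for sc.Scan() {
		for _, t := range ExtractTitles(sc.Text()) {
			counts[t]++
		}
	}
	return counts, sc.Err()
}

func main() {
	// Hypothetical sample input standing in for answer.txt.
	sample := "我推荐《A》和《B》\n《A》 is great, also <C>\n"
	counts, err := CountTitles(strings.NewReader(sample))
	if err != nil {
		panic(err)
	}
	fmt.Println(counts["A"], counts["B"], counts["C"]) // prints: 2 1 0
}
```

Text in plain angle brackets such as <C> is ignored here on purpose. Whether that is desirable depends on how the answers are written.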

2. A brief introduction to go-echarts

Install

go get -u github.com/go-echarts/go-echarts

Documentation: https://go-echarts.github.io/go-echarts/

go-echarts is built on echarts, the chart library open-sourced by Baidu, and provides a concise API.


3. Everything is ready, time to write the code

3.1 First open the file, read it line by line, split each line to find the anime titles, and count them.

/*
 Word count
*/
package wordCount // package name taken from the wordCount.* references in main

import (
	"bufio"
	"fmt"
	"io"
	"os"
	"sort"
	"strings"
)

// Pair and PairList implement sort.Interface, because a map cannot be sorted by value directly.
type Pair struct {
	Key   string
	Value int
}
type PairList []Pair
func (p PairList) Swap(i, j int)      { p[i], p[j] = p[j], p[i] }
func (p PairList) Len() int           { return len(p) }
func (p PairList) Less(i, j int) bool { return p[j].Value < p[i].Value } // descending order
// WordCount maps a word to its count; each value holds an int.
type WordCount map[string]interface{}

// SplitByMoreStr reports whether r is one of the separator characters used to split a sentence.
func SplitByMoreStr(r rune) bool {
	return strings.ContainsRune("《》<>", r)
}
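// SplitAndStatistics splits one line on the separators above and updates the counts.
// Note: every fragment is counted, including text outside the title marks.
func (wc WordCount) SplitAndStatistics(s string) {
	fields := strings.FieldsFunc(s, SplitByMoreStr)
	for _, v := range fields {
		// Strip spaces and surrounding whitespace, and skip empty fragments:
		// an empty key would match every later fragment.
		v = strings.TrimSpace(strings.Replace(v, " ", "", -1))
		if v == "" {
			continue
		}
		matched := false
		for key := range wc {
			if strings.Contains(v, key) { // the fragment contains a key already in the map: add 1 to it
				wc[key] = wc[key].(int) + 1
				matched = true
			}
		}
		if !matched {
			wc[v] = 1 // no existing key matched, so v is a new entry
		}
	}
}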
// ReadFile reads the file line by line and counts each line.
func (wc WordCount) ReadFile(f *os.File) {
	rd := bufio.NewReader(f)
	for {
		line, err := rd.ReadString('\n') // read up to and including '\n'
		if line != "" {
			wc.SplitAndStatistics(line) // split and count; also covers a last line without '\n'
		}
		if err != nil {
			if err != io.EOF {
				fmt.Println("read error:", err)
			}
			break
		}
	}
}
// AnalysisResut sorts the counts in descending order and prints them. It is not used in this program.
func (wc WordCount) AnalysisResut() {
	// Convert the map into a PairList, which implements sort.Interface.
	pl := make(PairList, 0, len(wc))
	for k, v := range wc {
		pl = append(pl, Pair{k, v.(int)})
	}
	sort.Sort(pl)
	for _, pair := range pl {
		fmt.Println(pair.Value, pair.Key)
	}
}
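Since Go 1.8, sort.Slice can order a slice without a hand-written sort.Interface. The following is a minimal sketch of that alternative, assuming the counts are kept in a plain map[string]int. Entry and TopN are hypothetical names for illustration and are not part of the code above.

```go
package main

import (
	"fmt"
	"sort"
)

// Entry is one word with its count (hypothetical type for this sketch).
type Entry struct {
	Word  string
	Count int
}

// TopN returns up to n entries ordered by count, highest first.
// Ties are broken alphabetically so the result is deterministic.
func TopN(counts map[string]int, n int) []Entry {
	entries := make([]Entry, 0, len(counts))
	for w, c := range counts {
		entries = append(entries, Entry{w, c})
	}
	sort.Slice(entries, func(i, j int) bool {
		if entries[i].Count != entries[j].Count {
			return entries[i].Count > entries[j].Count
		}
		return entries[i].Word < entries[j].Word
	})
	if n >= 0 && n < len(entries) {
		entries = entries[:n]
	}
	return entries
}

func main() {
	counts := map[string]int{"A": 3, "B": 5, "C": 3} // hypothetical sample counts
	for _, e := range TopN(counts, 2) {
		fmt.Println(e.Count, e.Word)
	}
}
```

With the sample counts this prints "5 B" and then "3 A". The tie-break matters because Go randomizes map iteration order, so equal counts would otherwise come out in a different order on each run.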

3.2 After splitting and counting, all that is left is to output the word cloud.

With the library above installed, this part is straightforward.

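package main

import (
	"log"
	"net/http"
	"os"

	"github.com/go-echarts/go-echarts/charts"
	// plus the author's wordCount package (its import path is not shown here)
)

// handler is the route that builds the word cloud and writes it to the response.
func handler(w http.ResponseWriter, _ *http.Request) {
	f, err := os.Open(wordCount.Path + "answer.txt")
	if err != nil { // report the failure to the client instead of panicking in the handler
		log.Println(err)
		http.Error(w, "cannot open answer.txt", http.StatusInternalServerError)
		return
	}
	defer f.Close()
	wc := make(wordCount.WordCount)
	wc.ReadFile(f)
	nwc := charts.NewWordCloud()
	nwc.SetGlobalOptions(charts.TitleOpts{Title: "Zhihu question:"})
	nwc.Add("wordcloud", wc, charts.WordCloudOpts{SizeRange: []float32{14, 250}})
	nwc.Render(w)
}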
// Exists reports whether the file at path exists.
func Exists(path string) bool {
	_, err := os.Stat(path) // os.Stat returns the file's metadata
	// Any error (including "not found") is treated as the file being absent.
	return err == nil
}

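func main() {
	// Generate answer.txt first if it does not exist yet.
	if !Exists(wordCount.Path + "answer.txt") {
		wordCount.QuestionAnswer()
	}
	http.HandleFunc("/", handler)
	// ListenAndServe only returns on failure, so log the error and exit.
	log.Fatal(http.ListenAndServe(":8081", nil))
}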
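The existence check can also tell a missing file apart from other failures such as a permission error. The following is a minimal sketch of that idea, assuming Go 1.16 or later for the io/fs package. FileExists is a hypothetical name and is not part of the code above.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
)

// FileExists reports whether path exists. A non-nil error means the check
// itself failed (for example, permission denied), so existence is unknown.
func FileExists(path string) (bool, error) {
	_, err := os.Stat(path)
	if err == nil {
		return true, nil
	}
	if errors.Is(err, fs.ErrNotExist) {
		return false, nil
	}
	return false, err
}

func main() {
	// "." always exists; the second name is a hypothetical missing file.
	for _, p := range []string{".", "no-such-file.txt"} {
		ok, err := FileExists(p)
		fmt.Println(p, ok, err)
	}
}
```

Returning the error separately lets the caller decide whether to crawl again or to stop, instead of silently treating every failure as "file not found".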

Summary: this was pretty fun. Next time I would like to try better, more accurate statistical methods. That is really a natural language processing problem, ha ha. I have heard of it, but I haven't played with it yet…

Copyright notice
This article was written by [You're like an ironclad treasure]. Please include the original link when reposting. Thanks.
https://yzsam.com/2022/02/202202210551199554.html

随机推荐