Segmenting requirement text with Jieba and exporting word frequencies to Excel
2022-07-25 22:21:00 【Buddhist monk】
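The input Excel file is expected to contain a 'Content of requirements' column (the text to segment) and a 'function' column (used to name the output files):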
import pandas as pd
import jieba

# Reading .xls files needs the xlrd package; writing .xlsx files needs openpyxl
df = pd.read_excel('xuqiufenxi.xls')
print(df)

# Create a new column to store the word segmentation results
df['fenci'] = ''

# The column names used below must match the headers in the spreadsheet exactly.
# Segment the text of each row, count its words, and write one Excel file per row.
for i in range(len(df)):
    print(i)
    # Use .loc instead of chained indexing (df['fenci'][i] = ...), which may not write back to df
    df.loc[i, 'fenci'] = ' '.join(jieba.cut(df.loc[i, 'Content of requirements']))
    print(df.loc[i, 'fenci'])

    # Count the number of times each word appears in this row
    word_count = {}
    for word in df.loc[i, 'fenci'].split():
        if word in word_count:
            word_count[word] += 1
        else:
            word_count[word] = 1

    # Convert the word_count dictionary into a DataFrame
    word_count_df = pd.DataFrame(list(word_count.items()), columns=['word', 'count'])
    # Sort by count in descending order
    word_count_df = word_count_df.sort_values(by='count', ascending=False)
    # Write the result to an Excel file named after this row's 'function' value
    word_count_df.to_excel(f"{df.loc[i, 'function']}.xlsx", index=False)
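The manual dictionary counting in the loop can also be written with the standard library's collections.Counter. The following is only a minimal sketch, not the author's code: the helper name count_words is hypothetical, and it takes the space-joined string stored in the fenci column as its input.

```python
from collections import Counter


def count_words(segmented):
    """Count words in a space-joined segmentation result.

    Returns a list of (word, count) pairs, highest count first.
    """
    counts = Counter(segmented.split())
    # most_common() sorts by descending count; ties keep first-seen order
    return counts.most_common()


if __name__ == "__main__":
    # Hand-written sample standing in for ' '.join(jieba.cut(text))
    sample = "用户 登录 功能 用户 注册 功能 用户"
    for word, count in count_words(sample):
        print(word, count)
```

The returned pairs can be passed to pd.DataFrame(..., columns=['word', 'count']) in place of word_count.items(); since they are already sorted, the sort_values step would then be optional.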
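Output: one .xlsx file per row, named after the value in that row's 'function' column, listing each word and its count in descending order of count.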
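Because the script names each output file after a cell value, a value containing characters such as / or : may cause to_excel to fail, depending on the operating system. One possible guard is sketched below. It assumes a hypothetical helper, safe_filename, that is not part of the original script; the replaced characters are the ones Windows rejects in file names, and the fallback name is an arbitrary choice.

```python
import re


def safe_filename(value, fallback="unnamed"):
    """Turn a cell value into a string usable as a file name stem."""
    name = str(value).strip()
    # Replace characters that Windows does not allow in file names
    name = re.sub(r'[\\/:*?"<>|]', "_", name)
    # Fall back to a fixed name if nothing is left
    return name or fallback


if __name__ == "__main__":
    print(safe_filename("登录/注册") + ".xlsx")
```

In the loop, the file name could then be built as f"{safe_filename(df.loc[i, 'function'])}.xlsx". Rows that share the same 'function' value would still overwrite each other's files; this sketch does not address that.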