Jieba word segmentation: how the tokenizer works
2022-06-28 09:24:00 【Java architects must see】
Install the jieba library: pip3 install jieba
# jieba word segmentation
# -*- coding:utf-8 -*-
import sys
import os
import jieba

# Sample text. jieba is designed for Chinese text, so use a Chinese sentence here to see meaningful segmentation.
sent = ' Tianshan intelligence is a business intelligence enterprise BI、 Data analysis 、 Technical community in the field of data mining and big data technology www.hellobi.com . Content from the initial business intelligence BI The field has also been extended to data analysis 、 Data mining is related to big data In the field of technology , Include R、Python、SPSS、Hadoop、Spark、Hive、Kylin etc. , Become a vertical community focused on the data field . Tianshan intelligence is committed to building an ecosystem based on the data field , Link everything through the community Data related resources : For example, the data itself 、 people 、 Data solution providers and enterprises , Work together with everyone to promote big data 、 business intelligence BI Popularization and development in China .'
print(sent)

The jieba module has three segmentation modes:
1. Full mode: scans out every dictionary word that can be formed from the sentence. It is very fast but cannot resolve ambiguity. Because every dictionary match is emitted, the output contains overlapping, repeated fragments, which is usually not what we want.
2. Precise mode: tries to cut the sentence as accurately as possible and is suited to text analysis (similar to LTP word segmentation). This mode is closest to what we want.
3. Search engine mode: builds on precise mode by splitting long words further, which increases recall and suits search engine indexing. Its output is also good, and finer-grained.
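To make the difference between the three modes concrete, here is a toy sketch in pure Python. It is not jieba's actual code: the four-word dictionary `FREQ` and the function names are hypothetical, and real jieba additionally handles unknown words with an HMM, which is omitted here. The sketch only illustrates the general idea of matching against a frequency dictionary and, for precise mode, choosing the most probable split.

```python
import math

# Hypothetical toy dictionary (word -> frequency); jieba's real dictionary is far larger.
FREQ = {"大": 10, "数据": 50, "大数据": 30, "技术": 60}
TOTAL = sum(FREQ.values())


def build_dag(sentence):
    """For each start index, list the end indices of dictionary words starting there."""
    dag = {}
    for i in range(len(sentence)):
        ends = [j for j in range(i, len(sentence)) if sentence[i:j + 1] in FREQ]
        dag[i] = ends or [i]  # a character with no dictionary match stands alone
    return dag


def full_mode(sentence):
    """Emit every dictionary word found anywhere in the sentence (overlaps included)."""
    dag = build_dag(sentence)
    return [sentence[i:j + 1]
            for i in range(len(sentence))
            for j in dag[i]
            if sentence[i:j + 1] in FREQ]


def precise_mode(sentence):
    """Pick the single split with the highest total log-probability."""
    dag = build_dag(sentence)
    n = len(sentence)
    # best[i] = (score of the best split of sentence[i:], end index of its first word)
    best = {n: (0.0, n)}
    for i in range(n - 1, -1, -1):
        best[i] = max(
            (math.log(FREQ.get(sentence[i:j + 1], 1) / TOTAL) + best[j + 1][0], j)
            for j in dag[i]
        )
    words, i = [], 0
    while i < n:
        j = best[i][1]
        words.append(sentence[i:j + 1])
        i = j + 1
    return words


def search_mode(sentence):
    """Precise mode, plus shorter dictionary words found inside each long word."""
    result = []
    for word in precise_mode(sentence):
        for size in (2, 3):
            if len(word) > size:
                for k in range(len(word) - size + 1):
                    piece = word[k:k + size]
                    if piece in FREQ:
                        result.append(piece)
        result.append(word)
    return result


text = "大数据技术"
print("|".join(full_mode(text)))     # 大|大数据|数据|技术
print("|".join(precise_mode(text)))  # 大数据|技术
print("|".join(search_mode(text)))   # 数据|大数据|技术
```

With this toy dictionary, full mode returns overlapping matches, precise mode returns one non-overlapping split, and search engine mode adds the shorter word hidden inside the long one.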
# Full mode
wordlist = jieba.cut(sent, cut_all=True)
print('|'.join(wordlist))

# Precise mode (the default)
wordlist = jieba.cut(sent)
print('|'.join(wordlist))

# Search engine mode
wordlist = jieba.cut_for_search(sent)
print('|'.join(wordlist))

A new problem, and the fix with a user-defined dictionary: looking back at the precise-mode output, some new or domain-specific terms, for example "Tianshan intelligence" and "big data", are cut apart when they should stay whole. To handle this, we can load a custom dictionary on top of the default one. In the jieba module directory there is a dictionary file named dict. Opening it shows that each entry has three fields: 1. the word; 2. a number (the word frequency; the higher it is, the more readily the word is matched); 3. the part of speech. For convenience, we create our own dictionary file named userdict.txt.
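As a minimal sketch of the dictionary format described above, the snippet below writes a small userdict.txt and loads it. The entries, their frequencies and part-of-speech tags, and the helper name `format_entries` are illustrative assumptions, not taken from the original article. In jieba's user dictionary each line holds the word, then an optional frequency, then an optional part-of-speech tag, separated by spaces, and the file should be UTF-8 encoded.

```python
import os
import tempfile

try:
    import jieba  # third-party: pip3 install jieba
except ImportError:
    jieba = None  # the format demo below still runs without jieba


def format_entries(entries):
    """Turn (word, freq, tag) tuples into 'word [freq] [tag]' lines."""
    lines = []
    for word, freq, tag in entries:
        parts = [word]
        if freq is not None:
            parts.append(str(freq))
        if tag is not None:
            parts.append(tag)
        lines.append(" ".join(parts))
    return lines


# Hypothetical entries: frequency and tag values are examples only.
entries = [("天善智能", 10, "nt"), ("大数据", 5, "n"), ("商业智能", None, None)]
lines = format_entries(entries)

# Write the dictionary to a temporary directory as UTF-8, one entry per line.
path = os.path.join(tempfile.mkdtemp(), "userdict.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("\n".join(lines) + "\n")

if jieba is not None:
    jieba.load_userdict(path)  # merge the custom entries into the default dictionary
    print("|".join(jieba.cut("天善智能是大数据技术社区")))
```

For one-off additions, `jieba.add_word(word, freq=None, tag=None)` adds a single entry at runtime without a dictionary file.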
# Load the user-defined dictionary
jieba.load_userdict('D:\\Anaconda3\\Lib\\site-packages\\jieba\\userdict.txt')
wordlist = jieba.cut(sent)
print('|'.join(wordlist))

References:
https://zhuanlan.zhihu.com/p/29747350?utm_source=qq&utm_medium=social&utm_oi=780081763178258432