Natural language processing: typo detection in Python with kenlm and pycorrector
2020-11-06 01:21:00 【Elementary school students in IT field】
Please credit the source when reposting: https://blog.csdn.net/HHTNAN
For background on n-gram models, see: https://blog.csdn.net/HHTNAN/article/details/62046652
For the kenlm statistical language model, see: https://blog.csdn.net/HHTNAN/article/details/84231733
Categories of Chinese text errors
Common error types in Chinese text error correction include:
- Homophone errors, e.g. 配副眼睛 -> 配副眼镜 ("a pair of eyes" written for "a pair of glasses")
- Confusable-sound errors, e.g. 流浪织女 -> 牛郎织女 ("wandering Weaver Girl" written for "the Cowherd and the Weaver Girl")
- Reversed word order, e.g. 伍迪艾伦 -> 艾伦伍迪 (Woody Allen / Allen Woody)
- Missing words to be completed, e.g. 爱有天意 -> 假如爱有天意
- Similar-shape character errors, e.g. 高梁 -> 高粱 (sorghum)
- Full pinyin spelling, e.g. xingfu -> 幸福 (happiness)
- Pinyin abbreviation, e.g. sz -> 深圳 (Shenzhen)
- Grammar errors, e.g. 想象难以 -> 难以想象 ("imagine hard to" written for "hard to imagine")
Not every business scenario has to handle all of these. An input method needs to handle the first four, a search engine needs to handle every type, and error correction of speech-recognition output only needs to handle the first two. Similar-shape errors mainly arise with Wubi, stroke-based, or handwriting input.
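The scenario-to-error-type mapping described above can be written down as a small lookup table. This is only an illustrative sketch: the names `ErrorType`, `SCENARIO_ERROR_TYPES`, and `error_types_for` are hypothetical and do not come from kenlm or pycorrector.

```python
from enum import Enum


class ErrorType(Enum):
    """The eight error types, in the order listed above (names are hypothetical)."""
    HOMOPHONE = 1          # homophone errors
    CONFUSABLE_SOUND = 2   # confusable-sound errors
    WORD_ORDER = 3         # reversed word order
    MISSING_WORD = 4       # missing words to be completed
    SIMILAR_SHAPE = 5      # similar-shape character errors
    PINYIN_FULL = 6        # full pinyin spelling
    PINYIN_ABBREV = 7      # pinyin abbreviation
    GRAMMAR = 8            # grammar errors


# Iterating an Enum yields its members in definition order.
ALL_TYPES = list(ErrorType)

# Which error types each scenario needs to handle, as stated in the text.
SCENARIO_ERROR_TYPES = {
    "input_method": ALL_TYPES[:4],        # the first four
    "search_engine": ALL_TYPES,           # every type
    "speech_recognition": ALL_TYPES[:2],  # the first two
}


def error_types_for(scenario):
    """Return the list of error types a given scenario should handle."""
    return SCENARIO_ERROR_TYPES[scenario]


if __name__ == "__main__":
    for name in SCENARIO_ERROR_TYPES:
        print(name, [t.name for t in error_types_for(name)])
```

A table like this makes it easy to switch individual detectors on or off per product instead of running every check everywhere.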
This article briefly summarizes the types of wrong-character errors in Chinese:
- Wrong characters: 感帽 (for 感冒), 随然 (for 虽然), 传然 (for 传染), 呕土 (for 呕吐)
- Wrong personal or place names: 哈蜜 (correct: 哈密, Hami)
- Pinyin errors: 咳数 (ke shu) -> 咳嗽 (ke sou)
- Knowledge errors: 广州黄浦 (the last character should be 埔, i.e. 广州黄埔, Huangpu in Guangzhou)
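Errors like the ones above can often be caught with a hand-maintained confusion dictionary that maps a known wrong form to its correction. The sketch below is a minimal standard-library illustration using the Chinese forms of a few of those examples. It is not the method used by kenlm or pycorrector, and `CONFUSION` and `correct` are hypothetical names. Note that the knowledge error is keyed on the longer string 广州黄浦, because 黄浦 on its own is a valid place name in Shanghai.

```python
# Hypothetical confusion dictionary: wrong form -> correct form.
CONFUSION = {
    "感帽": "感冒",          # wrong character
    "哈蜜": "哈密",          # wrong place name
    "咳数": "咳嗽",          # pinyin error (ke shu for ke sou)
    "广州黄浦": "广州黄埔",  # knowledge error, matched together with its context
}


def correct(text, confusion=CONFUSION):
    """Return (corrected_text, details).

    details is a list of (wrong, right, start_index) tuples, where
    start_index is the position of the wrong form in the original text.
    """
    details = []
    for wrong, right in confusion.items():
        start = text.find(wrong)
        while start != -1:
            details.append((wrong, right, start))
            start = text.find(wrong, start + len(wrong))

    corrected = text
    for wrong, right in confusion.items():
        corrected = corrected.replace(wrong, right)

    return corrected, sorted(details, key=lambda d: d[2])


if __name__ == "__main__":
    print(correct("我感帽了，一直咳数"))
    print(correct("上海黄浦"))  # left unchanged: no Guangzhou context
```

A lookup table only covers errors someone has already recorded. Unseen errors need a statistical signal such as a language-model score, which is where a model like kenlm comes in.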