Natural Language Processing: Typo Detection in Python (kenlm, pycorrector)
2020-11-06 01:21:00 【Elementary school students in IT field】
Please credit the source when reprinting: https://blog.csdn.net/HHTNAN
On n-gram models, see: https://blog.csdn.net/HHTNAN/article/details/62046652
On the kenlm statistical language model, see: https://blog.csdn.net/HHTNAN/article/details/84231733
Categories of Chinese text errors
In Chinese text error correction, common error types include:
- Homophones, e.g. 配副眼睛 → 配副眼镜 ("get a pair of eyes" → "get a pair of glasses"; 眼睛 and 眼镜 sound alike)
- Confusable sounds, e.g. 流浪织女 → 牛郎织女 (the Cowherd and the Weaver Girl; 流浪 liulang vs. 牛郎 niulang)
- Reversed word order, e.g. 伍迪艾伦 vs. 艾伦伍迪 (the two halves of "Woody Allen" swapped)
- Missing words, e.g. 爱有天意 → 假如爱有天意 (the missing 假如 "if" is restored)
- Visually similar characters, e.g. 高梁 → 高粱 (sorghum; 梁 "beam" vs. 粱 "grain")
- Full pinyin spelling, e.g. xingfu → 幸福 ("happiness")
- Pinyin abbreviations, e.g. sz → 深圳 (Shenzhen)
- Grammatical errors, e.g. 想象难以 → 难以想象 ("unimaginable")
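Homophone and similar-sound errors like the first two above are the typical target of a statistical language model such as kenlm: generate candidate sentences by swapping in characters from a confusion set, then keep the candidate the language model scores highest. The sketch below illustrates that idea with a tiny add-one-smoothed character bigram model in plain Python standing in for kenlm. The corpus and confusion set (`CORPUS`, `CONFUSION`) are hypothetical toy data, and this is not pycorrector's actual implementation. With kenlm, `log_prob` would instead call the `score` method of a `kenlm.Model` trained on a large corpus, passing the characters separated by spaces.

```python
import math
from collections import Counter

# Hypothetical toy corpus; a real system would train kenlm on millions of sentences.
CORPUS = [
    "我想配副眼镜",
    "他戴着眼镜看书",
    "她的眼睛很漂亮",
    "配副眼镜要验光",
]

# Hypothetical confusion set: characters that sound alike (眼睛 "eyes" vs 眼镜 "glasses").
CONFUSION = {"睛": ["镜"], "镜": ["睛"]}

BOS, EOS = "<s>", "</s>"


def train_bigram(corpus):
    """Count character histories and bigrams, including sentence boundaries."""
    histories, bigrams = Counter(), Counter()
    for sent in corpus:
        tokens = [BOS] + list(sent) + [EOS]
        histories.update(tokens[:-1])
        bigrams.update(zip(tokens, tokens[1:]))
    vocab_size = len({ch for sent in corpus for ch in sent}) + 1  # +1 for EOS
    return histories, bigrams, vocab_size


def log_prob(sent, model):
    """Natural-log probability of a sentence under the add-one-smoothed bigram model."""
    histories, bigrams, v = model
    tokens = [BOS] + list(sent) + [EOS]
    return sum(
        math.log((bigrams[(a, b)] + 1) / (histories[a] + v))
        for a, b in zip(tokens, tokens[1:])
    )


def correct(sent, model, confusion=CONFUSION):
    """Greedily try each confusable substitution and keep it if the LM score improves."""
    best = sent
    for i, ch in enumerate(sent):
        for cand in confusion.get(ch, []):
            trial = best[:i] + cand + best[i + 1:]
            if log_prob(trial, model) > log_prob(best, model):
                best = trial
    return best


MODEL = train_bigram(CORPUS)
print(correct("我要配副眼睛", MODEL))  # → 我要配副眼镜
```

Here 眼镜 wins because, in the toy corpus, 眼 is followed by 镜 more often than by 睛, and 镜 also appears sentence-finally. The greedy left-to-right search is the simplest possible strategy; a practical corrector would also need a score threshold so that valid text is not over-corrected.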
Of course, not every business scenario involves all of these problems. An input method, for example, needs to handle the first four types; a search engine needs to handle all of them; and error correction after speech recognition only needs to handle the first two. Visually similar characters mainly occur with Wubi or stroke-based input methods and with handwriting input.
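For the pinyin cases (xingfu → 幸福, sz → 深圳), which a search engine has to handle, a minimal approach is to compare the input against each lexicon word's toneless pinyin, both spelled out in full and reduced to the first letter of each syllable. The sketch below assumes a small hypothetical `LEXICON` with hand-written pinyin. In practice the pinyin could be generated with a library such as pypinyin (`lazy_pinyin`), and candidates would be ranked by word frequency or a language model rather than returned in dictionary order.

```python
# Hypothetical mini-lexicon: word -> toneless pinyin syllables (hand-written here).
LEXICON = {
    "幸福": ["xing", "fu"],
    "深圳": ["shen", "zhen"],
    "上海": ["shang", "hai"],
    "苏州": ["su", "zhou"],
}


def expand_pinyin(query, lexicon=LEXICON):
    """Return lexicon words whose full pinyin or pinyin abbreviation matches the query.

    The abbreviation uses the first letter of each syllable (so "zh" counts as "z").
    Full-spelling matches are listed before abbreviation matches.
    """
    query = query.lower()
    full, abbrev = [], []
    for word, syllables in lexicon.items():
        if "".join(syllables) == query:
            full.append(word)
        elif "".join(s[0] for s in syllables) == query:
            abbrev.append(word)
    return full + abbrev


print(expand_pinyin("xingfu"))  # → ['幸福']
print(expand_pinyin("sz"))      # → ['深圳', '苏州']
```

The second call shows why ranking matters: an abbreviation such as sz is often ambiguous (Shenzhen or Suzhou), so the expansion step alone cannot decide which word the user meant.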
The types of Chinese typos can be briefly summarized as follows:
- Wrongly written characters (别字), such as 感帽, 随然, and 呕土 (for 感冒 "a cold", 虽然 "although", and 呕吐 "vomit")
- Errors in personal or place names, e.g. 哈蜜 (correct: 哈密, Hami)
- Pinyin errors, e.g. 咳数 (ke shu) typed for 咳嗽 (ke sou, "cough")
- Knowledge errors, e.g. 广州黄浦 (correct: 黄埔; Huangpu in Guangzhou is written 黄埔)
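Name, place-name, and knowledge errors such as 哈蜜 for 哈密 (Hami) or 广州黄浦 for 广州黄埔 can be hard for a language model to catch, because the wrong form may itself be a common string (黄浦 is correct in Shanghai). A common remedy is a curated confusion dictionary that maps known wrong forms to correct ones, with enough context (here, 广州) to avoid false corrections. The following is a minimal sketch of that idea using longest-match replacement; `CONFUSION_DICT` and `apply_confusion_dict` are hypothetical names, not the API of pycorrector or any other library.

```python
# Hypothetical curated confusion dictionary: known wrong form -> correct form.
# The Guangzhou entry keeps the city name as context, because 黄浦 alone is
# correct in Shanghai.
CONFUSION_DICT = {
    "哈蜜": "哈密",
    "广州黄浦": "广州黄埔",
    "感帽": "感冒",
    "呕土": "呕吐",
}


def apply_confusion_dict(text, confusion=CONFUSION_DICT):
    """Replace known wrong forms, preferring the longest match at each position.

    Returns the corrected text and a list of (wrong, right, start_index) tuples.
    """
    keys = sorted(confusion, key=len, reverse=True)  # try longer entries first
    out, details, i = [], [], 0
    while i < len(text):
        for wrong in keys:
            if text.startswith(wrong, i):
                out.append(confusion[wrong])
                details.append((wrong, confusion[wrong], i))
                i += len(wrong)
                break
        else:  # no entry matched at position i: copy one character through
            out.append(text[i])
            i += 1
    return "".join(out), details


print(apply_confusion_dict("我在广州黄浦区"))
# → ('我在广州黄埔区', [('广州黄浦', '广州黄埔', 2)])
```

By contrast, `apply_confusion_dict("上海黄浦区")` leaves the text unchanged, since no dictionary entry matches. The trade-off is coverage: a dictionary only fixes errors someone has already listed, which is why it is usually combined with a language-model-based detector.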