Natural language processing: typo detection in Python with kenlm and pycorrector
2020-11-06 01:21:00 【Elementary school students in IT field】
Please credit the source when reposting: https://blog.csdn.net/HHTNAN
For background on n-gram models, see: https://blog.csdn.net/HHTNAN/article/details/62046652
For the kenlm statistical language model, see: https://blog.csdn.net/HHTNAN/article/details/84231733
Categories of Chinese text error correction
In the Chinese text error correction task, common error types include:
- Homophone errors, e.g. 配副眼睛 -> 配副眼镜 ("fit a pair of eyes" -> "fit a pair of glasses")
- Confusable-sound words, e.g. 流浪织女 -> 牛郎织女 ("the Wandering Weaver" -> "the Cowherd and the Weaver Girl")
- Reversed word order, e.g. 伍迪艾伦 -> 艾伦伍迪 ("Woody Allen" -> "Allen Woody")
- Word completion, e.g. 爱有天意 -> 假如爱有天意 ("Love has Providence" -> "If love has Providence")
- Similar-shape character errors, e.g. 高梁 -> 高粱 (sorghum)
- Full pinyin spelling, e.g. xingfu -> 幸福 (happiness)
- Pinyin abbreviation, e.g. sz -> 深圳 (Shenzhen)
- Grammar errors, e.g. 想象难以 -> 难以想象 ("imagine hard to" -> "hard to imagine")
Of course, not every business scenario faces all of these problems. Input methods need to handle the first four types, search engines need to handle all of them, and text correction after speech recognition only needs to handle the first two. Similar-shape character errors occur mainly with Wubi or stroke-based input and with handwriting input.
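The scenario-to-error-type mapping described above can be written down as a small lookup table. This is only a sketch: the label strings, `SCENARIO_ERROR_TYPES`, and `needs_handling` are names made up for illustration and do not come from kenlm or pycorrector.

```python
# Error types in the order they are listed above.
# The label strings are illustrative names, not identifiers from any library.
ERROR_TYPES = [
    "homophone",
    "confusable_sound",
    "word_order",
    "word_completion",
    "similar_shape",
    "pinyin_full",
    "pinyin_abbreviation",
    "grammar",
]

# Which error types each scenario has to handle, following the paragraph above.
SCENARIO_ERROR_TYPES = {
    "input_method": ERROR_TYPES[:4],        # the first four types
    "search_engine": ERROR_TYPES[:],        # all types
    "speech_recognition": ERROR_TYPES[:2],  # the first two types
}


def needs_handling(scenario: str, error_type: str) -> bool:
    """Return True if the given scenario has to deal with this error type."""
    return error_type in SCENARIO_ERROR_TYPES[scenario]
```

A table like this makes it easy to enable only the correction steps a given product needs, for example skipping pinyin handling in a speech recognition pipeline.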
This article briefly summarizes the types of typos found in Chinese text:
- Wrong characters: 感帽 (for 感冒, "a cold"), 随然 (for 虽然, "although"), 传然 (for 传染, "to infect"), 呕土 (for 呕吐, "to vomit")
- Wrong personal or place names: 哈蜜 (correct: 哈密, Hami)
- Pinyin errors: 咳数 (ke shu) -> 咳嗽 (ke sou, "cough")
- Factual errors: 广州黄浦 (correct: 广州黄埔, Huangpu in Guangzhou)
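One way to catch wrong-character errors like these is to generate candidate sentences from a confusion set and rank them with an n-gram language model, which is the role a kenlm model plays in this kind of pipeline. The sketch below uses a toy character-bigram counter as a stand-in. It does not call kenlm or pycorrector, and the corpus, the confusion set, and the helpers `bigrams`, `score` and `correct` are all hypothetical.

```python
import math
from collections import Counter
from itertools import product

# Toy training corpus (hypothetical). A real system would train an n-gram
# model on a large corpus instead of counting bigrams in four sentences.
CORPUS = ["配副眼镜", "戴着眼镜", "眼睛很大", "难以想象"]

# Hypothetical confusion set: characters that are easily mistaken for each other.
CONFUSION = {"睛": ["镜"], "镜": ["睛"]}


def bigrams(text):
    """Split text into character bigrams, with start (^) and end ($) markers."""
    padded = "^" + text + "$"
    return [padded[i:i + 2] for i in range(len(padded) - 1)]


BIGRAM_COUNTS = Counter(bg for sentence in CORPUS for bg in bigrams(sentence))


def score(text):
    """Sum of log(count + 1) over bigrams: a crude stand-in for an LM score."""
    return sum(math.log(BIGRAM_COUNTS[bg] + 1) for bg in bigrams(text))


def correct(text):
    """Return the highest-scoring candidate; the original wins ties."""
    # Each character may stay as it is or be swapped for a confusable one.
    options = [[ch] + CONFUSION.get(ch, []) for ch in text]
    candidates = ["".join(chars) for chars in product(*options)]
    return max(candidates, key=score)


print(correct("配副眼睛"))
print(correct("眼睛很大"))
```

With this toy corpus the first call prefers 配副眼镜 over the input, while the second leaves 眼睛很大 unchanged. Enumerating every combination grows exponentially with sentence length, so a practical system would first locate suspicious positions and only generate candidates there.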