Sogou News Dataset
2022-08-03 12:28:00 【51CTO】
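This dataset is built from the 2,909,551 news articles in the SogouCA and SogouCS news corpora, restricted to 5 categories. Each category contains 90,000 training samples and 12,000 test samples, for 450,000 training samples and 60,000 test samples in total. All Chinese characters have been converted to Pinyin.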
This article offers an empirical exploration on the use of character-level convolutional networks (ConvNets) for text classification. We constructed several large-scale datasets to show that character-level convolutional networks could achieve state-of-the-art or competitive results. Comparisons are offered against traditional models such as bag of words, n-grams and their TFIDF variants, and deep learning models such as word-based ConvNets and recurrent neural networks.
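You can download the dataset from the official site. I have also shared a copy on Baidu Netdisk: follow my WeChat official account and reply "2020082502" to get the download link.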
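Once the files are downloaded, a quick sanity check is to count the samples per class and compare the result with the 90,000 / 12,000 figures given above. The following is only a minimal sketch. It assumes the data is a CSV file whose first column is the class index (1 to 5), followed by title and content columns. That layout and the file name `train.csv` are assumptions, not something stated in this post, and the sample rows are made up for illustration.

```python
import csv
import io
from collections import Counter


def count_per_class(lines):
    """Count samples per class index, taken from the first CSV column."""
    counts = Counter()
    for row in csv.reader(lines):
        if row:  # skip blank lines
            counts[int(row[0])] += 1
    return counts


# Tiny in-memory stand-in for the real file (hypothetical rows).
sample = io.StringIO(
    '"1","title a","content a"\n'
    '"2","title b","content b"\n'
    '"1","title c","content c"\n'
)
print(count_per_class(sample))
```

For the real file, something like `with open("train.csv", newline="", encoding="utf-8") as f: counts = count_per_class(f)` should work under the same assumptions. If the training file matches the description, each of the 5 classes should report 90,000 samples. If a long article exceeds the `csv` module's default field size limit, `csv.field_size_limit` can be used to raise it.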
Whenever I have time, I try to write articles to share and exchange ideas with everyone.