TextCNN paper interpretation: Convolutional Neural Networks for Sentence Classification
2022-06-26 01:37:00 【Green Lantern swordsman】
1. Abstract
A CNN trained on top of static pre-trained word vectors performs well on sentence classification, and fine-tuning the vectors for the specific task (task-specific vectors) improves performance further.
2. Model structure

Note that the model input has 2 channels of word vectors. In the first channel, the word vectors are kept static during training; in the second, they are fine-tuned by backpropagation.
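The two-channel idea can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes one filter of width h is applied to both channels, the two responses are added, a ReLU is applied, and the feature map is reduced by max-over-time pooling. The function name `conv_max_pool` and the toy vectors are hypothetical.

```python
def conv_max_pool(channels, filt, bias=0.0):
    """Slide one filter over the sentence and max-pool the responses.

    channels: list of sentence matrices (one per channel); each matrix is a
              list of n word vectors of dimension k.
    filt:     list of h weight rows of dimension k (filter width h),
              assumed here to be shared by all channels.
    """
    h = len(filt)
    n = len(channels[0])
    if n < h:
        raise ValueError("sentence is shorter than the filter width")
    features = []
    for i in range(n - h + 1):              # every window of h words
        total = bias
        for sentence in channels:            # add the response of each channel
            for j in range(h):
                total += sum(w * x for w, x in zip(filt[j], sentence[i + j]))
        features.append(max(0.0, total))     # ReLU non-linearity (assumed)
    return max(features)                     # max-over-time pooling


# Toy input: 3 words, 2-dimensional vectors, two channels.
static_channel = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
tuned_channel = [[0.5, 0.0], [0.0, 0.5], [0.5, 0.5]]
width2_filter = [[1.0, 0.0], [0.0, 1.0]]
feature = conv_max_pool([static_channel, tuned_channel], width2_filter)
```

In a full model, many such filters with several widths each produce one pooled feature, and the concatenated features feed the penultimate layer.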
2.1 Regularization
(1) Dropout is applied to the penultimate layer.
(2) An L2-norm constraint is placed on the weight vectors of the penultimate layer.
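The two regularizers can be sketched roughly as below, assuming the L2 constraint is enforced by rescaling a weight vector back to norm s whenever its norm exceeds s after an update, and that dropout zeroes each unit independently with a fixed probability. The helper names and all values are illustrative, not taken from the post.

```python
import math
import random


def clip_l2_norm(w, s):
    """Rescale w so that its L2 norm is at most s (assumed constraint form)."""
    norm = math.sqrt(sum(x * x for x in w))
    if norm > s:
        return [x * s / norm for x in w]
    return list(w)


def dropout(z, drop_prob, rng):
    """Zero each unit of z independently with probability drop_prob."""
    return [0.0 if rng.random() < drop_prob else x for x in z]


rng = random.Random(0)                      # fixed seed for reproducibility
weights = clip_l2_norm([3.0, 4.0], s=1.0)   # norm 5 is rescaled to norm 1
hidden = dropout([0.2, 0.7, 0.1, 0.9], drop_prob=0.5, rng=rng)
```

The dropout mask is applied only during training; how activations or weights are rescaled at test time is left out of this sketch.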
3. Data and experiments
3.1 Training
(1) Hyperparameters are chosen by grid search.
(2) Early stopping is applied on the validation set.
When a dataset has no standard validation set, 10% of the training set is randomly selected as the validation set. Training uses stochastic gradient descent (SGD) with the Adadelta update rule.
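A minimal sketch of the validation setup described above: hold out 10% of the training data at random, then pick the epoch with the best validation score and stop once it no longer improves. The `patience` parameter and both function names are assumptions made for illustration; the post does not state the exact stopping criterion.

```python
import random


def split_train_dev(examples, dev_fraction=0.1, seed=0):
    """Randomly hold out dev_fraction of the examples as a validation set."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_dev = max(1, round(len(shuffled) * dev_fraction))
    return shuffled[n_dev:], shuffled[:n_dev]


def best_epoch_with_early_stopping(dev_scores, patience=3):
    """Return the index of the best epoch seen before stopping.

    Training is considered stopped once `patience` consecutive epochs
    pass without a new best validation score (hypothetical criterion).
    """
    best_epoch, best_score = -1, float("-inf")
    for epoch, score in enumerate(dev_scores):
        if score > best_score:
            best_epoch, best_score = epoch, score
        elif epoch - best_epoch >= patience:
            break
    return best_epoch


train, dev = split_train_dev(range(100))
chosen = best_epoch_with_early_stopping([0.70, 0.75, 0.74, 0.73, 0.72, 0.90])
```

In the usage lines, the late score of 0.90 is never reached because training has already stopped after three epochs without improvement.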
3.2 Pre-trained word vectors
When a large training set is not available, initializing word vectors with the publicly available word2vec vectors is a popular way to improve performance. Words that are not in word2vec are initialized randomly.
3.3 Model variants
CNN-rand: the word vectors of all words are initialized randomly and then fine-tuned during training.
CNN-static: the word vectors come from word2vec and are kept static during training.
CNN-non-static: the word vectors come from word2vec and are fine-tuned during training.
CNN-multichannel: two sets of word vectors, both initialized from word2vec. One set is kept static and the other is fine-tuned during training.
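The four variants differ only in how each embedding channel is initialized and whether it is updated. One way to write that down, as a hypothetical configuration table (the `Channel` class and `VARIANTS` dictionary are illustrative names, not from the paper):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Channel:
    init: str        # "random" or "word2vec"
    trainable: bool  # True if the vectors are fine-tuned by backpropagation


VARIANTS = {
    "CNN-rand": [Channel("random", True)],
    "CNN-static": [Channel("word2vec", False)],
    "CNN-non-static": [Channel("word2vec", True)],
    "CNN-multichannel": [Channel("word2vec", False), Channel("word2vec", True)],
}


def trainable_channels(variant):
    """Count the embedding channels that receive gradient updates."""
    return sum(1 for ch in VARIANTS[variant] if ch.trainable)
```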
4. Results and analysis
CNN-rand does not perform well. CNN-static performs very well, but CNN-non-static performs better.
4.1 Multichannel vs. single channel
The authors expected the multichannel model to prevent overfitting, but the results are mixed and more research is needed. For example, instead of adding a channel, one could keep a single channel and increase the dimension of the word vectors, allowing only the added dimensions to be modified during training.
4.2 Static vs. non-static representations
Word vectors in the non-static (fine-tuned) channel become more specific to the task at hand.
4.3 Further observations
- Another research group (Kalchbrenner et al., 2014) also experimented with a CNN and reported much worse results. The authors note that (1) its architecture is essentially the same as their single-channel model, and (2) the difference is that their own model has much larger capacity, namely multiple filter widths and feature maps.
- Dropout is a strong enough regularizer that a larger-than-necessary network can be used and left to dropout to regularize; this combination contributes substantially to performance.
- Initializing words that are not in word2vec by sampling each dimension from the uniform distribution U[-a, a] also gives a slight improvement.
- Adadelta and Adagrad give similar results, but Adadelta needs fewer epochs.
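The U[-a, a] initialization for out-of-vocabulary words can be sketched as below. It assumes, following the paper's description, that a is chosen so the random vectors have the same variance as the pre-trained ones. Since the variance of U[-a, a] is a²/3, that gives a = sqrt(3 · variance). The function names and the toy vectors are hypothetical.

```python
import math
import random


def uniform_bound_matching_variance(pretrained_vectors):
    """Return a such that U[-a, a] has the variance of the given vectors."""
    values = [x for vec in pretrained_vectors for x in vec]
    mean = sum(values) / len(values)
    variance = sum((x - mean) ** 2 for x in values) / len(values)
    return math.sqrt(3.0 * variance)   # Var(U[-a, a]) = a**2 / 3


def init_oov_vector(dim, a, rng):
    """Sample every dimension of a new word vector from U[-a, a]."""
    return [rng.uniform(-a, a) for _ in range(dim)]


toy_pretrained = [[1.0, -1.0], [1.0, -1.0]]        # stand-in for word2vec
a = uniform_bound_matching_variance(toy_pretrained)
oov_vector = init_oov_vector(dim=2, a=a, rng=random.Random(0))
```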
5. Conclusion
Word vectors obtained by unsupervised pre-training (word2vec) are highly effective.