
TextCNN Paper Interpretation -- Convolutional Neural Networks for Sentence Classification

2022-06-26 01:37:00 Green Lantern swordsman

1. Abstract

A CNN built on top of static pre-trained word vectors performs very well on sentence classification tasks, and task-specific vectors that are fine-tuned for the concrete task perform even better.

2. Model Structure

(Figure: TextCNN model architecture with two channels for an example sentence.)
It is worth noting that the model is experimented with two channels. In the first channel, the word vectors are kept static during training; in the second, the word vectors are fine-tuned via backpropagation.
2.1 Regularization
(1) Dropout is applied to the penultimate layer.
(2) An L2-norm constraint on the weight vectors is applied at the penultimate layer.
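To make the structure concrete, below is a minimal single-channel sketch of this kind of architecture in PyTorch (not the authors' code): convolutions with several kernel widths, max-over-time pooling, dropout on the penultimate layer, and an L2 max-norm rescaling of the output weights. The specific values (kernel widths 3/4/5, 100 feature maps, dropout 0.5, norm cap s=3) are illustrative defaults.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TextCNN(nn.Module):
    def __init__(self, vocab_size, embed_dim=300, num_classes=2,
                 kernel_widths=(3, 4, 5), num_feature_maps=100, dropout=0.5):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        # One 1-D convolution per kernel width; each produces `num_feature_maps` features.
        self.convs = nn.ModuleList(
            nn.Conv1d(embed_dim, num_feature_maps, w) for w in kernel_widths
        )
        self.dropout = nn.Dropout(dropout)       # dropout on the penultimate layer
        self.fc = nn.Linear(num_feature_maps * len(kernel_widths), num_classes)

    def forward(self, token_ids):                # token_ids: (batch, seq_len)
        x = self.embedding(token_ids).transpose(1, 2)   # (batch, embed_dim, seq_len)
        # Convolution + ReLU + max-over-time pooling for each kernel width.
        pooled = [F.relu(conv(x)).max(dim=2).values for conv in self.convs]
        features = torch.cat(pooled, dim=1)      # penultimate layer
        return self.fc(self.dropout(features))

def apply_max_norm(fc: nn.Linear, s: float = 3.0):
    """Rescale rows of the output weight matrix whose L2 norm exceeds s."""
    with torch.no_grad():
        norms = fc.weight.norm(dim=1, keepdim=True)
        fc.weight.mul_(torch.clamp(s / (norms + 1e-8), max=1.0))
```

Calling `apply_max_norm` after each gradient step is one common reading of the L2 weight constraint mentioned above: weight rows are only rescaled when their norm exceeds the cap.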

3. Data and Experiments

3.1 Training Setup
(1) Hyperparameters are chosen by grid search.
(2) Early stopping is done on the validation set.
When a dataset has no standard validation set, 10% of the training data is randomly held out as the validation set. The optimizer is SGD.
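A hedged sketch of this procedure follows; `train_one_epoch` and `evaluate` are placeholder callables (not from the paper), and the epoch/patience numbers are illustrative.

```python
import copy
import random

def split_dev(train_examples, dev_fraction=0.1, seed=0):
    """Hold out a random fraction of the training data as a validation set."""
    examples = list(train_examples)
    random.Random(seed).shuffle(examples)
    cut = int(len(examples) * dev_fraction)
    return examples[cut:], examples[:cut]        # (train split, dev split)

def fit_with_early_stopping(model, train_set, dev_set,
                            train_one_epoch, evaluate,
                            max_epochs=25, patience=3):
    """Train until dev accuracy stops improving for `patience` epochs."""
    best_acc, best_state, stale = 0.0, None, 0
    for _ in range(max_epochs):
        train_one_epoch(model, train_set)        # one pass of mini-batch SGD
        acc = evaluate(model, dev_set)           # accuracy on the held-out split
        if acc > best_acc:
            best_acc, best_state, stale = acc, copy.deepcopy(model.state_dict()), 0
        else:
            stale += 1
            if stale >= patience:                # early stopping
                break
    if best_state is not None:
        model.load_state_dict(best_state)
    return model, best_acc
```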
3.2 Pre-trained Word Vectors
When a large supervised training set is not available, initializing with publicly available word2vec vectors is a popular way to improve performance. Words not present in word2vec have their vectors initialized randomly.
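A sketch of this initialization, assuming a `vocab` dict mapping words to row indices and gensim for loading the binary word2vec file; the file path and the OOV range `a` are assumptions for illustration.

```python
import numpy as np
from gensim.models import KeyedVectors

def build_embedding_matrix(vocab, w2v_path="GoogleNews-vectors-negative300.bin",
                           embed_dim=300, a=0.25):
    w2v = KeyedVectors.load_word2vec_format(w2v_path, binary=True)
    matrix = np.empty((len(vocab), embed_dim), dtype=np.float32)
    for word, idx in vocab.items():
        if word in w2v:
            matrix[idx] = w2v[word]              # copy the pre-trained vector
        else:
            # Words absent from word2vec are sampled from U[-a, a]
            # (Section 4.3 notes this gives a slight improvement).
            matrix[idx] = np.random.uniform(-a, a, embed_dim)
    return matrix
```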
3.3 Model Variants
CNN-rand: all word vectors are initialized randomly and fine-tuned during training.
CNN-static: word vectors come from word2vec and are kept static during training.
CNN-non-static: word vectors come from word2vec and are fine-tuned during training.
CNN-multichannel: two sets of word2vec vectors; one set is kept static, the other is fine-tuned during training. (A configuration sketch follows below.)
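One way these variants could map onto the embedding setup in PyTorch (a sketch, not the authors' code); `pretrained` is a word2vec embedding matrix such as the one built in the Section 3.2 sketch above.

```python
import torch
import torch.nn as nn

def make_embedding(variant, vocab_size, pretrained=None, embed_dim=300):
    if variant == "CNN-rand":
        # Random initialization, trainable by default.
        return nn.Embedding(vocab_size, embed_dim)
    weights = torch.as_tensor(pretrained, dtype=torch.float32)
    if variant == "CNN-static":
        return nn.Embedding.from_pretrained(weights, freeze=True)
    if variant == "CNN-non-static":
        return nn.Embedding.from_pretrained(weights, freeze=False)
    if variant == "CNN-multichannel":
        # Two copies of the same word2vec vectors: one frozen, one fine-tuned.
        return nn.ModuleList([
            nn.Embedding.from_pretrained(weights, freeze=True),
            nn.Embedding.from_pretrained(weights, freeze=False),
        ])
    raise ValueError(f"unknown variant: {variant}")
```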

4. Results and Analysis

CNN-rand does not perform well on its own; CNN-static performs surprisingly well, and CNN-non-static performs even better.
4.1 Multichannel vs. Single-Channel Models
We had hoped that the multichannel architecture would prevent overfitting, but the results are mixed, and more research is needed. For example, instead of adding another channel, one could keep a single channel but add extra dimensions to the word vectors that are allowed to be modified during training.
4.2 Static vs. Non-static Representations
Fine-tuned (non-static) word vectors become more specific to the task at hand than the original static representations.
4.3 Further Observations

  • Another CNN-based work reports much worse results. We note that: (1) its architecture is essentially the same as our single-channel model; (2) the difference is that our model has much larger capacity, i.e., multiple kernel widths and multiple feature maps.
  • Dropout is a good enough regularizer that one can use a larger-than-necessary network and simply let dropout regularize it.
  • Sampling the vectors of words not in word2vec from the uniform distribution U[-a, a] also gives a slight improvement.
  • Adadelta and Adagrad give similar results, but Adadelta requires fewer epochs.

5. Conclusion

Unsupervised pre-training of word vectors with word2vec is indeed an important ingredient for good performance.

Copyright notice: this article was written by [Green Lantern swordsman]; please include a link to the original when reposting.
Original article: https://yzsam.com/2022/02/202202180556137252.html