Convolutional neural network CNN

2022-07-24 19:16:00 【Coding~Man】

https://cs231n.github.io/assets/conv-demo/index.html

The three musketeers of network model training: normalization, dropout, and activation functions.

1: Common activation functions in neural networks

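2: Convolutional neural networks
The key hyperparameters of a convolution are the kernel, the padding, and the stride.

Padding adds a border (usually zeros) around the input so that the output matrix can keep the same shape as the input.

Stride is the step size: the number of positions the kernel moves between applications.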
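How kernel size, padding, and stride interact can be checked with the usual output-size formula. The sketch below is a minimal illustration; the function name `conv_output_size` and the example sizes are made up for this note.

```python
def conv_output_size(n: int, kernel: int, padding: int = 0, stride: int = 1) -> int:
    """Output length of a convolution along one spatial dimension."""
    return (n + 2 * padding - kernel) // stride + 1

# A 3-wide kernel with padding 1 and stride 1 keeps a 5-wide input 5 wide.
same = conv_output_size(5, kernel=3, padding=1, stride=1)
# Without padding the output shrinks.
valid = conv_output_size(5, kernel=3, padding=0, stride=1)
# A larger stride skips positions, so the output shrinks further.
strided = conv_output_size(5, kernel=3, padding=1, stride=2)
print(same, valid, strided)
```

With stride 1, a padding of (kernel - 1) / 2 for an odd kernel size is what keeps the input and output shapes equal.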

Multi-channel convolution:
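As a rough sketch of what a multi-channel convolution computes: each input channel is combined with its own slice of the kernel, and the per-channel results are summed into one output feature map. The code assumes no padding, stride 1, and the cross-correlation convention used by most deep-learning libraries; the name `conv2d_multichannel` and the toy data are illustrative only.

```python
def conv2d_multichannel(image, kernel):
    """Convolve a C x H x W image with one C x kH x kW kernel (no padding, stride 1)."""
    channels = len(image)
    height, width = len(image[0]), len(image[0][0])
    k_h, k_w = len(kernel[0]), len(kernel[0][0])
    out = []
    for i in range(height - k_h + 1):
        row = []
        for j in range(width - k_w + 1):
            total = 0.0
            for c in range(channels):  # sum the contributions of all input channels
                for di in range(k_h):
                    for dj in range(k_w):
                        total += image[c][i + di][j + dj] * kernel[c][di][dj]
            row.append(total)
        out.append(row)
    return out

# Two input channels of size 3x3, and one 2x2 kernel slice per channel.
image = [
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    [[1, 0, 1], [0, 1, 0], [1, 0, 1]],
]
kernel = [
    [[1, 0], [0, 1]],
    [[1, 1], [1, 1]],
]
feature_map = conv2d_multichannel(image, kernel)
print(feature_map)
```

A layer with several output channels simply repeats this with one such kernel per output channel.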

3: Methods to prevent overfitting
1: Dropout randomly sets the output of a node to 0.
2: DropConnect randomly sets entries of W in WX to 0.
3: DisOut randomly assigns different weights (between 0 and 1) to nodes.
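The first two methods in the list can be sketched in a few lines. This is a minimal illustration, not a library implementation: the function names are made up, and rescaling the surviving outputs by 1 / (1 - p_drop) ("inverted dropout") is an assumption added here that the text does not mention.

```python
import random

def dropout(outputs, p_drop, rng):
    """Zero each node output with probability p_drop.

    Survivors are rescaled so the expected output is unchanged
    (inverted dropout; an assumption, not stated in the text).
    """
    keep = 1.0 - p_drop
    return [x / keep if rng.random() < keep else 0.0 for x in outputs]

def dropconnect(weights, p_drop, rng):
    """Zero individual entries of W with probability p_drop."""
    return [[0.0 if rng.random() < p_drop else w for w in row] for row in weights]

rng = random.Random(0)
activations = [1.0, 2.0, 3.0, 4.0]
weights = [[0.1, 0.2], [0.3, 0.4]]
dropped = dropout(activations, p_drop=0.5, rng=rng)
masked_w = dropconnect(weights, p_drop=0.5, rng=rng)
print(dropped)
print(masked_w)
```

Both are applied only during training; at inference time the full network is used.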

4: Activation function showdown
The sigmoid function takes values between 0 and 1.

The tanh function takes values between -1 and 1.

The ReLU function takes values from 0 upward: it outputs 0 for negative inputs and the input X itself otherwise.
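The three value ranges can be verified directly from the standard definitions. The sketch below uses only Python's `math` module; the sample inputs are arbitrary.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x: float) -> float:
    return math.tanh(x)

def relu(x: float) -> float:
    return max(0.0, x)

samples = [-10.0, -1.0, 0.0, 1.0, 10.0]
for x in samples:
    print(x, sigmoid(x), tanh(x), relu(x))
```

Sigmoid and tanh saturate at both ends of their range, while ReLU is unbounded above, which is one reason ReLU suffers less from vanishing gradients for positive inputs.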

5: Pooling
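The text gives no detail on pooling, so the following is only a sketch under an assumption: it shows max pooling, a common variant, with a 2x2 window and stride 2. The function name and the example feature map are illustrative.

```python
def max_pool2d(feature_map, size=2, stride=2):
    """Keep the largest value in each size x size window."""
    height, width = len(feature_map), len(feature_map[0])
    pooled = []
    for i in range(0, height - size + 1, stride):
        row = []
        for j in range(0, width - size + 1, stride):
            window = [feature_map[i + di][j + dj]
                      for di in range(size) for dj in range(size)]
            row.append(max(window))
        pooled.append(row)
    return pooled

fm = [
    [1, 3, 2, 4],
    [5, 6, 7, 8],
    [3, 2, 1, 0],
    [1, 2, 3, 4],
]
pooled = max_pool2d(fm)
print(pooled)
```

Pooling has no learnable parameters; it only reduces the spatial size of the feature map.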

6: Fully connected layer
The difference between a fully connected layer and a neural-network layer: the fully connected layer on its own is just the linear map (WX plus a bias) and has no activation function.
7: Normalization
Benefits of normalization: faster training and better accuracy.

Batch Normalization: normalizes along the batch-size dimension. For example, with BatchSize = 258, each feature is normalized using the statistics of the 258 samples in the batch.
It only works well when the batch size is large enough, because a batch that is too small does not represent the whole sample.
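A minimal sketch of the normalization step, assuming a batch stored as a list of samples with one value per feature. The learnable scale and shift that Batch Normalization normally applies afterwards are left out, and the `eps` value is an assumed small constant for numerical stability.

```python
import math

def batch_norm(batch, eps=1e-5):
    """Normalize each feature using its mean and variance over the batch."""
    n = len(batch)
    n_features = len(batch[0])
    means = [sum(sample[f] for sample in batch) / n for f in range(n_features)]
    variances = [sum((sample[f] - means[f]) ** 2 for sample in batch) / n
                 for f in range(n_features)]
    return [[(sample[f] - means[f]) / math.sqrt(variances[f] + eps)
             for f in range(n_features)]
            for sample in batch]

# Four samples, two features on very different scales.
batch = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
normalized = batch_norm(batch)
print(normalized)
```

Because the mean and variance are estimated from the batch itself, a very small batch gives noisy estimates, which is the reason for the batch-size caveat above.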

C: the number of channels
N: the number of samples (the batch size)

8: Parameter initialization

Xavier initialization: draw the weights from a standard normal distribution and scale them. For example, with node_in = 9 and node_out = 8, initialize a 9*8 matrix and divide it by the square root of 9 (that is, of node_in). It works best with the tanh function.

He initialization is the one to use with the ReLU function.
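The two schemes can be sketched as follows. The Xavier variant follows the description in the text (divide by the square root of node_in); the original formulation by Glorot and Bengio scales by both node_in and node_out instead. The He scale factor sqrt(2 / node_in) is the commonly cited form and is not given in the text. Function names are illustrative.

```python
import math
import random

def xavier_init(node_in, node_out, rng):
    """Standard-normal weights divided by sqrt(node_in), as described in the text."""
    scale = 1.0 / math.sqrt(node_in)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(node_out)]
            for _ in range(node_in)]

def he_init(node_in, node_out, rng):
    """Standard-normal weights scaled by sqrt(2 / node_in), intended for ReLU."""
    scale = math.sqrt(2.0 / node_in)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(node_out)]
            for _ in range(node_in)]

rng = random.Random(0)
w_xavier = xavier_init(9, 8, rng)  # 9 x 8 weight matrix
w_he = he_init(9, 8, rng)
print(len(w_xavier), len(w_xavier[0]))
```

The extra factor of 2 in He initialization compensates for ReLU zeroing out roughly half of its inputs.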
9: Learning rate

Exponential decay: the learning rate decreases as the number of training steps grows.

The schedule above is the most common approach. Changing the learning rate periodically helps avoid getting stuck in a local optimum, and it can be combined with a cosine learning-rate schedule.
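Both schedules can be written as small functions of the step count. This is a sketch under assumptions: the parameter names (`decay_rate`, `decay_steps`, `period`, `min_lr`) and the example values are chosen for illustration, and the cosine schedule is made periodic by restarting every `period` steps.

```python
import math

def exponential_decay(base_lr, decay_rate, step, decay_steps):
    """Learning rate shrinks by a factor of decay_rate every decay_steps steps."""
    return base_lr * decay_rate ** (step / decay_steps)

def cosine_with_restarts(base_lr, min_lr, step, period):
    """Cosine curve from base_lr down to min_lr, restarting every period steps."""
    t = step % period
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * t / period))

for step in (0, 25, 49, 50, 100):
    print(step,
          exponential_decay(0.1, 0.5, step, 100),
          cosine_with_restarts(0.1, 0.0, step, 50))
```

The jump back to `base_lr` at each restart is what gives the optimizer a chance to leave a poor local optimum.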
10: Optimizer
Purpose: once the model is built, the optimizer solves for its parameters.
Difference between an optimizer and @: a deep model cannot take in all of the data at once, so an optimizer is used, whereas @ takes in all of the data.
Benefits of an optimizer: the learning rate can vary, historical gradients can be taken into account, and the gradient of each batch can be used. Simply put, the optimizer brings the update computed from each batch closer to the update that would be obtained by putting all the data in at once.

Optimizers follow two main schools of thought. One is the Momentum school, which gives the descent momentum; its representative is SGD with momentum. The other is the AdaGrad school, whose idea is that when the updates fluctuate too much, resistance is applied to damp the fluctuation.
Adam is the representative combination of the two ideas.

To avoid falling into a local optimum, we can take a weighted sum of the gradients from past updates and the current gradient.
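One common way to weight historical and current gradients is the classical momentum update, sketched below on the toy function f(x) = x^2. The function name, the learning rate, and the momentum coefficient of 0.9 are assumptions chosen for the example.

```python
def sgd_momentum_step(param, grad, velocity, lr=0.1, momentum=0.9):
    """Blend the running velocity (past gradients) with the current gradient."""
    velocity = momentum * velocity + grad
    param = param - lr * velocity
    return param, velocity

# Minimize f(x) = x^2, whose gradient is 2x, starting from x = 5.
x, v = 5.0, 0.0
for _ in range(200):
    x, v = sgd_momentum_step(x, 2.0 * x, v)
print(x)
```

Because the velocity carries over between steps, the parameter keeps moving even where the current gradient is small, which is what helps it roll through shallow local optima.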

Noise suppression:

AdaGrad

RMSprop
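The two update rules named above can be sketched for a single scalar parameter. Both divide the step by a running measure of squared gradients, which is the "resistance" against large fluctuations; AdaGrad accumulates the squares forever, while RMSprop uses an exponentially decaying average. Function names and hyperparameter values are illustrative assumptions.

```python
import math

def adagrad_step(param, grad, accum, lr=0.5, eps=1e-8):
    """AdaGrad: divide by the root of the sum of all past squared gradients."""
    accum = accum + grad ** 2
    param = param - lr * grad / (math.sqrt(accum) + eps)
    return param, accum

def rmsprop_step(param, grad, avg_sq, lr=0.01, decay=0.9, eps=1e-8):
    """RMSprop: divide by the root of a decaying average of squared gradients."""
    avg_sq = decay * avg_sq + (1.0 - decay) * grad ** 2
    param = param - lr * grad / (math.sqrt(avg_sq) + eps)
    return param, avg_sq

# One step of each on f(x) = x^2 (gradient 2x), starting from x = 1.
x_ada, acc = adagrad_step(1.0, 2.0, 0.0)
x_rms, avg = rmsprop_step(1.0, 2.0, 0.0)
print(x_ada, x_rms)
```

AdaGrad's accumulated sum only grows, so its effective learning rate keeps shrinking; RMSprop's decaying average avoids that by forgetting old gradients.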