
Classic Models: NiN & GoogLeNet



NiN

The problem with fully connected layers: they contain a huge number of parameters, which makes them easy to overfit.

The parameter count of a fully connected layer is usually about $\text{input channels} \times \text{image size} \times \text{output size}$.
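For example, in VGG the first fully connected layer maps a 512×7×7 feature map to 4096 units, which by the formula above is $512 \times 7 \times 7 \times 4096 \approx 102$ million parameters for that single layer.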

The idea of NiN: do away with fully connected layers entirely.

A NiN block:
[Figure: structure of a NiN block]

The convolutional layer is followed by two 1×1 convolutions with stride 1 and no padding, so their output shape matches the convolution's output. They act as per-pixel fully connected layers.
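A minimal PyTorch sketch of such a block (the function name and signature are my own, following the d2l-style formulation):

```python
import torch
from torch import nn

def nin_block(in_channels, out_channels, kernel_size, stride, padding):
    # One ordinary convolution, then two 1x1 convolutions that act as
    # per-pixel fully connected layers (with nonlinearity after each).
    return nn.Sequential(
        nn.Conv2d(in_channels, out_channels, kernel_size, stride, padding),
        nn.ReLU(),
        nn.Conv2d(out_channels, out_channels, kernel_size=1), nn.ReLU(),
        nn.Conv2d(out_channels, out_channels, kernel_size=1), nn.ReLU())
```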

The NiN architecture:

  1. No fully connected layers;
  2. Alternate NiN blocks with stride-2 max pooling layers (gradually shrinking the height and width while increasing the number of channels);
  3. Finally, use a global average pooling layer to produce the output (its number of input channels equals the number of classes);

If we want 1000 classes, the network ends with 1000 channels, and global average pooling over each channel yields that class's confidence.
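A tiny shape check of this idea (a sketch reusing the torch/nn imports from the block above; the tensor is random, just to show the shapes):

```python
x = torch.rand(1, 1000, 5, 5)       # (batch, channels = classes, height, width)
gap = nn.AdaptiveAvgPool2d((1, 1))  # average each 5x5 map down to one value
out = nn.Flatten()(gap(x))
print(out.shape)                    # torch.Size([1, 1000]): one score per class
```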

Summary:

  1. A NiN block uses one convolutional layer plus two 1×1 convolutional layers; the latter add per-pixel nonlinearity;
  2. Global average pooling replaces the fully connected layers of VGG and AlexNet; it has few parameters and is less prone to overfitting.

[Figure: the full NiN architecture]
The hyperparameters follow AlexNet's settings, with some 1×1 convolutions added.
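A sketch of the whole network under those AlexNet-style hyperparameters, reusing nin_block from above (the single input channel and 10 classes are assumptions for illustration, not part of the figure):

```python
net = nn.Sequential(
    nin_block(1, 96, kernel_size=11, stride=4, padding=0),
    nn.MaxPool2d(3, stride=2),
    nin_block(96, 256, kernel_size=5, stride=1, padding=2),
    nn.MaxPool2d(3, stride=2),
    nin_block(256, 384, kernel_size=3, stride=1, padding=1),
    nn.MaxPool2d(3, stride=2), nn.Dropout(0.5),
    # the last NiN block outputs as many channels as there are classes
    nin_block(384, 10, kernel_size=3, stride=1, padding=1),
    nn.AdaptiveAvgPool2d((1, 1)),   # global average pooling
    nn.Flatten())                   # (batch, 10) class scores
```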

GoogLeNet

How do we choose the best hyperparameters: the convolution kernel size, the pooling method, the number of channels?

Inception block: run all the convolution options in parallel, then concatenate at the end (height and width unchanged; the outputs are joined along the channel dimension).

[Figure: structure of an Inception block]
As you can see, the white blocks serve to reduce model complexity (i.e., the parameter count) by changing the number of channels, while the blue blocks extract information.
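A sketch of such a block in PyTorch (the class name and the c1–c4 argument convention are my own; it follows the four-path structure described above):

```python
from torch.nn import functional as F

class Inception(nn.Module):
    """Four parallel paths; c1-c4 give each path's output channels."""
    def __init__(self, in_channels, c1, c2, c3, c4):
        super().__init__()
        # Path 1: a single 1x1 convolution
        self.p1_1 = nn.Conv2d(in_channels, c1, kernel_size=1)
        # Path 2: 1x1 convolution (channel reduction) then 3x3 convolution
        self.p2_1 = nn.Conv2d(in_channels, c2[0], kernel_size=1)
        self.p2_2 = nn.Conv2d(c2[0], c2[1], kernel_size=3, padding=1)
        # Path 3: 1x1 convolution (channel reduction) then 5x5 convolution
        self.p3_1 = nn.Conv2d(in_channels, c3[0], kernel_size=1)
        self.p3_2 = nn.Conv2d(c3[0], c3[1], kernel_size=5, padding=2)
        # Path 4: 3x3 max pooling then 1x1 convolution
        self.p4_1 = nn.MaxPool2d(kernel_size=3, stride=1, padding=1)
        self.p4_2 = nn.Conv2d(in_channels, c4, kernel_size=1)

    def forward(self, x):
        p1 = F.relu(self.p1_1(x))
        p2 = F.relu(self.p2_2(F.relu(self.p2_1(x))))
        p3 = F.relu(self.p3_2(F.relu(self.p3_1(x))))
        p4 = F.relu(self.p4_2(self.p4_1(x)))
        # Height and width are unchanged, so concatenate along channels
        return torch.cat((p1, p2, p3, p4), dim=1)
```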

The design of first reducing and then increasing the number of channels has a bottleneck flavor.

Compared with a single 3×3 or 5×5 convolution, an Inception block has fewer parameters and lower computational complexity.
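As a rough worked example (using the channel allocation of GoogLeNet's first Inception block, with 192 input channels, and ignoring biases): a direct 5×5 convolution producing 32 channels costs $192 \times 32 \times 25 \approx 153.6\text{k}$ parameters, while the Inception path of a 1×1 reduction to 16 channels followed by a 5×5 convolution costs $192 \times 16 + 16 \times 32 \times 25 \approx 15.9\text{k}$.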

At the same time, Inception blocks also increase the diversity of the information the network learns.

[Figure: GoogLeNet stages 1 and 2]
Stage 1 and stage 2 are similar to VGG. GoogLeNet borrows heavily from the NiN idea, using many 1×1 convolutions to cut the number of parameters.
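A sketch of those two stages (again assuming a single-channel input purely for illustration, reusing nn from above):

```python
# Stage 1: 7x7 convolution stem; Stage 2: 1x1 then 3x3 (NiN-style reduction)
b1 = nn.Sequential(nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3),
                   nn.ReLU(),
                   nn.MaxPool2d(kernel_size=3, stride=2, padding=1))
b2 = nn.Sequential(nn.Conv2d(64, 64, kernel_size=1), nn.ReLU(),
                   nn.Conv2d(64, 192, kernel_size=3, padding=1), nn.ReLU(),
                   nn.MaxPool2d(kernel_size=3, stride=2, padding=1))
```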

[Figure: GoogLeNet compared with AlexNet]
Compared with AlexNet, GoogLeNet's convolution kernels are relatively small, so spatial information is not compressed too quickly, which supports learning as the number of channels grows in later layers.

At the same time, I think compressing the spatial information is an unavoidable trade-off that comes with increasing the number of channels; the goal is to keep the parameter count down.

[Figure: GoogLeNet stage 3]
In the third stage, the number of channels keeps increasing, but each Inception block allocates its channels differently. Notably, the 3×3 convolution always gets the largest allocation, because its parameter count is modest while it still extracts information well.
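A sketch of stage 3 reusing the Inception class above, with the per-block channel allocations from the original GoogLeNet paper (inception 3a and 3b); note the 3×3 path gets the largest share in both blocks:

```python
b3 = nn.Sequential(Inception(192, 64, (96, 128), (16, 32), 32),    # -> 256 ch
                   Inception(256, 128, (128, 192), (32, 96), 64),  # -> 480 ch
                   nn.MaxPool2d(kernel_size=3, stride=2, padding=1))
```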

The Inception block has many later variants: V2 added BN, V3 modified the convolution sizes, and V4 added residual connections.

Original post: https://yzsam.com/2022/177/202206260240556143.html