
Li Hongyi Machine Learning 6: Convolutional Neural Network

2022-06-25 02:36:00 【AXYZdong】

Author: AXYZdong
Li Hongyi "Machine Learning" series
Reference video: https://www.bilibili.com/video/BV1Ht411g7Ef
Reference document: Datawhale document


1. Why use CNN for image processing

When we use a fully connected feedforward network for image processing, we often need too many parameters. For instance, suppose the input is a 100 * 100 color image (a very small image). If you stretch it into a vector, how many values does it contain? It contains 100 * 100 * 3 values.

Because it is a color image, every pixel needs three values to describe it, so the input vector has 30000 dimensions. If the first hidden layer has 1000 neurons, that layer alone has 30000 * 1000 = 30,000,000 weights, which is far too many.
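
The arithmetic above can be checked with a few lines of Python. This is only a sketch of the count described in the text; the variable names are illustrative, and bias terms are ignored as an assumption.

```python
# Parameter count for the first fully connected layer described above.
height, width, channels = 100, 100, 3   # 100 * 100 color image, 3 values per pixel
hidden_neurons = 1000                   # size of the first hidden layer

input_dim = height * width * channels    # length of the flattened input vector
fc_weights = input_dim * hidden_neurons  # one weight per (input, neuron) pair, biases ignored

print(input_dim)    # 30000
print(fc_weights)   # 30000000
```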

What CNN does is simplify the architecture of the neural network. Based on what we know about images, some weights are not needed, so we filter them out from the start. Instead of a fully connected feedforward network, CNN uses fewer parameters to do image processing. So a CNN is actually simpler than a general DNN.


▲ Why CNN for image processing

CNN simply uses prior knowledge to remove some of the parameters from the original fully connected layers.

Why can we remove some parameters (that is, why can image processing be done with fewer parameters)?

Some patterns are much smaller than the whole image
The same patterns appear in different regions of the image


Subsampling the pixels does not change the image much


2. CNN architecture

First, an image is given as input. The image passes through a convolution layer, then max pooling, then another convolution, then max pooling again.

This process can be repeated many times. How many times must be decided in advance: it is part of the network architecture, just like the number of layers in a neural network. How many convolution layers and how many max pooling layers to use is fixed when you define the architecture.

After the convolution and max pooling steps are done, the output is flattened, and the flattened output is fed to an ordinary fully connected feedforward network, which produces the image recognition result.


▲ CNN architecture
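
To make the pipeline above concrete, here is a minimal sketch that only traces tensor shapes through convolution, max pooling and flatten. The input size (28 * 28, one channel), the filter counts (25 and 50) and the helper names are assumptions chosen for illustration; they do not come from the text. Convolutions are assumed to use stride 1 and no padding.

```python
def conv_output_size(size, kernel, stride=1):
    """Side length after a convolution without padding."""
    return (size - kernel) // stride + 1


def pool_output_size(size, window=2):
    """Side length after non-overlapping pooling."""
    return size // window


def cnn_shapes(size, channels, layers):
    """Trace (height, width, channels) through the layers, then flatten."""
    shapes = [(size, size, channels)]
    for kind, arg in layers:
        if kind == "conv":        # arg = (number of filters, kernel size)
            n_filters, kernel = arg
            size = conv_output_size(size, kernel)
            channels = n_filters  # one feature map per filter
        elif kind == "pool":      # arg = pooling window
            size = pool_output_size(size, arg)
        else:
            raise ValueError(f"unknown layer kind: {kind}")
        shapes.append((size, size, channels))
    flat_length = size * size * channels  # length of the flattened vector
    return shapes, flat_length


# Hypothetical architecture: conv -> max pool -> conv -> max pool -> flatten.
layers = [("conv", (25, 3)), ("pool", 2), ("conv", (50, 3)), ("pool", 2)]
shapes, flat_length = cnn_shapes(28, 1, layers)

for shape in shapes:
    print(shape)
print(flat_length)  # 1250
```

The list of layers is fixed before training, which matches the point above that the number of convolution and pooling layers is part of the architecture.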

3. Convolution

The first filter is a 3 * 3 matrix. Place this filter at the top-left corner of the image and take the inner product of the filter's 9 values with the 9 image values underneath it. Both have 1, 1, 1 on the diagonal, so the inner product is 3.

Then move the filter. The distance it moves each step is called the stride, and its value is set in advance. After moving one step, the inner product equals -1. In the figure, the stride is 1.


▲ How convolution works
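
The sliding inner product can be sketched in plain Python. The figure is not reproduced in the text, so the 6 * 6 binary image and the filter values below are assumptions, chosen so that the first two inner products come out as the 3 and -1 mentioned above. The function name `convolve2d` is illustrative, and it assumes a square image, a square filter and no padding.

```python
def convolve2d(image, kernel, stride=1):
    """Slide the kernel over the image and take the inner product at each position."""
    k = len(kernel)
    out_size = (len(image) - k) // stride + 1
    output = []
    for i in range(out_size):
        row = []
        for j in range(out_size):
            top, left = i * stride, j * stride
            total = sum(
                image[top + a][left + b] * kernel[a][b]
                for a in range(k)
                for b in range(k)
            )
            row.append(total)
        output.append(row)
    return output


# Assumed example input (not taken from the text).
image = [
    [1, 0, 0, 0, 0, 1],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 1, 1, 0, 0],
    [1, 0, 0, 0, 1, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 1, 0, 1, 0],
]
# Assumed 3 * 3 filter with 1, 1, 1 on the diagonal.
kernel = [
    [1, -1, -1],
    [-1, 1, -1],
    [-1, -1, 1],
]

feature_map = convolve2d(image, kernel, stride=1)
print(feature_map[0][0])  # 3  (filter at the top-left corner)
print(feature_map[0][1])  # -1 (after moving one step to the right)
```

With a 6 * 6 input, a 3 * 3 filter and stride 1, the output is a 4 * 4 matrix.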

4. The relationship between convolution and fully connected layers

With stride = 1 (moving one cell), the next inner product gives another value, -1. Suppose this -1 is the output of another neuron. This neuron is connected to inputs (2, 3, 4, 8, 9, 10, 14, 15, 16). In the figure, connections with the same weight are drawn in the same color.

In a fully connected layer, these two neurons would each have their own weights. With convolution, we first reduce the number of weights connected to each neuron, and then force the two neurons to share the same weights. This is called weight sharing, and it leaves us with even fewer parameters than before.


▲ The relationship between convolution and fully connected layers
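
The connection pattern can be reproduced with a short sketch. The input indices quoted above are consistent with a 6 * 6 image flattened row by row into 36 inputs, numbered from 1. That image size is inferred here rather than stated in the text, and the helper name `receptive_field` is hypothetical.

```python
def receptive_field(row, col, width=6, k=3):
    """1-based indices of the flattened inputs seen by the output neuron at (row, col)."""
    return [(row + a) * width + (col + b) + 1 for a in range(k) for b in range(k)]


first_neuron = receptive_field(0, 0)   # filter at the top-left corner
second_neuron = receptive_field(0, 1)  # filter moved one cell to the right

print(first_neuron)   # [1, 2, 3, 7, 8, 9, 13, 14, 15]
print(second_neuron)  # [2, 3, 4, 8, 9, 10, 14, 15, 16]

# Weight counts for 36 inputs and a 4 * 4 = 16 neuron output (biases ignored).
fully_connected_weights = 36 * 16  # every neuron has its own weight for every input
sparse_weights = 9 * 16            # each neuron keeps only its 9 connections
shared_weights = 9                 # all 16 neurons share the same 9 filter values

print(fully_connected_weights, sparse_weights, shared_weights)  # 576 144 9
```

The three counts correspond to the two steps described above: first cut the connections per neuron from 36 to 9, then share those 9 weights across all neurons.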

5. Max pooling

Compared with convolution, max pooling is relatively simple. Filter 1 gives a 4 * 4 matrix and filter 2 gives another 4 * 4 matrix. Next, group the outputs four at a time. For each group you can take the average (average pooling) or the maximum (max pooling); either way, four values are merged into one value. This makes the image smaller.

The essential role of pooling is to shrink the image and reduce the number of features.


▲ Max Pooling
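
A minimal sketch of the grouping step follows. The 4 * 4 feature map values are hypothetical, and `pool2d` and `average` are illustrative helper names. The window is assumed to be 2 * 2 and non-overlapping, so each group holds four values.

```python
def average(values):
    """Mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


def pool2d(feature_map, window=2, combine=max):
    """Merge each non-overlapping window * window group into a single value."""
    size = len(feature_map) // window
    return [
        [
            combine(
                feature_map[i * window + a][j * window + b]
                for a in range(window)
                for b in range(window)
            )
            for j in range(size)
        ]
        for i in range(size)
    ]


# Hypothetical 4 * 4 output of one filter.
feature_map = [
    [3, -1, -3, -1],
    [-3, 1, 0, -3],
    [-3, -3, 0, 1],
    [3, -2, -2, -1],
]

print(pool2d(feature_map))                   # [[3, 0], [3, 1]]  (max pooling)
print(pool2d(feature_map, combine=average))  # [[0.0, -1.75], [-1.25, -0.5]]  (average pooling)
```

Either choice turns the 4 * 4 matrix into a 2 * 2 matrix, which is the shrinking effect described above.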

6. Flatten

Flattening turns a multidimensional input into one dimension. It is commonly used in the transition from the convolution layers to the fully connected layers.

Flatten means straightening out the feature maps. Once straightened, they can be fed to a fully connected feedforward network.


▲ Flatten
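
Straightening the feature maps can be sketched as follows. The two 2 * 2 feature maps are hypothetical values standing in for the pooled outputs of two filters, and the function name `flatten` is illustrative.

```python
def flatten(feature_maps):
    """Concatenate all values of a list of 2-D feature maps into one flat list."""
    return [value for fmap in feature_maps for row in fmap for value in row]


feature_maps = [
    [[3, 0], [3, 1]],    # hypothetical pooled output of filter 1
    [[-1, 1], [0, 3]],   # hypothetical pooled output of filter 2
]

vector = flatten(feature_maps)
print(vector)       # [3, 0, 3, 1, -1, 1, 0, 3]
print(len(vector))  # 8
```

The resulting vector is what the fully connected feedforward network receives as its input.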

7. To learn more…

The methods of visualization in these slides
https://blog.keras.io/how-convolutional-neural-networks-see-the-world.html
More about visualization
http://cs231n.github.io/understanding-cnn/
Very cool CNN visualization toolkit
http://yosinski.com/deepvis
http://scs.ryerson.ca/~aharley/vis/conv/

How to let machine draw an image

PixelRNN: https://arxiv.org/abs/1601.06759
Variational Autoencoder (VAE): https://arxiv.org/abs/1312.6114
Generative Adversarial Network (GAN): http://arxiv.org/abs/1406.2661

8. Summary

This is Task 6 of the Datawhale team study of Li Hongyi's "Machine Learning": Convolutional Neural Network. It covers why CNN is used for image processing, the CNN architecture, convolution, the relationship between convolution and fully connected layers, max pooling, flatten, and further references.

It mainly covers the principles; in practice, a single line of code may implement the corresponding function. For beginners, I think it is enough to understand many of these points at a basic level and dig deeper when a specific application calls for it, which is more efficient.

—— END ——

If anything above is inaccurate, feel free to leave a comment below. If you have a better idea, you are welcome to discuss and learn together.

For more, please visit AXYZdong's blog

Original site

Copyright notice
This article was created by [AXYZdong]. Please include the original link when reposting. Thank you.
https://yzsam.com/2022/176/202206242300454867.html