[Deep Learning Series] - Visual Interpretation of Neural Networks
2022-06-25 20:38:00 【SophiaCV】
This is the third article in the deep learning series. Welcome to follow the official account 【The computer vision Alliance】 to be the first to read my original work! Reply 【Watermelon Book hand-pushed notes】 to get my handwritten machine learning notes!
Deep learning series:
【Deep Learning Series】Introduction to Deep Learning
【Deep Learning Series】A Visual Interpretation of Gradient Descent Algorithms (Momentum, AdaGrad, RMSProp, Adam)
Original article: https://medium.com/swlh/an-intuitive-visual-interpretability-for-convolutional-neural-networks-9630007c5857
Translated by: the 【The computer vision Alliance】 official account team
Recommended resources
Machine learning handwritten notes: https://github.com/Sophia-11/Machine-Learning-Notes
Note preview:
The first convolutional neural network was the time-delay neural network (TDNN) proposed by Alexander Waibel in 1987 [5]. TDNN is a convolutional neural network applied to speech recognition. It takes FFT-preprocessed speech signals as input, and its hidden layers consist of two one-dimensional convolution kernels that extract translation-invariant features in the frequency domain [6]. Before TDNN appeared, the field of artificial intelligence had already achieved a breakthrough in research on back-propagation (BP) [7], so TDNN could be trained within the BP framework. In the original authors' comparative experiments, TDNN outperformed the hidden Markov model (HMM), the mainstream speech recognition algorithm of the 1980s, under the same conditions [6].
In 1988, Wei Zhang proposed the first two-dimensional convolutional neural network, the shift-invariant artificial neural network (SIANN), and applied it to the detection of medical images [1]. In 1989, Yann LeCun also built a convolutional neural network for computer vision problems [2], namely the original version of LeNet. LeNet contains two convolutional layers and two fully connected layers, with about 60,000 learnable parameters in total. It is much larger than TDNN and SIANN, and its structure is already very close to modern convolutional neural networks [4]. LeCun (1989) [2] used stochastic gradient descent (SGD) for weight learning after random initialization, a strategy that later deep learning research retained. In addition, LeCun (1989) [2] first used the word "convolution" when discussing the network structure, which is how the convolutional neural network got its name.
In a deep convolutional neural network, after many rounds of convolution and pooling, the last convolutional layer contains the richest spatial and semantic information. Each convolutional unit in the network in effect acts as an object detector: it has the ability to localize objects, but the information it carries is hard for humans to understand and difficult to display visually.
In this article, we review the class activation map (CAM). CAM borrows an idea from the well-known paper Network In Network: it uses global average pooling (GAP) in place of the fully connected layer.
The resulting CNN retains powerful image processing and classification ability, and at the same time can localize the key regions of an image.
Convolutional Layers
A convolutional neural network (CNN) extracts features step by step through its filters, going from local features to global ones, in order to perform image recognition and related tasks.
Suppose we need to process a single-channel grayscale image of 6x6 pixels, converted into a 2D matrix as shown below:
Source: https://mc.ai/my-machine-learning-diary-day-68/
The numbers in the picture are the pixel values at each location; the larger the pixel value, the brighter the color. The dividing line between the two shades in the middle of the picture is the boundary we want to detect.
We can design a filter (also called a kernel) to detect this boundary. The filter is then convolved with the input image to extract the edge information. The convolution operation on the image can be illustrated by the following animation:
Source: https://mc.ai/my-machine-learning-diary-day-68/
We cover a region of the image with the filter, a region exactly as large as the filter itself, multiply the corresponding elements, and sum them. After computing one region we move to the next, and continue until every region of the original image has been covered.
The output matrix is called a feature map. It is lighter in the middle and darker on both sides, which reflects the boundary in the middle of the original image.
Source: https://mc.ai/learning-to-perform-linear-filtering-using-natural-image-data/
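As a concrete illustration, here is a minimal NumPy sketch of this sliding-window computation. The 6x6 image (bright left half, dark right half) and the 3x3 vertical-edge kernel are made-up values for demonstration, not the exact numbers from the figure.

```python
import numpy as np

# Toy 6x6 grayscale image: bright left half, dark right half.
image = np.array([[10, 10, 10, 0, 0, 0]] * 6, dtype=float)

# A 3x3 vertical-edge filter (kernel).
kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]], dtype=float)

def conv2d(img, k, stride=1):
    """Slide the kernel over the image; at each position, multiply
    the overlapping elements and sum them into one output value."""
    kh, kw = k.shape
    oh = (img.shape[0] - kh) // stride + 1
    ow = (img.shape[1] - kw) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = img[i * stride:i * stride + kh,
                        j * stride:j * stride + kw]
            out[i, j] = np.sum(patch * k)
    return out

feature_map = conv2d(image, kernel)
print(feature_map)  # large values mark the vertical boundary in the middle
```

Each row of the output is [0, 30, 30, 0]: the filter responds strongly exactly where the bright and dark halves meet.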
A convolutional layer mainly consists of two parts, the filters and the feature maps, and it is the first neural layer that data flows through in a CNN. The filter matrices are adjusted automatically during training, and the more filters the network learns, the more features it can extract.
The hyperparameters to set usually include the number of filters, the filter size, and the stride.
Pooling Layers
Pooling is also called spatial pooling or subsampling. Its main purpose is to extract the dominant features of a region while reducing the number of parameters, which helps prevent the model from overfitting.
A pooling layer has no parameters to learn. The hyperparameters to specify include the pooling type, commonly max pooling or average pooling, together with the window size and the stride. In practice we mostly use max pooling with a (2, 2) window and a stride of 2, so after pooling the height and width of the input are halved while the number of channels stays the same, as shown in the figure below:
Within each pooling window the maximum value is taken, and scanning the feature map window by window produces a new, smaller matrix. We could likewise take the average or the sum, but in general the maximum works somewhat better.
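A minimal sketch of 2x2 max pooling with stride 2, using made-up values for a single-channel feature map:

```python
import numpy as np

def max_pool2d(fmap, size=2, stride=2):
    """Keep only the maximum value of each pooling window."""
    oh = (fmap.shape[0] - size) // stride + 1
    ow = (fmap.shape[1] - size) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            window = fmap[i * stride:i * stride + size,
                          j * stride:j * stride + size]
            out[i, j] = window.max()
    return out

fmap = np.array([[1, 3, 2, 4],
                 [5, 6, 7, 8],
                 [3, 2, 1, 0],
                 [1, 2, 3, 4]], dtype=float)
print(max_pool2d(fmap))  # [[6. 8.], [3. 4.]]: height and width halved
```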
After several rounds of convolution and pooling, we finally flatten the multidimensional data into a one-dimensional array and connect it to the fully connected layers.
Source: https://gfycat.com/fr/smoggylittleflickertailsquirrel-machine-learning-neural-networks-mnist
The main job of the fully connected layers is to classify the image based on the features extracted by the convolutional and pooling layers.
Fully convolutional networks such as GoogLeNet [10] avoid fully connected layers and use global average pooling (GAP) instead. This not only reduces the number of parameters, which helps avoid overfitting, but also makes it possible to build a feature map associated with each category.
Global Average Pooling
For a long time, fully connected layers were the standard ending of a CNN classification network, usually followed by an activation function that produces the classification. But fully connected layers carry a huge number of parameters, which slows down training and makes the model prone to overfitting.
The paper Network In Network [9] proposed the concept of global average pooling to replace the fully connected layer.
Source: http://www.programmersought.com/article/1768159517/
The difference between global average pooling and ordinary (local) average pooling lies in the pooling window: local average pooling averages over sub-regions of the feature map, while global average pooling averages over the entire feature map.
Source: https://www.machinecurve.com/index.php/2020/01/30/what-are-max-pooling-average-pooling-global-max-pooling-and-global-average-pooling/
Using global average pooling instead of a fully connected layer greatly reduces the number of parameters.
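The following sketch shows the operation and the parameter saving; the 512x7x7 feature-map size and the 1000 output classes are illustrative assumptions, not values from the original article.

```python
import numpy as np

# Hypothetical final-layer activations: (channels, height, width).
feature_maps = np.random.rand(512, 7, 7)

# Global average pooling: average each feature map over ALL of its
# spatial positions, producing exactly one number per channel.
gap = feature_maps.mean(axis=(1, 2))  # shape: (512,)
print(gap.shape)

# Parameter comparison for a 1000-class classifier on top:
# flatten + fully connected: 512 * 7 * 7 * 1000 ≈ 25.1M weights
# GAP + fully connected:     512 * 1000        ≈ 0.5M weights
```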
Class Activation Map (CAM)
When global average pooling is used, the final convolutional layer is forced to generate the same number of feature maps as there are target categories. This gives each feature map a very clear meaning: it is a category confidence map [11].
Source: https://medium.com/@ahmdtaha/learning-deep-features-for-discriminative-localization-aa73e32e39b2
As the diagram shows, after GAP we obtain the average value of each feature map of the last convolutional layer, and the output is computed as a weighted sum of these averages. For each category c, each feature map k has a corresponding weight w_k^c.
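In the notation of the CAM paper [11], if f_k(x, y) is the activation of feature map k at spatial location (x, y), Z the number of spatial locations, and w_k^c the weight connecting feature map k to class c, then the class score S_c and the class activation map M_c are:

$$S_c = \sum_k w_k^c \cdot \frac{1}{Z}\sum_{x,y} f_k(x,y), \qquad M_c(x,y) = \sum_k w_k^c \, f_k(x,y)$$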
After training the CNN model, we can obtain a heat map that explains the classification result. For example, to explain the result for class c, we take all the weights corresponding to class c and compute the weighted sum of their feature maps. Because this result has the same size as the feature maps, we need to upsample it and overlay it on the original image, as shown below:
Source: https://medium.com/@ahmdtaha/learning-deep-features-for-discriminative-localization-aa73e32e39b2
In this way, CAM shows, in the form of a heat map, which regions of the image the model focuses on when predicting class c.
Source: MultiCAM: Multi-class activation mapping for aircraft recognition in remote sensing images
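A minimal NumPy sketch of this procedure. The shapes are illustrative assumptions, and nearest-neighbour upsampling is used for simplicity where real implementations typically use bilinear interpolation; the normalisation is a common visualisation convention rather than part of the formula above.

```python
import numpy as np

def compute_cam(feature_maps, class_weights, image_size):
    """CAM for one class: weighted sum of the last conv layer's
    feature maps, normalised and upsampled to the input size."""
    # feature_maps: (K, H, W); class_weights: (K,) weights w_k^c.
    cam = np.tensordot(class_weights, feature_maps, axes=1)  # (H, W)
    cam = cam / (np.abs(cam).max() + 1e-8)  # scale for visualisation
    # Nearest-neighbour upsampling to the original image size.
    h, w = cam.shape
    H, W = image_size
    rows = np.arange(H) * h // H
    cols = np.arange(W) * w // W
    return cam[rows][:, cols]  # (H, W) heat map to overlay on the image

# Hypothetical shapes: 512 feature maps of 7x7, a 224x224 input image.
fmaps = np.random.rand(512, 7, 7)
w_c = np.random.rand(512)
heatmap = compute_cam(fmaps, w_c, (224, 224))
print(heatmap.shape)  # (224, 224)
```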
Conclusion
CAM already explains classifications quite well, but it has one drawback: it requires modifying the structure of the original model, which means the model must be retrained. This greatly limits its usage scenarios. If the model is already deployed, or is very expensive to train, retraining it is almost impossible.
Got something out of this? Follow and like, so more people can see this article!
- Like this article so that more people can see it
- Follow the official account 【The computer vision Alliance】 to read articles first; reply 【Watermelon Book hand-pushed notes】 to get the PDF!
- Welcome to visit my blog, and let's learn and improve together!
References
- Zhang, W., 1988. Shift-invariant pattern recognition neural network and its optical architecture. In Proceedings of the Annual Conference of the Japan Society of Applied Physics.
- LeCun, Y. and Bengio, Y., 1995. Convolutional networks for images, speech, and time series. The Handbook of Brain Theory and Neural Networks, 3361(10).
- LeCun, Y., Boser, B., Denker, J.S., Henderson, D., Howard, R.E., Hubbard, W. and Jackel, L.D., 1989. Backpropagation applied to handwritten zip code recognition. Neural Computation, 1(4), pp. 541–551.
- LeCun, Y., Kavukcuoglu, K. and Farabet, C., 2010. Convolutional networks and applications in vision. In ISCAS 2010, pp. 253–256.
- Waibel, A., 1987. Phoneme recognition using time-delay neural networks. Meeting of the Institute of Electronics, Information and Communication Engineers (IEICE), Tokyo, Japan.
- Waibel, A., Hanazawa, T., Hinton, G., Shikano, K. and Lang, K., 1989. Phoneme recognition using time-delay neural networks. IEEE Transactions on Acoustics, Speech, and Signal Processing, 37(3), pp. 328–339.
- Rumelhart, D.E., Hinton, G.E. and Williams, R.J., 1986. Learning representations by back-propagating errors. Nature, 323(6088), p. 533.
- LeCun, Y., Bottou, L., Bengio, Y. and Haffner, P., 1998. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), pp. 2278–2324.
- Lin, M., Chen, Q. and Yan, S., 2014. Network in network. In ICLR.
- Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V. and Rabinovich, A., 2015. Going deeper with convolutions. In CVPR.
- Zhou, B., Khosla, A., Lapedriza, A., Oliva, A. and Torralba, A., 2016. Learning deep features for discriminative localization. In CVPR.