
[2022 freshmen learning] key points of the second week

2022-07-23 16:28:00 AI frontier theory group @ouc

Video learning content discussion

1、Why do visual tasks use convolutional neural networks?

The weight matrices of a traditional fully connected network contain too many parameters, so it overfits easily; a CNN solves this problem through local connectivity and parameter sharing. **(However, model unification is the current trend: the Transformer model has recently become very popular and can serve as a unified model across NLP, CV, and other fields.)**

2、How is the parameter count of a convolution layer calculated?

The formula is $k \times k \times C_{in} \times C_{out}$; note that it is independent of the feature map size. Suppose the input is a 64×64-pixel, three-channel color image, and we want 4 output feature maps with the same spatial size as the input. The whole process can be illustrated by the figure below:

The convolution layer consists of 4 filters; each filter contains 3 kernels, and each kernel is 3×3. The parameter count of the convolution can therefore be calculated as 3×3×3×4 = 108.
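
As a quick sanity check, here is a minimal PyTorch sketch; the bias term is disabled because the formula above counts only the kernel weights (with bias there would be 4 extra parameters):

```python
import torch.nn as nn

# 3x3 kernels, 3 input channels, 4 output feature maps; padding=1 keeps the 64x64 size
conv = nn.Conv2d(in_channels=3, out_channels=4, kernel_size=3, padding=1, bias=False)
print(sum(p.numel() for p in conv.parameters()))  # 3*3*3*4 = 108
```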

3、What do the lower and higher layers of a neural network learn, respectively?

Lower layers pay more attention to details, while higher layers pay more attention to semantic concepts. At the same time, different convolution kernels focus on different patterns.

A related academic story: in 2012, AlexNet was a great success, and companies were offering high prices for the technology. After consulting others, Hinton founded DNNResearch, a company with no products and only three employees. He then organized an auction in December 2012, in which Google, Baidu, Microsoft, and DeepMind took part. Bidding went up in million-dollar increments at first, then got completely out of hand; the final price was 44 million dollars. If you are interested, search for the article 《The year deep learning rose, Baidu almost signed Hinton》.

4、The development of CNNs

The special video recorded in 2018 covers only AlexNet, VGG, GoogLeNet, and ResNet; the latest models are not included. The slides below are from Huang Gao's (Tsinghua) 2022 report 《Research frontier and discussion of visual backbone models》, which is an excellent summary; you can study the models it mentions on your own.

5、AlexNet

Before AlexNet, SVM was the mainstream approach in the field, and neural networks had been quiet for a long time. In 2012, AlexNet won the ImageNet competition, beating the runner-up by nearly 10 percentage points. The main reasons were:

- Big-data training: the million-scale ImageNet image dataset
- A nonlinear activation function: ReLU
- Overfitting prevention: Dropout and data augmentation
- Compute power: a dual-GPU implementation

6、The VGG network

The VGG network goes deeper, from AlexNet's 8 layers to 16/19 layers, and took second place in the 2014 ImageNet competition. Its parameter count is about twice that of AlexNet, concentrated mainly in the final FC layers. Training such a deep network was very difficult at the time, because BatchNorm did not yet exist. VGG has been highly influential and can still be seen in papers at top venues such as CVPR today.

7、GoogLeNet

The champion of the 2014 ImageNet competition: the error rate dropped from 11.7% in 2013 to 6.7%. The network has 22 layers, the parameter count is greatly reduced, and there is no FC layer. GoogLeNet has V1, V2, V3, and V4 versions; the core of the design is the Inception module (GoogLeNetV1), whose key idea is parallel convolution kernels at multiple scales. In the original module, the 3×3 branch alone has 3×3×256×192 = 442,368 parameters, so there is still room to optimize the parameter count. (What is the role of the auxiliary classifiers? They are somewhat like layer-by-layer pre-training.)

Google's researchers noticed this problem too, so in GoogLeNetV2, proposed the following year, they added a 1×1 convolution to reduce the feature map to 64 channels, solving the problem of excessive parameters. The 3×3 branch's parameter count then becomes 1×1×256×64 + 3×3×64×192 = 126,976.
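
Both parameter counts can be verified with a small PyTorch sketch; the channel sizes are taken from the numbers above, and bias terms are disabled so the counts match the hand calculation:

```python
import torch.nn as nn

def n_params(module):
    return sum(p.numel() for p in module.parameters())

# 3x3 branch applied directly to 256 input channels
naive_branch = nn.Conv2d(256, 192, kernel_size=3, padding=1, bias=False)

# 1x1 convolution reduces 256 channels to 64 before the 3x3 convolution
reduced_branch = nn.Sequential(
    nn.Conv2d(256, 64, kernel_size=1, bias=False),
    nn.Conv2d(64, 192, kernel_size=3, padding=1, bias=False),
)

print(n_params(naive_branch))    # 3*3*256*192 = 442368
print(n_params(reduced_branch))  # 1*1*256*64 + 3*3*64*192 = 126976
```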

By GoogLeNetV3, the main improvement is decomposing a 5×5 convolution into two 3×3 convolutions (the parameter count drops from 25 to 18).
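
The same counting trick confirms the 25 → 18 reduction; a single input/output channel is used here purely for illustration:

```python
import torch.nn as nn

conv5x5 = nn.Conv2d(1, 1, kernel_size=5, padding=2, bias=False)  # one 5x5 kernel
two_3x3 = nn.Sequential(                                         # same receptive field
    nn.Conv2d(1, 1, kernel_size=3, padding=1, bias=False),
    nn.Conv2d(1, 1, kernel_size=3, padding=1, bias=False),
)
print(sum(p.numel() for p in conv5x5.parameters()))  # 25
print(sum(p.numel() for p in two_3x3.parameters()))  # 18
```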

8、ResNet

The champion of the 2015 ImageNet competition: the error rate dropped from 6.7% to 3.57%, and the network has 152 layers. A team from China reached the summit of artificial intelligence research for the first time, winning the CVPR 2016 Best Paper award. For background on the author, Kaiming He, see the article 《AI genius》; the story of this young genius continues.

The residual idea: remove the identical main part so that the network focuses on learning the small changes; this makes it possible to train very deep networks.
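
In the formulation of the ResNet paper, a building block with input $x$ learns the residual mapping rather than the full target mapping:

$$y = F(x, \{W_i\}) + x$$

where $F(x, \{W_i\})$ is the residual function implemented by the stacked layers and $x$ is carried across unchanged by the identity shortcut.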

Code practice content discussion

1、For MNIST classification, why do the results become much worse when the pixel order is shuffled? Why can a fully connected network still classify relatively accurately?

2、What difference does the value of shuffle in the DataLoader make?

A dataset has to be iterated over multiple times, and each full pass is one epoch. To ensure the order of the training data varies across epochs, shuffling is needed. Therefore, shuffle should be enabled for training, but it is not needed for testing.
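
A minimal sketch of the two loaders, assuming `train_set` and `test_set` are already-constructed Dataset objects (e.g. from torchvision.datasets):

```python
from torch.utils.data import DataLoader

train_loader = DataLoader(train_set, batch_size=64, shuffle=True)   # reshuffled every epoch
test_loader  = DataLoader(test_set,  batch_size=64, shuffle=False)  # fixed order for evaluation
```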

3、What difference do the different values used in the transform make?

It mainly performs data normalization, and its impact may not be large. For the CIFAR10 data in code exercise 2, this value normalizes the dataset to a mean of 0 and a variance of 1.
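
For illustration, such a transform looks like the sketch below; the mean/std values are the commonly quoted per-channel CIFAR10 statistics and are an assumption here, not copied from the exercise code:

```python
import torchvision.transforms as transforms

transform = transforms.Compose([
    transforms.ToTensor(),                           # scales pixel values to [0, 1]
    transforms.Normalize((0.4914, 0.4822, 0.4465),   # per-channel mean
                         (0.2470, 0.2435, 0.2616)),  # per-channel std
])
```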

4、What is the difference between an epoch and a batch?

5、What is the difference between a 1×1 convolution and an FC layer? What role does it play?

Essentially, a 1×1 convolution is a fully connected layer applied over the channel dimension. It is generally used for feature dimensionality reduction, which reduces the number of network parameters.
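
A small sketch makes the equivalence concrete: a 1×1 convolution applies the same channel-mixing linear map at every spatial position, so with shared weights it matches an nn.Linear over the channel dimension.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 256, 8, 8)                            # (N, C, H, W)
conv1x1 = nn.Conv2d(256, 64, kernel_size=1, bias=False)

fc = nn.Linear(256, 64, bias=False)
fc.weight.data = conv1x1.weight.data.view(64, 256)       # reuse the same weights

out_conv = conv1x1(x)
out_fc = fc(x.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)   # apply per pixel, restore layout
print(torch.allclose(out_conv, out_fc, atol=1e-6))       # True
```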

6、Why does residual learning improve accuracy?

This is a classic question with many answers online; look them up yourselves. The skip connection in the UNet architecture works on the same principle.

7、In code exercise 2, how does the network differ from the LeNet proposed by LeCun in 1989?

The data sizes differ: CIFAR10 images are 32×32, while MNIST images are 28×28. The activation functions also differ: LeNet used the sigmoid activation function, while our code uses ReLU.

8、In code exercise 2, the feature map gets smaller after convolution; how can residual learning be applied?

```python
import torch.nn as nn
import torch.nn.functional as F

class Bottleneck(nn.Module):
    """1x1 -> 3x3 -> 1x1 bottleneck block that preserves the input shape,
    so the identity shortcut can be added directly."""

    def __init__(self, in_planes=256, planes=64):
        super(Bottleneck, self).__init__()
        self.conv1 = nn.Conv2d(in_planes, planes, kernel_size=1)   # reduce channels
        self.bn1 = nn.BatchNorm2d(planes)
        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=1, padding=1)  # keep H x W
        self.bn2 = nn.BatchNorm2d(planes)
        self.conv3 = nn.Conv2d(planes, in_planes, kernel_size=1)   # restore channels
        self.bn3 = nn.BatchNorm2d(in_planes)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = F.relu(self.bn2(self.conv2(out)))
        out = self.bn3(self.conv3(out))
        out = out + x      # identity shortcut: output shape equals input shape
        out = F.relu(out)
        return out
```
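
A quick shape check (illustrative only) shows why the shortcut addition is valid: the block returns a tensor with the same shape as its input, so it can be inserted between existing layers without affecting downstream feature map sizes.

```python
import torch

block = Bottleneck(in_planes=256, planes=64)
x = torch.randn(2, 256, 8, 8)
print(block(x).shape)  # torch.Size([2, 256, 8, 8])
```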

9、What methods can further improve accuracy?

If you are interested, we recommend reading papers related to self-attention and attention modules (SENet, CBAM, ECANet, etc.).


Copyright notice
This article was created by [AI frontier theory group @ouc]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/204/202207231210181192.html