BatchNorm2d: principle, purpose, and an explanation of the BatchNorm2d parameters in PyTorch
2022-06-28 16:46:00 【Full stack programmer webmaster】
Hello everyone, we meet again. I'm your friend Quan Jun.
BN principle and effect:
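The formula that belongs under this heading is not reproduced in the text. As a reference sketch, the standard four-step form of batch normalization is given below; it is assumed to be the formula that the later phrase "the fourth step" refers to. For BatchNorm2d the statistics are computed per channel, so m is assumed to be batch_size * height * width.

```latex
\begin{aligned}
\mu_B &= \frac{1}{m}\sum_{i=1}^{m} x_i \\
\sigma_B^2 &= \frac{1}{m}\sum_{i=1}^{m} \left(x_i - \mu_B\right)^2 \\
\hat{x}_i &= \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}} \\
y_i &= \gamma\,\hat{x}_i + \beta
\end{aligned}
```

Here \epsilon corresponds to the eps argument, and \gamma and \beta to the learnable weight and bias that exist when affine=True.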
Function parameters:
BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
1. num_features: The input generally has shape batch_size * num_features * height * width. num_features is the number of feature maps, that is, the number of channels entering the BN layer.
2. eps: A value added to the denominator for numerical stability, so that the denominator can never be 0. The default is 1e-5.
3. momentum: The coefficient used to update the running estimates of the mean and variance during training (my understanding is that it acts as a smoothing coefficient). Note that it is not the same as the momentum coefficient in SGD: the update is running_stat = (1 - momentum) * running_stat + momentum * batch_stat.
4. affine: When set to True, the layer gets learnable per-channel coefficients gamma and beta.

Generally speaking, models in PyTorch inherit from the nn.Module class, and every module has a training attribute that specifies whether it is in training mode. The training state changes how some layers behave, such as BN layers and Dropout layers. Usually model.train() puts the model in training mode and model.eval() puts it in test mode.

The BN API also has two parameters that deserve attention: affine, which specifies whether the affine transform is applied, and track_running_stats, which specifies whether the layer tracks the statistics of the batches it sees. Together with training, these are the three settings that most often cause problems: training, affine, and track_running_stats.

affine specifies whether the fourth step of the BN formula (the scale and shift by γ and β) is needed. If affine=False, then effectively γ=1 and β=0, and they cannot be learned or updated. It is usually set to affine=True.

As for training and track_running_stats: track_running_stats=True means the layer tracks batch statistics over the whole training process and accumulates a running mean and variance, instead of relying only on the statistics of the current input batch. Conversely, if track_running_stats=False, the layer only computes the mean and variance of the current input batch.
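The parameters above can be checked with a small experiment. This is a minimal sketch, not taken from the original article: the tensor sizes and variable names are arbitrary choices for illustration. It verifies that training mode normalizes with the current batch's per-channel statistics, and that the running buffers follow the (1 - momentum) * running + momentum * batch update.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# num_features must equal the channel dimension C of an (N, C, H, W) input.
bn = nn.BatchNorm2d(num_features=3, eps=1e-5, momentum=0.1,
                    affine=True, track_running_stats=True)
x = torch.randn(8, 3, 4, 4)  # illustrative sizes, chosen arbitrarily

# affine=True registers one learnable gamma (weight) and beta (bias) per channel.
print(bn.weight.shape, bn.bias.shape)  # torch.Size([3]) torch.Size([3])

bn.train()
y = bn(x)

# Training mode normalizes with the current batch's per-channel statistics,
# using the biased variance. gamma=1 and beta=0 at initialization.
batch_mean = x.mean(dim=(0, 2, 3))
batch_var = x.var(dim=(0, 2, 3), unbiased=False)
manual = (x - batch_mean[None, :, None, None]) / torch.sqrt(
    batch_var[None, :, None, None] + bn.eps)
output_matches = torch.allclose(y, manual, atol=1e-5)

# The buffers start at mean 0 and variance 1 and are updated as
# running = (1 - momentum) * running + momentum * batch_stat.
# The running variance is updated with the unbiased batch variance.
expected_mean = 0.9 * torch.zeros(3) + 0.1 * batch_mean
expected_var = 0.9 * torch.ones(3) + 0.1 * x.var(dim=(0, 2, 3), unbiased=True)
stats_match = (torch.allclose(bn.running_mean, expected_mean, atol=1e-6)
               and torch.allclose(bn.running_var, expected_var, atol=1e-6))

# With affine=False there is no weight or bias to learn at all.
bn_plain = nn.BatchNorm2d(3, affine=False)
print(output_matches, stats_match, bn_plain.weight is None)
```

With affine=False, PyTorch stores weight and bias as None, which is how the "γ=1, β=0, not learnable" behavior shows up in practice.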
In the inference stage, if track_running_stats=False and the batch size is relatively small, the batch statistics can deviate greatly from the global statistics, which may lead to bad results. If BatchNorm2d's track_running_stats is set to False, then after loading a pretrained model the test-set results can differ from run to run, because each output depends on the other samples in its batch. With track_running_stats=True, the result is the same every time.

running_mean and running_var are computed from the statistics of the input batches. They are not exactly "learned" parameters, but they are very important for the whole calculation. A BN layer updates running_mean and running_var during the forward pass, not in optimizer.step(). So in training mode, even if you never call step() yourself, the BN statistics still change.
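The inference-time difference can be illustrated with a short sketch. The helper name eval_output_depends_on_batch and the tensor sizes are assumptions made for this example, not part of the original article. It feeds the same sample through an eval-mode BN layer inside two different batches and checks whether that sample's output changes.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

def eval_output_depends_on_batch(track: bool) -> bool:
    """Return True if one sample's eval-mode output changes with its batch-mates."""
    bn = nn.BatchNorm2d(3, track_running_stats=track)

    # A few training-mode passes; they only accumulate statistics when track=True.
    bn.train()
    with torch.no_grad():
        for _ in range(5):
            bn(torch.randn(16, 3, 4, 4))

    bn.eval()
    sample = torch.randn(1, 3, 4, 4)
    # Same first sample, very different companions in the rest of the batch.
    batch_a = torch.cat([sample, torch.randn(3, 3, 4, 4)])
    batch_b = torch.cat([sample, torch.randn(3, 3, 4, 4) * 5 + 2])
    with torch.no_grad():
        out_a = bn(batch_a)[0]
        out_b = bn(batch_b)[0]
    return not torch.allclose(out_a, out_b, atol=1e-6)

# track_running_stats=False: eval still normalizes with batch statistics.
print(eval_output_depends_on_batch(False))
# track_running_stats=True: eval uses the fixed running statistics.
print(eval_output_depends_on_batch(True))
```

When track_running_stats=False the layer has no running_mean or running_var buffers at all, so there is nothing for eval mode to fall back on.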
model.train()  # put the model in training mode
for data, label in self.dataloader:
    pred = model(data)  # the BN statistics running_mean and running_var are updated here
    loss = self.loss(pred, label)
    # even without the following three lines, the BN statistics still change
    opt.zero_grad()
    loss.backward()
    opt.step()
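The claim that the statistics change without any optimizer step can be verified directly. The following is a small self-contained sketch with a made-up two-layer model and random data standing in for the article's model and dataloader. No loss, backward pass, or optimizer is involved at all.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for the article's model: one conv layer followed by BN.
model = nn.Sequential(nn.Conv2d(3, 4, kernel_size=3, padding=1), nn.BatchNorm2d(4))
bn = model[1]
data = torch.randn(8, 3, 8, 8)

model.train()
mean_before = bn.running_mean.clone()
weight_before = bn.weight.detach().clone()
with torch.no_grad():  # no gradients and no optimizer anywhere
    model(data)
changed_in_train = not torch.equal(mean_before, bn.running_mean)
weights_untouched = torch.equal(weight_before, bn.weight.detach())

model.eval()
mean_frozen = bn.running_mean.clone()
with torch.no_grad():
    model(data)
unchanged_in_eval = torch.equal(mean_frozen, bn.running_mean)

print(changed_in_train, weights_untouched, unchanged_in_eval)
```

The learnable weight and bias only move when the optimizer steps, while the running buffers move on every training-mode forward pass, even inside torch.no_grad().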
At this point, use model.eval() to switch to the testing phase, which fixes running_mean and running_var. Sometimes a model is pretrained and then loaded, and rerunning it on the test data gives different results with a slight loss of performance. This is almost always because training or track_running_stats is set incorrectly.

Consider two models trained jointly. To make convergence easier to control, model_A, which contains several BN layers, is pretrained first. Later, model_A is used as an inference model and trained jointly with model_B. At that point we want model_A's BN statistics running_mean and running_var to stay fixed, so we need to call model_A.eval() to put it in test mode. Otherwise, in training mode, its BN statistics change even if you never update the model's parameters, and the results differ from what you expect.
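The joint-training setup can be sketched as follows. model_A and model_B here are tiny hypothetical stand-ins, and the "pretraining" is simulated with a few forward passes; the loss, optimizer, and data are likewise assumptions chosen only to make the example runnable.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-ins for the article's model_A and model_B.
model_A = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.BatchNorm2d(8), nn.ReLU())
model_B = nn.Sequential(nn.Conv2d(8, 4, 3, padding=1), nn.BatchNorm2d(4),
                        nn.AdaptiveAvgPool2d(1), nn.Flatten())

# Simulated "pretraining": a few training-mode passes populate model_A's BN statistics.
model_A.train()
with torch.no_grad():
    for _ in range(3):
        model_A(torch.randn(8, 3, 8, 8))

# Freeze model_A: eval() fixes the BN statistics, requires_grad_(False) fixes the weights.
model_A.eval()
for p in model_A.parameters():
    p.requires_grad_(False)
model_B.train()

opt = torch.optim.SGD(model_B.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

a_mean_before = model_A[1].running_mean.clone()
b_mean_before = model_B[1].running_mean.clone()

for _ in range(3):
    data = torch.randn(8, 3, 8, 8)       # random data in place of a dataloader
    label = torch.randint(0, 4, (8,))
    with torch.no_grad():
        feats = model_A(data)            # frozen feature extractor
    pred = model_B(feats)
    loss = loss_fn(pred, label)
    opt.zero_grad()
    loss.backward()
    opt.step()

a_stats_fixed = torch.equal(a_mean_before, model_A[1].running_mean)
b_stats_updated = not torch.equal(b_mean_before, model_B[1].running_mean)
print(a_stats_fixed, b_stats_updated)
```

One thing to watch if both models live inside a single parent module: calling train() on the parent sets every child back to training mode, so model_A.eval() has to be called again afterwards.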
Publisher: Full stack programmer webmaster. Please credit the source when reprinting: https://javaforall.cn/132951.html Link to the original text: https://javaforall.cn