Learning rate adjustment strategy in deep learning (1)
2022-07-24 14:39:00 【GIS and climate】

The learning rate (LearningRate, LR/lr) is a very important hyperparameter in deep learning. Its formula is, roughly, w ← w − lr · ∂L/∂w; in other words, it is the factor that scales how much the network weights are adjusted during training. Why is it so important? In short:
If the learning rate is too high, gradients easily explode, the loss swings with large amplitude, and the model struggles to converge; if it is too low, the model tends to overfit and can easily get stuck in a "local optimum".
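To make this concrete, here is a minimal hand-written gradient-descent step (a toy sketch, not PyTorch code; the loss function, starting weight, and lr value are made up for illustration) showing exactly where the learning rate enters the update:

# Toy example: gradient descent on L(w) = (w - 3)^2, where the learning rate
# lr scales how far the weight moves against the gradient at every step.
def grad(w):
    return 2 * (w - 3)          # dL/dw

w, lr = 0.0, 0.1                # arbitrary starting weight and learning rate
for step in range(5):
    w = w - lr * grad(w)        # w <- w - lr * dL/dw
    print(f"step {step}: w = {w:.4f}")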
Choosing an appropriate learning rate is therefore crucial. Beginners usually pick a value based on online experience or open-source code (for example somewhere between 0.1 and 0.001).
However, once you start debugging a model on your own data, you will find that the learning rate really is a critical hyperparameter, and not such an easy one to pin down...
No wonder people compare training models to the Supreme Lord Laojun's painstaking pill refining (alchemy).

Fortunately, some clever people have come up with ways to adjust the learning rate dynamically. The principle is simple: during training, the learning rate is changed according to some strategy, usually decayed over time (think of slowing down as you approach the bottom of a valley or the top of a hill).
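As a rough sketch of the decay idea (hand-rolled exponential decay, not any particular PyTorch scheduler; the base value 0.1 and factor 0.9 are arbitrary):

# Hand-rolled exponential decay of the learning rate (illustration only).
base_lr, gamma = 0.1, 0.9
for epoch in range(5):
    lr = base_lr * gamma ** epoch       # learning rate shrinks by 10% each epoch
    print(f"epoch {epoch}: lr = {lr:.5f}")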
Learning rate adjustment strategies
In PyTorch, the learning rate adjustment strategies live under the torch.optim module and are called schedulers, so you can think of them as part of the optimizer. The learning rate is generally adjusted after the optimizer's own update. Sample code (from the official website):
# Imports implied by the snippet; dataset and loss_fn are assumed to be defined elsewhere.
import torch
from torch.nn import Parameter
from torch.optim import SGD
from torch.optim.lr_scheduler import ExponentialLR

model = [Parameter(torch.randn(2, 2, requires_grad=True))]   # placeholder "model": just a parameter list
optimizer = SGD(model, 0.1)
scheduler = ExponentialLR(optimizer, gamma=0.9)

for epoch in range(20):
    for input, target in dataset:
        optimizer.zero_grad()
        output = model(input)
        loss = loss_fn(output, target)
        loss.backward()
        optimizer.step()
    scheduler.step()   # once per epoch, after the inner batch loop
Note that in the code above, scheduler.step() sits in the epoch loop, not the innermost batch loop: the learning rate is usually adjusted only every epoch (or every few epochs), and stepping it inside the batch loop would update the lr far too quickly.
For adjusting the learning rate, PyTorch provides the following 14 methods (see reference 【3】 for details; a minimal sketch of one of them follows the list):
lr_scheduler.LambdaLR, lr_scheduler.MultiplicativeLR, lr_scheduler.StepLR, lr_scheduler.MultiStepLR, lr_scheduler.ConstantLR, lr_scheduler.LinearLR, lr_scheduler.ExponentialLR, lr_scheduler.CosineAnnealingLR, lr_scheduler.ChainedScheduler, lr_scheduler.SequentialLR, lr_scheduler.ReduceLROnPlateau, lr_scheduler.CyclicLR, lr_scheduler.OneCycleLR, lr_scheduler.CosineAnnealingWarmRestarts
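As promised above, here is a minimal sketch of how one of these classes plugs into a training loop, using StepLR (the class that reference 【2】 documents); the step_size and gamma values are arbitrary choices, and the actual forward/backward pass is omitted:

import torch
from torch import optim
from torch.nn import Parameter

# StepLR multiplies the learning rate by gamma every step_size epochs.
params = [Parameter(torch.randn(2, 2, requires_grad=True))]
optimizer = optim.SGD(params, lr=0.1)
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)

for epoch in range(100):
    optimizer.step()     # real training (forward/backward) is omitted in this sketch
    scheduler.step()     # lr: 0.1 for epochs 0-29, 0.01 for 30-59, 0.001 for 60-89, ...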
The specific usage of each method will be covered later; for now, let's look at an example:
# torchvision and torch.optim are assumed to be imported.
import torchvision
from torch import optim

model = torchvision.models.AlexNet(num_classes=2)
optimizer = optim.Adam(model.parameters(), lr=0.01)
scheduler = optim.lr_scheduler.LinearLR(optimizer, start_factor=0.1, total_iters=100)

for epoch in range(100):
    print(f"Current learning rate: {optimizer.param_groups[0]['lr']}")
    optimizer.step()   # the actual forward/backward pass is omitted here
    scheduler.step()
This example uses Adam as the optimizer and updates the learning rate linearly during training.
Its learning rate curve (figure omitted) climbs linearly from 0.001 to 0.01 over the 100 epochs.

You can see that LinearLR interpolates between a starting learning rate (the optimizer's lr multiplied by start_factor) and a final learning rate (the optimizer's lr multiplied by end_factor, which defaults to 1.0): the interval between the two is divided evenly over total_iters steps, and the rate is updated once per epoch. Note that once the final learning rate is reached, it is no longer updated, even if training has not finished.
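This interpolation can be reproduced by hand; the following sketch approximates what LinearLR does internally for the settings used above (a simplification for illustration, not the library source):

# Hand-computed approximation of the LinearLR schedule from the example above.
base_lr, start_factor, end_factor, total_iters = 0.01, 0.1, 1.0, 100

for epoch in [0, 25, 50, 100, 150]:
    t = min(epoch, total_iters)        # progress is frozen once total_iters is reached
    factor = start_factor + (end_factor - start_factor) * t / total_iters
    print(f"epoch {epoch:3d}: lr = {base_lr * factor:.5f}")
# epoch   0: lr = 0.00100  (= 0.01 * start_factor)
# epoch 100: lr = 0.01000  (= 0.01 * end_factor), and it stays there afterwards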
What if we set inappropriate parameters and the learning rate reaches its final value too soon, say after only 10 epochs, while the full run is 100 epochs? No need to panic: PyTorch allows learning rate schedulers to be chained, i.e. several update strategies can be applied at the same time. Example:
# Imports implied by the snippet; dataset and loss_fn are again assumed to exist.
import torch
from torch.nn import Parameter
from torch.optim import SGD
from torch.optim.lr_scheduler import ExponentialLR, MultiStepLR

model = [Parameter(torch.randn(2, 2, requires_grad=True))]
optimizer = SGD(model, 0.1)
scheduler1 = ExponentialLR(optimizer, gamma=0.9)
scheduler2 = MultiStepLR(optimizer, milestones=[30, 80], gamma=0.1)

for epoch in range(20):
    for input, target in dataset:
        optimizer.zero_grad()
        output = model(input)
        loss = loss_fn(output, target)
        loss.backward()
        optimizer.step()
    scheduler1.step()
    scheduler2.step()   # both schedulers act on the same optimizer every epoch
In other words, we can combine several strategies to update the learning rate at the same time, for example decaying it every few epochs while also cutting it when the loss stops improving, and so on.
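For instance, a rough sketch of that combination: a StepLR that decays the rate every 30 epochs, plus a ReduceLROnPlateau that cuts it when the validation loss stops improving (the tiny model, the random "validation loss", and the factor/patience values are all placeholders):

import torch
from torch import nn, optim

model = nn.Linear(10, 1)                      # placeholder model
optimizer = optim.SGD(model.parameters(), lr=0.1)

step_scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.5)
plateau_scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='min',
                                                         factor=0.5, patience=5)

for epoch in range(100):
    # ... the real forward/backward pass for one epoch would go here ...
    optimizer.step()
    val_loss = torch.rand(1).item()           # stand-in for a real validation loss
    step_scheduler.step()                     # time-based decay every 30 epochs
    plateau_scheduler.step(val_loss)          # metric-based cut when the loss plateaus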
The next article will explain each method in detail.

Reference resources
【1】https://zhuanlan.zhihu.com/p/41681558
【2】https://pytorch.org/docs/stable/generated/torch.optim.lr_scheduler.StepLR.html
【3】https://pytorch.org/docs/stable/optim.html
【4】https://hasty.ai/content-hub/mp-wiki/scheduler/cycliclr