Learning rate adjustment strategy in deep learning (1)
2022-07-24 14:39:00 【GIS and climate】

The learning rate (LearningRate, LR/lr) is a very important hyperparameter in deep learning. Its formula is, roughly, w ← w − lr · ∂L/∂w; in other words, it is the factor that scales how much the network weights are adjusted during training. Why is it so important? In short:
If the learning rate is too high, gradients easily explode, the loss swings with large amplitude, and the model struggles to converge; if it is too low, the model tends to overfit and can easily get stuck in a "local optimum".
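To make this concrete, here is a minimal hand-written gradient-descent step (a toy sketch, not PyTorch code; the loss function, starting weight, and lr value are made up for illustration) showing exactly where the learning rate enters the update:

# Toy example: gradient descent on L(w) = (w - 3)^2, where the learning rate
# lr scales how far the weight moves against the gradient at every step.
def grad(w):
    return 2 * (w - 3)          # dL/dw

w, lr = 0.0, 0.1                # arbitrary starting weight and learning rate
for step in range(5):
    w = w - lr * grad(w)        # w <- w - lr * dL/dw
    print(f"step {step}: w = {w:.4f}")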
Choosing an appropriate learning rate is therefore crucial. Beginners usually pick a value based on online experience or open-source code (for example somewhere between 0.1 and 0.001).
However, once you start debugging a model on your own data, you will find that the learning rate really is a critical hyperparameter, and not such an easy one to pin down...
No wonder people compare training models to the Supreme Lord Laojun's painstaking pill refining (alchemy).

Fortunately, some clever people have come up with ways to adjust the learning rate dynamically. The principle is simple: during training, the learning rate is changed according to some strategy, usually decayed over time (think of slowing down as you approach the bottom of a valley or the top of a hill).
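As a rough sketch of the decay idea (hand-rolled exponential decay, not any particular PyTorch scheduler; the base value 0.1 and factor 0.9 are arbitrary):

# Hand-rolled exponential decay of the learning rate (illustration only).
base_lr, gamma = 0.1, 0.9
for epoch in range(5):
    lr = base_lr * gamma ** epoch       # learning rate shrinks by 10% each epoch
    print(f"epoch {epoch}: lr = {lr:.5f}")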
Learning rate adjustment strategies
In PyTorch, the learning rate adjustment strategies live under the torch.optim module and are called schedulers, so you can think of them as part of the optimizer. The learning rate is generally adjusted after the optimizer's own update. Sample code (from the official website):
# Imports implied by the snippet; dataset and loss_fn are assumed to be defined elsewhere.
import torch
from torch.nn import Parameter
from torch.optim import SGD
from torch.optim.lr_scheduler import ExponentialLR

model = [Parameter(torch.randn(2, 2, requires_grad=True))]   # placeholder "model": just a parameter list
optimizer = SGD(model, 0.1)
scheduler = ExponentialLR(optimizer, gamma=0.9)

for epoch in range(20):
    for input, target in dataset:
        optimizer.zero_grad()
        output = model(input)
        loss = loss_fn(output, target)
        loss.backward()
        optimizer.step()
    scheduler.step()   # once per epoch, after the inner batch loop
Note that in the code above, scheduler.step() sits in the epoch loop, not the innermost batch loop: the learning rate is usually adjusted only every epoch (or every few epochs), and stepping it inside the batch loop would update the lr far too quickly.
For adjusting the learning rate, PyTorch provides the following 14 methods (see reference 【3】 for details; a minimal sketch of one of them follows the list):
lr_scheduler.LambdaLR, lr_scheduler.MultiplicativeLR, lr_scheduler.StepLR, lr_scheduler.MultiStepLR, lr_scheduler.ConstantLR, lr_scheduler.LinearLR, lr_scheduler.ExponentialLR, lr_scheduler.CosineAnnealingLR, lr_scheduler.ChainedScheduler, lr_scheduler.SequentialLR, lr_scheduler.ReduceLROnPlateau, lr_scheduler.CyclicLR, lr_scheduler.OneCycleLR, lr_scheduler.CosineAnnealingWarmRestarts
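As promised above, here is a minimal sketch of how one of these classes plugs into a training loop, using StepLR (the class that reference 【2】 documents); the step_size and gamma values are arbitrary choices, and the actual forward/backward pass is omitted:

import torch
from torch import optim
from torch.nn import Parameter

# StepLR multiplies the learning rate by gamma every step_size epochs.
params = [Parameter(torch.randn(2, 2, requires_grad=True))]
optimizer = optim.SGD(params, lr=0.1)
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)

for epoch in range(100):
    optimizer.step()     # real training (forward/backward) is omitted in this sketch
    scheduler.step()     # lr: 0.1 for epochs 0-29, 0.01 for 30-59, 0.001 for 60-89, ...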
The specific usage of each method will be covered later; for now, let's look at an example:
# torchvision and torch.optim are assumed to be imported.
import torchvision
from torch import optim

model = torchvision.models.AlexNet(num_classes=2)
optimizer = optim.Adam(model.parameters(), lr=0.01)
scheduler = optim.lr_scheduler.LinearLR(optimizer, start_factor=0.1, total_iters=100)

for epoch in range(100):
    print(f"Current learning rate: {optimizer.param_groups[0]['lr']}")
    optimizer.step()   # the actual forward/backward pass is omitted here
    scheduler.step()
This example uses Adam as the optimizer and updates the learning rate linearly during training.
Its learning rate curve (figure omitted) climbs linearly from 0.001 to 0.01 over the 100 epochs.

You can see that LinearLR interpolates between a starting learning rate (the optimizer's lr multiplied by start_factor) and a final learning rate (the optimizer's lr multiplied by end_factor, which defaults to 1.0): the interval between the two is divided evenly over total_iters steps, and the rate is updated once per epoch. Note that once the final learning rate is reached, it is no longer updated, even if training has not finished.
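This interpolation can be reproduced by hand; the following sketch approximates what LinearLR does internally for the settings used above (a simplification for illustration, not the library source):

# Hand-computed approximation of the LinearLR schedule from the example above.
base_lr, start_factor, end_factor, total_iters = 0.01, 0.1, 1.0, 100

for epoch in [0, 25, 50, 100, 150]:
    t = min(epoch, total_iters)        # progress is frozen once total_iters is reached
    factor = start_factor + (end_factor - start_factor) * t / total_iters
    print(f"epoch {epoch:3d}: lr = {base_lr * factor:.5f}")
# epoch   0: lr = 0.00100  (= 0.01 * start_factor)
# epoch 100: lr = 0.01000  (= 0.01 * end_factor), and it stays there afterwards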
What if we set inappropriate parameters and the learning rate reaches its final value too soon, say after only 10 epochs, while the full run is 100 epochs? No need to panic: PyTorch allows learning rate schedulers to be chained, i.e. several update strategies can be applied at the same time. Example:
# Imports implied by the snippet; dataset and loss_fn are again assumed to exist.
import torch
from torch.nn import Parameter
from torch.optim import SGD
from torch.optim.lr_scheduler import ExponentialLR, MultiStepLR

model = [Parameter(torch.randn(2, 2, requires_grad=True))]
optimizer = SGD(model, 0.1)
scheduler1 = ExponentialLR(optimizer, gamma=0.9)
scheduler2 = MultiStepLR(optimizer, milestones=[30, 80], gamma=0.1)

for epoch in range(20):
    for input, target in dataset:
        optimizer.zero_grad()
        output = model(input)
        loss = loss_fn(output, target)
        loss.backward()
        optimizer.step()
    scheduler1.step()
    scheduler2.step()   # both schedulers act on the same optimizer every epoch
In other words, we can combine several strategies to update the learning rate at the same time, for example decaying it every few epochs while also cutting it when the loss stops improving, and so on.
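For instance, a rough sketch of that combination: a StepLR that decays the rate every 30 epochs, plus a ReduceLROnPlateau that cuts it when the validation loss stops improving (the tiny model, the random "validation loss", and the factor/patience values are all placeholders):

import torch
from torch import nn, optim

model = nn.Linear(10, 1)                      # placeholder model
optimizer = optim.SGD(model.parameters(), lr=0.1)

step_scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.5)
plateau_scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='min',
                                                         factor=0.5, patience=5)

for epoch in range(100):
    # ... the real forward/backward pass for one epoch would go here ...
    optimizer.step()
    val_loss = torch.rand(1).item()           # stand-in for a real validation loss
    step_scheduler.step()                     # time-based decay every 30 epochs
    plateau_scheduler.step(val_loss)          # metric-based cut when the loss plateaus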
The next article will explain each method in detail.

Reference resources
【1】https://zhuanlan.zhihu.com/p/41681558
【2】https://pytorch.org/docs/stable/generated/torch.optim.lr_scheduler.StepLR.html
【3】https://pytorch.org/docs/stable/optim.html
【4】https://hasty.ai/content-hub/mp-wiki/scheduler/cycliclr