A PyTorch pitfall: why won't L1Loss decrease during model training?
2022-06-25 07:39:00 【The struggle of a rookie】
I was recently using L1Loss to train a regression model. During training the loss was extremely unstable and the results were very poor. I finally found the cause!
The original code was as follows:
criterion = nn.L1Loss()

def train():
    print('Epoch {}:'.format(epoch + 1))
    # switch to train mode
    model.train()
    for i, sample_batched in enumerate(train_dataloader):
        input, target = sample_batched['geno'], sample_batched['pheno']
        # compute output
        output = model(input.float().cuda())
        loss = criterion(output, target.float().cuda())
        # compute gradient and do SGD step
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

The problem with the above code is in this line:
loss = criterion(output, target.float().cuda())
My batch size is 4, so output has size [4, 1], i.e. a two-dimensional tensor, while target has size [4]. The loss still comes out as a single, plausible-looking value, which is why I didn't spot the problem! Let's look at the l1_loss source in the PyTorch repository:
def l1_loss(input, target, size_average=None, reduce=None, reduction='mean'):
    # type: (Tensor, Tensor, Optional[bool], Optional[bool], str) -> Tensor
    r"""l1_loss(input, target, size_average=None, reduce=None, reduction='mean') -> Tensor

    Function that takes the mean element-wise absolute value difference.

    See :class:`~torch.nn.L1Loss` for details.
    """
    if not torch.jit.is_scripting():
        tens_ops = (input, target)
        if any([type(t) is not Tensor for t in tens_ops]) and has_torch_function(tens_ops):
            return handle_torch_function(
                l1_loss, tens_ops, input, target, size_average=size_average, reduce=reduce,
                reduction=reduction)
    if not (target.size() == input.size()):
        warnings.warn("Using a target size ({}) that is different to the input size ({}). "
                      "This will likely lead to incorrect results due to broadcasting. "
                      "Please ensure they have the same size.".format(target.size(), input.size()),
                      stacklevel=2)
    if size_average is not None or reduce is not None:
        reduction = _Reduction.legacy_get_string(size_average, reduce)
    if target.requires_grad:
        ret = torch.abs(input - target)
        if reduction != 'none':
            ret = torch.mean(ret) if reduction == 'mean' else torch.sum(ret)
    else:
        expanded_input, expanded_target = torch.broadcast_tensors(input, target)
        ret = torch._C._nn.l1_loss(expanded_input, expanded_target, _Reduction.get_enum(reduction))
    return ret
The warning in this code requires input and target to have the same size; otherwise broadcasting will likely produce incorrect results. I had suppressed warnings in my own code, so I never saw this one! A reminder: don't casually ignore warnings, and read the ones you do get carefully; don't only look at errors.
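To see the problem in isolation, here is a minimal standalone sketch (not from the original post) of what happens with these shapes:

import torch
import torch.nn as nn

criterion = nn.L1Loss()
output = torch.randn(4, 1)   # shape [4, 1], like the model output above
target = torch.randn(4)      # shape [4]

# Emits the UserWarning from l1_loss: the operands broadcast to [4, 4],
# so the loss averages over 16 output/target pairs instead of the 4 intended ones.
loss_wrong = criterion(output, target)

# With matching shapes, the loss averages over exactly the 4 intended pairs.
loss_right = criterion(output.squeeze(), target)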
I changed the code to the following, and the problem went away:
loss = criterion(output.squeeze(), target.float().cuda())
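Reshaping the target instead works just as well; an equivalent alternative (not from the original post) would be:

loss = criterion(output, target.float().cuda().unsqueeze(1))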
Now that the problem is solved, I still need to understand why the size mismatch leads the model astray; otherwise all the time I spent hunting this bug was wasted = =
First, feed in the wrong shapes: input size [4, 1], target size [4]:
input = tensor([[-0.3704, -0.2918, -0.6895, -0.6023]], device='cuda:0',
       grad_fn=<PermuteBackward>)
target = tensor([ 63.6000, 127.0000, 102.2000, 115.4000], device='cuda:0')
expanded_input, expanded_target = torch.broadcast_tensors(input, target)
ret = torch._C._nn.l1_loss(expanded_input, expanded_target, _Reduction.get_enum(reduction))
the returned expanded_input:
tensor([[-0.3704, -0.2918, -0.6895, -0.6023],
[-0.3704, -0.2918, -0.6895, -0.6023],
[-0.3704, -0.2918, -0.6895, -0.6023],
[-0.3704, -0.2918, -0.6895, -0.6023]], device='cuda:0',
grad_fn=<PermuteBackward>)
the returned expanded_target:
tensor([[ 63.6000, 63.6000, 63.6000, 63.6000],
[127.0000, 127.0000, 127.0000, 127.0000],
[102.2000, 102.2000, 102.2000, 102.2000],
[115.4000, 115.4000, 115.4000, 115.4000]], device='cuda:0')
the returned ret:
tensor(102.5385, device='cuda:0', grad_fn=<PermuteBackward>)
Next, the correct shapes: input size [4], target size [4]:
input = tensor([-0.3704, -0.2918, -0.6895, -0.6023], device='cuda:0',
grad_fn=<PermuteBackward>)
target = tensor([ 63.6000, 127.0000, 102.2000, 115.4000], device='cuda:0')
expanded_input, expanded_target = torch.broadcast_tensors(input, target)
ret = torch._C._nn.l1_loss(expanded_input, expanded_target, _Reduction.get_enum(reduction))
the returned expanded_input:
tensor([[-0.3704, -0.2918, -0.6895, -0.6023],
[-0.3704, -0.2918, -0.6895, -0.6023],
[-0.3704, -0.2918, -0.6895, -0.6023],
[-0.3704, -0.2918, -0.6895, -0.6023]], device='cuda:0',
grad_fn=<PermuteBackward>)
the returned ret:
tensor(102.5385, device='cuda:0', grad_fn=<PermuteBackward>)
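Why is ret identical in both cases? Here is a quick standalone check (not from the original post) using the printed values above. Every target here is far larger than every output, so |output - target| = target - output elementwise, and the mean of the full 4x4 broadcast matrix collapses to mean(target) - mean(output), which equals the correctly paired mean:

import torch

x = torch.tensor([-0.3704, -0.2918, -0.6895, -0.6023])  # model outputs
t = torch.tensor([63.6, 127.0, 102.2, 115.4])           # targets
print((t.unsqueeze(1) - x).abs().mean())  # mean over the [4, 4] broadcast: 102.5385
print((t - x).abs().mean())               # mean over the 4 correct pairs: 102.5385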
So after taking the mean, the returned ret is the same; the only difference lies in the expanded intermediates. Will this different intermediate lead to different gradients? To test the idea, print the gradients of the model parameters:
for name, parms in model.named_parameters():
    print('name:', name)
    print('requires_grad:', parms.requires_grad)
    print('grad_value:', parms.grad)
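(A usage note, not in the original post: run this right after loss.backward() and before the next optimizer.zero_grad(), so that .grad is still populated.)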
Gradients from the wrong input, input size [4, 1], target size [4]:
===
name: module.linear1.bias
requires_grad: True
grad_value: tensor([-0.1339, 0.0000, 0.0505, 0.0219, -0.1498, 0.0265, -0.0604, -0.0385,
0.0471, 0.0000, 0.0304, 0.0000, 0.0000, 0.0406, 0.0066, 0.0000,
-0.0259, -0.1544, 0.0000, -0.0208, 0.0050, 0.0000, 0.0625, -0.0474,
0.0000, 0.0858, -0.0116, 0.0777, 0.0000, -0.0828, 0.0000, -0.1265],
device='cuda:0')
===
name: module.linear2.weight
requires_grad: True
grad_value: tensor([[-0.9879, -0.0000, -1.0088, -0.1680, -0.7312, -0.0066, -0.3093, -0.7478,
-0.3104, -0.0000, -0.1615, -0.0000, -0.0000, -0.3162, -0.1047, -0.0000,
-0.4030, -0.3385, -0.0000, -0.1738, -0.0831, -0.0000, -0.3490, -0.1129,
-0.0000, -0.8220, -0.0279, -0.3754, -0.0000, -0.3566, -0.0000, -0.5950]],
device='cuda:0')
===
name: module.linear2.bias
requires_grad: True
grad_value: tensor([-1.], device='cuda:0')
===
Gradients from the correct input, input size [4], target size [4]:
===
name: module.linear1.bias
requires_grad: True
grad_value: tensor([-0.1351, 0.0000, 0.0000, 0.0000, -0.0377, 0.0000, -0.0809, -0.0394,
0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0202, 0.0098, -0.0365,
-0.0263, -0.2063, -0.1533, -0.0626, 0.0050, 0.0000, 0.0000, -0.0950,
0.0000, 0.0000, -0.0348, 0.0000, 0.0000, -0.1108, -0.0402, -0.1693],
device='cuda:0')
===
name: module.linear2.weight
requires_grad: True
grad_value: tensor([[-7.4419, 0.0000, 0.0000, 0.0000, -1.9245, 0.0000, -2.7927, -2.4551,
0.0000, 0.0000, 0.0000, 0.0000, 0.0000, -0.0309, -0.4843, -0.0211,
-1.7046, -7.7090, -0.1696, -0.9997, -0.0862, 0.0000, 0.0000, -2.0397,
0.0000, 0.0000, -0.3125, 0.0000, 0.0000, -3.9532, -0.0643, -6.5799]],
device='cuda:0')
===
name: module.linear2.bias
requires_grad: True
grad_value: tensor([-1.], device='cuda:0')
===
Sure enough, the gradient values are different!!! Lesson learned: understand deeply what every line of your code actually does; don't take anything for granted!
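One way to make this class of bug fail fast (a defensive sketch, not from the original post) is to promote the broadcasting warning to an error, or to assert the shapes yourself before computing the loss:

import warnings

# Turn the l1_loss broadcasting warning into a hard error.
warnings.filterwarnings('error', message='Using a target size')

# Or check the shapes explicitly before computing the loss:
assert output.squeeze().shape == target.shape, \
    'output/target shape mismatch: {} vs {}'.format(output.shape, target.shape)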