
A pit I hit with PyTorch: why won't my L1Loss decrease during model training?

2022-06-25 07:39:00 The struggle of a rookie

Recently I have been training a regression model with L1Loss. During training the loss was extremely unstable and the results were very poor. I finally found the reason!

The original code is as follows:

criterion = nn.L1Loss()

def train(epoch):
    print('Epoch {}:'.format(epoch + 1))
    model.train()  # switch to train mode
    for i, sample_batched in enumerate(train_dataloader):
        input, target = sample_batched['geno'], sample_batched['pheno']
        # compute output
        output = model(input.float().cuda())
        loss = criterion(output, target.float().cuda())
        # compute gradient and do SGD step
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

The problem with the above code lies in this line:

loss = criterion(output, target.float().cuda())

My batch size is 4, so output has size [4, 1], i.e. it is two-dimensional, while target has size [4]. The loss still comes out as a single, reasonable-looking value, which is exactly why I did not spot the problem! Let's look at the l1_loss code in the PyTorch source:

def l1_loss(input, target, size_average=None, reduce=None, reduction='mean'):
    # type: (Tensor, Tensor, Optional[bool], Optional[bool], str) -> Tensor
    r"""l1_loss(input, target, size_average=None, reduce=None, reduction='mean') -> Tensor

    Function that takes the mean element-wise absolute value difference.

    See :class:`~torch.nn.L1Loss` for details.
    """
    if not torch.jit.is_scripting():
        tens_ops = (input, target)
        if any([type(t) is not Tensor for t in tens_ops]) and has_torch_function(tens_ops):
            return handle_torch_function(
                l1_loss, tens_ops, input, target, size_average=size_average, reduce=reduce,
                reduction=reduction)
    if not (target.size() == input.size()):
        warnings.warn("Using a target size ({}) that is different to the input size ({}). "
                      "This will likely lead to incorrect results due to broadcasting. "
                      "Please ensure they have the same size.".format(target.size(), input.size()),
                      stacklevel=2)
    if size_average is not None or reduce is not None:
        reduction = _Reduction.legacy_get_string(size_average, reduce)
    if target.requires_grad:
        ret = torch.abs(input - target)
        if reduction != 'none':
            ret = torch.mean(ret) if reduction == 'mean' else torch.sum(ret)
    else:
        expanded_input, expanded_target = torch.broadcast_tensors(input, target)
        ret = torch._C._nn.l1_loss(expanded_input, expanded_target, _Reduction.get_enum(reduction))
    return ret

The code issues a warning requiring that input and target have the same size; otherwise broadcasting will likely produce incorrect results. I had set my own code to ignore warnings, so I never saw this one! A reminder: don't casually ignore warnings, and read them carefully instead of only looking at errors.
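If, like me, you tend to silence warnings globally, one debugging option is to promote this class of warning to an error so it cannot slip past. A minimal sketch using Python's standard warnings module (not part of the original code):

import warnings

# The size-mismatch message from l1_loss is an ordinary UserWarning,
# so it can be escalated to an exception while debugging.
warnings.filterwarnings("error", category=UserWarning)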

I changed the code to the following, and the problem went away:

loss = criterion(output.squeeze(), target.float().cuda())  # squeeze() turns [4, 1] into [4]
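As a quick sanity check, here is a standalone sketch (random tensors, not the actual model) showing the mismatched call next to the fixed one:

import torch
import torch.nn as nn

criterion = nn.L1Loss()
output = torch.randn(4, 1)   # what the model returns: size [4, 1]
target = torch.randn(4)      # what the dataloader yields: size [4]

# Mismatched sizes: broadcasting builds a [4, 4] difference matrix,
# and PyTorch emits a UserWarning about the size mismatch.
print(criterion(output, target))

# Matched sizes: the intended element-wise L1 loss.
print(criterion(output.squeeze(), target))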

Now that the problem is solved, I still needed to understand why the size mismatch breaks the model; otherwise all the time spent hunting this bug would have been wasted.

Let's try the wrong input first: the input size is [4, 1] and the target size is [4].

input = tensor([[-0.3704, -0.2918, -0.6895, -0.6023]], device='cuda:0',
       grad_fn=<PermuteBackward>)
target = tensor([ 63.6000, 127.0000, 102.2000, 115.4000], device='cuda:0')

expanded_input, expanded_target = torch.broadcast_tensors(input, target)

ret = torch._C._nn.l1_loss(expanded_input, expanded_target, _Reduction.get_enum(reduction))

The returned expanded_input:

tensor([[-0.3704, -0.2918, -0.6895, -0.6023],
        [-0.3704, -0.2918, -0.6895, -0.6023],
        [-0.3704, -0.2918, -0.6895, -0.6023],
        [-0.3704, -0.2918, -0.6895, -0.6023]], device='cuda:0',
       grad_fn=<PermuteBackward>)

The returned expanded_target:

tensor([[ 63.6000,  63.6000,  63.6000,  63.6000],
        [127.0000, 127.0000, 127.0000, 127.0000],
        [102.2000, 102.2000, 102.2000, 102.2000],
        [115.4000, 115.4000, 115.4000, 115.4000]], device='cuda:0') 

The returned ret:

tensor(102.5385, device='cuda:0', grad_fn=<PermuteBackward>)

Next, the correct input: the input size is [4] and the target size is [4]:

 input = tensor([-0.3704, -0.2918, -0.6895, -0.6023], device='cuda:0',
       grad_fn=<PermuteBackward>)
target = tensor([ 63.6000, 127.0000, 102.2000, 115.4000], device='cuda:0')

expanded_input, expanded_target = torch.broadcast_tensors(input, target)

ret = torch._C._nn.l1_loss(expanded_input, expanded_target, _Reduction.get_enum(reduction))

The returned expanded_input:

 tensor([[-0.3704, -0.2918, -0.6895, -0.6023],
        [-0.3704, -0.2918, -0.6895, -0.6023],
        [-0.3704, -0.2918, -0.6895, -0.6023],
        [-0.3704, -0.2918, -0.6895, -0.6023]], device='cuda:0',
       grad_fn=<PermuteBackward>)

The returned ret:

tensor(102.5385, device='cuda:0', grad_fn=<PermuteBackward>)

After the mean reduction, the returned ret is the same in both cases. (That is a coincidence of this data: every prediction is far below every target, so both reductions simplify to mean(target) - mean(output); in general the two values would differ.) The only difference is the intermediate expanded_input. Could this intermediate difference change the gradients? To test the idea, we print the gradients of the model parameters:

for name, parms in model.named_parameters():
    print('name:', name)
    print('grad_requirs:', parms.requires_grad)
    print('grad_value:', parms.grad)

The following are the gradients with the wrong input (input size [4, 1], target size [4]):

 ===
name: module.linear1.bias
grad_requirs: True
grad_value: tensor([-0.1339,  0.0000,  0.0505,  0.0219, -0.1498,  0.0265, -0.0604, -0.0385,
         0.0471,  0.0000,  0.0304,  0.0000,  0.0000,  0.0406,  0.0066,  0.0000,
        -0.0259, -0.1544,  0.0000, -0.0208,  0.0050,  0.0000,  0.0625, -0.0474,
         0.0000,  0.0858, -0.0116,  0.0777,  0.0000, -0.0828,  0.0000, -0.1265],
       device='cuda:0')
===
name: module.linear2.weight
grad_requirs: True
grad_value: tensor([[-0.9879, -0.0000, -1.0088, -0.1680, -0.7312, -0.0066, -0.3093, -0.7478,
         -0.3104, -0.0000, -0.1615, -0.0000, -0.0000, -0.3162, -0.1047, -0.0000,
         -0.4030, -0.3385, -0.0000, -0.1738, -0.0831, -0.0000, -0.3490, -0.1129,
         -0.0000, -0.8220, -0.0279, -0.3754, -0.0000, -0.3566, -0.0000, -0.5950]],
       device='cuda:0')
===
name: module.linear2.bias
grad_requirs: True
grad_value: tensor([-1.], device='cuda:0')
===

The following are the gradients with the correct input (input size [4], target size [4]):

 ===
name: module.linear1.bias
grad_requirs: True
grad_value: tensor([-0.1351,  0.0000,  0.0000,  0.0000, -0.0377,  0.0000, -0.0809, -0.0394,
         0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0202,  0.0098, -0.0365,
        -0.0263, -0.2063, -0.1533, -0.0626,  0.0050,  0.0000,  0.0000, -0.0950,
         0.0000,  0.0000, -0.0348,  0.0000,  0.0000, -0.1108, -0.0402, -0.1693],
       device='cuda:0')
===
name: module.linear2.weight
grad_requirs: True
grad_value: tensor([[-7.4419,  0.0000,  0.0000,  0.0000, -1.9245,  0.0000, -2.7927, -2.4551,
          0.0000,  0.0000,  0.0000,  0.0000,  0.0000, -0.0309, -0.4843, -0.0211,
         -1.7046, -7.7090, -0.1696, -0.9997, -0.0862,  0.0000,  0.0000, -2.0397,
          0.0000,  0.0000, -0.3125,  0.0000,  0.0000, -3.9532, -0.0643, -6.5799]],
       device='cuda:0')
===
name: module.linear2.bias
grad_requirs: True
grad_value: tensor([-1.], device='cuda:0')
===

Sure enough, the gradient values are different!!! Lesson learned: understand thoroughly how every line of code works, and don't take anything for granted!
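To see the gradient difference without the full model, here is a minimal, self-contained sketch on made-up numbers (chosen so that some predictions sit above their own target and some below; these are not the original data):

import torch
import torch.nn.functional as F

target = torch.tensor([1.0, 2.0, 3.0, 4.0])
values = [3.5, 0.5, 2.5, 1.5]

# Wrong sizes: output [4, 1] vs target [4] broadcasts to a [4, 4] matrix,
# so every prediction is compared against every target in the batch.
out_wrong = torch.tensor(values, requires_grad=True)
F.l1_loss(out_wrong.unsqueeze(1), target).backward()
print(out_wrong.grad)   # tensor([ 0.1250, -0.2500,  0.0000, -0.1250])

# Correct sizes: output [4] vs target [4], element-wise comparison only.
out_right = torch.tensor(values, requires_grad=True)
F.l1_loss(out_right, target).backward()
print(out_right.grad)   # tensor([ 0.2500, -0.2500, -0.2500, -0.2500])

With the size mismatch, each prediction's gradient depends on how it compares to every target in the batch, so the model is effectively pushed toward the batch's targets as a whole rather than toward each sample's own target. That washed-out error signal is consistent with the unstable, non-decreasing loss described above.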

Original post

Copyright notice
This article was written by [The struggle of a rookie]. Please include a link to the original when reposting. Thanks.
https://yzsam.com/2022/176/202206250533480121.html