Label smoothing
2022-07-24 04:23:00 【Billie studies hard】
Catalog
1. What problems does label smoothing mainly solve?
2. How does label smoothing work?
3. Label smoothing formula
4. Code implementation
Label smoothing comes from GoogLeNet v3 (Inception-v3, "Rethinking the Inception Architecture for Computer Vision").
For background on one-hot encoding, see: One-hot code
1. What problems does label smoothing mainly solve?
Conventional one-hot encoding brings problems: it cannot guarantee the model's generalization ability, and it makes the network overconfident, which can lead to overfitting.
Assigning probability 1 to the target class and 0 to every other class encourages the gap between the target logit and all other logits to grow as large as possible; because the gradient is bounded, this gap is hard to reach in practice, and chasing it makes the model believe too strongly in its predicted class. Label smoothing alleviates this problem.
2. How does label smoothing work?
Label smoothing attenuates the probability-1 entry of the one-hot label to avoid overconfidence; the confidence taken away is divided equally among all classes.
For example, in a 4-class task with label = (0, 1, 0, 0) and ε = 0.001:

label_smoothed = (ε/4, 1 − ε + ε/4, ε/4, ε/4) = (0.00025, 0.99925, 0.00025, 0.00025)

The probabilities still add up to 1.
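The example above can be sketched in a few lines of NumPy (the helper name smooth_one_hot is mine; ε = 0.001 matches the numbers above):

```python
import numpy as np

def smooth_one_hot(label, num_classes, eps=0.001):
    # every class receives eps / K ...
    q = np.full(num_classes, eps / num_classes)
    # ... and the true class keeps the remaining 1 - eps on top of that
    q[label] += 1.0 - eps
    return q

q = smooth_one_hot(1, 4)  # (0.00025, 0.99925, 0.00025, 0.00025)
```

The four entries sum to 1, exactly as in the worked example.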
3. Label smoothing formula
Cross entropy (Cross Entropy):

H(q, p) = -\sum_{k=1}^{K} q(k) \log p(k)

where q is the label distribution, p is the predicted distribution, and k indexes the K classes; that is, q is the one-hot encoding of the ground-truth class y.

Label smoothing: smooth the label q into q', and let the model output distribution p approximate q':

q'(k) = (1 - \varepsilon)\,\delta_{k,y} + \varepsilon\, u(k)

where u(k) is a prior distribution over the classes, here taken to be uniform (u(k) = 1/K), and \delta_{k,y} (equal to 1 when k = y and 0 otherwise) is the original distribution q; \varepsilon \in (0, 1) is a hyperparameter.
From this formula, the smoothed label takes probability ε from the uniform distribution and probability 1 − ε from the original distribution. This is equivalent to adding noise to the original label, so that the model's predictions do not concentrate too heavily on the high-probability class and leave some probability for the low-probability classes.
Therefore, the cross-entropy loss function after label smoothing is:

H(q', p) = (1 - \varepsilon)\,H(q, p) + \varepsilon\,H(u, p)

How is this formula obtained? Substitute q'(k|x) into the cross-entropy loss:

H(q', p) = -\sum_{k=1}^{K} \log(p_k)\, q'(k)
         = -\sum_{k=1}^{K} \log(p_k)\left[(1-\varepsilon)\,\delta_{k,y} + \frac{\varepsilon}{K}\right]
         = (1-\varepsilon)\left[-\sum_{k=1}^{K} \log(p_k)\,\delta_{k,y}\right] + \varepsilon\left[-\sum_{k=1}^{K} \log(p_k)\,\frac{1}{K}\right]
         = (1-\varepsilon)\,H(q, p) + \varepsilon\,H(u, p)

In this way, we obtain the label smoothing formula.
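As a sanity check on the derivation, the identity H(q', p) = (1 − ε)H(q, p) + εH(u, p) can be verified numerically; the distribution p, the value of ε, and the class index y below are arbitrary choices of mine:

```python
import numpy as np

K, eps, y = 4, 0.1, 1
p = np.array([0.1, 0.6, 0.2, 0.1])   # some predicted distribution
q = np.eye(K)[y]                     # one-hot label for true class y
u = np.full(K, 1.0 / K)              # uniform prior u(k) = 1/K
q_prime = (1 - eps) * q + eps * u    # smoothed label q'

def H(a, b):
    # cross entropy H(a, b) = -sum_k a(k) * log b(k)
    return -(a * np.log(b)).sum()

lhs = H(q_prime, p)                            # direct cross entropy against q'
rhs = (1 - eps) * H(q, p) + eps * H(u, p)      # the mixture from the derivation
```

The two sides agree because cross entropy is linear in its first argument, which is all the derivation above uses.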
4. Code implementation
import torch
import torch.nn as nn
import torch.nn.functional as F

class LabelSmoothingCrossEntropy(nn.Module):
    def __init__(self, eps=0.1, reduction='mean', ignore_index=-100):
        super(LabelSmoothingCrossEntropy, self).__init__()
        self.eps = eps
        self.reduction = reduction
        self.ignore_index = ignore_index

    def forward(self, output, target):
        c = output.size()[-1]                         # number of classes K
        log_pred = torch.log_softmax(output, dim=-1)
        # uniform term: sum over classes of -log p(k), i.e. K * H(u, p)
        if self.reduction == 'sum':
            loss = -log_pred.sum()
        else:
            loss = -log_pred.sum(dim=-1)
        if self.reduction == 'mean':
            loss = loss.mean()
        # eps/K * sum_k(-log p_k) + (1 - eps) * NLL  =  eps*H(u,p) + (1-eps)*H(q,p)
        return loss * self.eps / c + (1 - self.eps) * F.nll_loss(
            log_pred, target,
            reduction=self.reduction,
            ignore_index=self.ignore_index)
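A loss like this can be sanity-checked against PyTorch's built-in smoothing: since version 1.10, F.cross_entropy accepts a label_smoothing argument implementing the same (1 − ε)H(q, p) + εH(u, p) mixture. A minimal, self-contained sketch (the logits and target below are arbitrary test values of mine):

```python
import torch
import torch.nn.functional as F

eps = 0.1
logits = torch.tensor([[2.0, 0.5, -1.0, 0.3]])
target = torch.tensor([0])

# built-in label smoothing (available in PyTorch >= 1.10)
builtin = F.cross_entropy(logits, target, label_smoothing=eps)

# the same quantity assembled by hand from the formula above:
# (1 - eps) * H(q, p)  +  eps * H(u, p)
log_p = torch.log_softmax(logits, dim=-1)
manual = (1 - eps) * F.nll_loss(log_p, target) + eps * (-log_p).mean()
```

Both values should match, since the built-in option mixes the one-hot target with a uniform distribution in exactly the way derived in section 3.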