[Contrastive Learning] Understanding the Behaviour of Contrastive Loss (CVPR '21)
2022-07-25 12:00:00 【chad_lee】
Understanding the Behaviour of Contrastive Loss (CVPR’21)
The temperature coefficient $\tau$ in the contrastive loss is a key hyperparameter, and most papers set $\tau$ to a small value. Starting from an analysis of the temperature parameter $\tau$, this paper shows that:
- Contrastive loss automatically mines hard negative samples, which is why it can learn high-quality self-supervised representations. In particular, negatives that are already far away need not be pushed further; the loss concentrates on the negatives that are still close (hard negatives), which makes the representation space more uniform (cf. the embedding-distribution figures in the paper).
- The temperature coefficient $\tau$ controls how strongly hard negatives are mined: the smaller $\tau$, the more the loss focuses on the hardest negatives.
Hardness-Awareness
The most widely used contrastive loss is InfoNCE:
$$\mathcal{L}\left(x_{i}\right)=-\log \left[\frac{\exp \left(s_{i, i} / \tau\right)}{\sum_{k \neq i} \exp \left(s_{i, k} / \tau\right)+\exp \left(s_{i, i} / \tau\right)}\right]$$
This loss drives the similarity $s_{i,i}$ between the $i$-th sample and its augmented (positive) view to be as large as possible, and the similarities $s_{i,k}$ to other samples (negatives) to be as small as possible. But many loss functions satisfy this requirement, for example the simplest one, $\mathcal{L}_{\text{simple}}$:
$$\mathcal{L}_{\text{simple}}\left(x_{i}\right)=-s_{i, i}+\lambda \sum_{j \neq i} s_{i, j}$$
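To make the two objectives concrete, here is a minimal PyTorch sketch of both losses (not the paper's code; the function names, the batch-wise negative sampling, and the default $\lambda$ are my own assumptions):

```python
import torch
import torch.nn.functional as F

def info_nce(z1, z2, tau=0.1):
    # z1, z2: (N, d) L2-normalized embeddings of two augmented views;
    # for anchor i, z2[i] is the positive and z2[j] (j != i) are the negatives.
    sim = z1 @ z2.t() / tau                       # s_{i,k} / tau, shape (N, N)
    labels = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(sim, labels)           # -log softmax at the diagonal entry

def simple_loss(z1, z2, lam=1.0):
    # L_simple: -s_{i,i} + lam * sum_{j != i} s_{i,j}
    sim = z1 @ z2.t()
    pos = sim.diagonal()
    neg = sim.sum(dim=1) - pos
    return (-pos + lam * neg).mean()

z1 = F.normalize(torch.randn(8, 128), dim=1)
z2 = F.normalize(torch.randn(8, 128), dim=1)
print(info_nce(z1, z2).item(), simple_loss(z1, z2).item())
```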
In practice, however, the simple loss trains much worse:
| Dataset | Contrastive Loss | Simple Loss |
|---|---|---|
| CIFAR-10 | 79.75 | 74 |
| CIFAR-100 | 51.82 | 49 |
This is because Simple Loss penalizes the similarity to every negative with the same weight: $\frac{\partial \mathcal{L}_{\text{simple}}}{\partial s_{i, k}}=\lambda$, i.e. the gradient of the loss with respect to every negative similarity is identical. Contrastive Loss, by contrast, automatically assigns a larger penalty to negatives with higher similarity:
$$\text{positive: } \frac{\partial \mathcal{L}\left(x_{i}\right)}{\partial s_{i, i}}=-\frac{1}{\tau} \sum_{k \neq i} P_{i, k}, \qquad \text{negative: } \frac{\partial \mathcal{L}\left(x_{i}\right)}{\partial s_{i, j}}=\frac{1}{\tau} P_{i, j} \quad (j \neq i)$$
where $P_{i, j}=\frac{\exp \left(s_{i, j} / \tau\right)}{\sum_{k \neq i} \exp \left(s_{i, k} / \tau\right)+\exp \left(s_{i, i} / \tau\right)}$. For a given anchor, the denominator of $P_{i, j}$ is the same for all negatives, so the larger $s_{i, j}$ is, the larger the gradient on that negative, and the harder it is pushed away (much like focal loss: the harder the sample, the larger the gradient). This drives all samples toward a uniform distribution on the hypersphere.
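These gradients are easy to check numerically with autograd; a small sketch (the similarity values are made up for illustration):

```python
import torch

tau = 0.1
# s[0] is the positive similarity s_{i,i}; the rest are negatives s_{i,j}.
s = torch.tensor([0.5, 0.7, 0.3, -0.2], requires_grad=True)
loss = -torch.log_softmax(s / tau, dim=0)[0]   # InfoNCE for a single anchor
loss.backward()
print(s.grad)
# s.grad[j] = P_{i,j} / tau for each negative: the higher s_{i,j}, the larger it is.
# s.grad[0] = -(1/tau) * sum_{k != i} P_{i,k} for the positive.
```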
To verify that Contrastive Loss really owes its advantage to mining hard negatives, the paper adds explicit hard-negative selection to Simple Loss (picking the 4096 hardest negatives for each sample; a sketch of this selection follows the table), which improves its performance:
| Dataset | Contrastive Loss | Simple Loss + Hard |
|---|---|---|
| CIFAR-10 | 79.75 | 84.84 |
| CIFAR-100 | 51.82 | 55.71 |
| ImageNet-100 | 71.53 | 74.31 |
| SVHN | 92.55 | 94.99 |
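A minimal sketch of such hard-negative selection for Simple Loss, assuming negatives come from a memory bank of embeddings (the bank setup, function name, and $\lambda$ are my assumptions; only the top-$k$ selection with $k=4096$ follows the text):

```python
import torch
import torch.nn.functional as F

def simple_loss_hard(z1, z2, bank, lam=1.0, k=4096):
    # z1, z2: (N, d) normalized views; bank: (M, d) candidate negative embeddings.
    pos = (z1 * z2).sum(dim=1)               # positive similarity s_{i,i}
    sim = z1 @ bank.t()                      # similarities to all M candidates
    k = min(k, sim.size(1))
    hard = sim.topk(k, dim=1).values         # keep only the k most similar (hardest) negatives
    return (-pos + lam * hard.sum(dim=1)).mean()

z1 = F.normalize(torch.randn(8, 128), dim=1)
z2 = F.normalize(torch.randn(8, 128), dim=1)
bank = F.normalize(torch.randn(10000, 128), dim=1)
print(simple_loss_hard(z1, z2, bank).item())
```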
How the temperature coefficient $\tau$ controls hard-negative mining
The smaller the temperature coefficient $\tau$, the more the loss focuses on hard negatives. Specifically:
As $\tau \rightarrow 0^{+}$, Contrastive Loss degenerates into a loss that attends only to the hardest negative:
$$\lim _{\tau \rightarrow 0^{+}} \frac{1}{\tau} \max \left[s_{\max }-s_{i, i},\ 0\right]$$
where $s_{\max }=\max _{k \neq i} s_{i, k}$ is the largest negative similarity. This amounts to pushing the negatives away one by one until every negative sits at the same distance from the anchor.
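This limit is easy to verify numerically: $\tau \cdot \mathcal{L}(x_i)$ approaches $\max[s_{\max}-s_{i,i},\ 0]$ as $\tau$ shrinks (a sketch with made-up similarity values):

```python
import torch

# s[0] is the positive similarity s_{i,i}; the rest are negatives.
s = torch.tensor([0.5, 0.7, 0.3, -0.2])
target = max((s[1:].max() - s[0]).item(), 0.0)   # max[s_max - s_ii, 0] = 0.2
for tau in [1.0, 0.1, 0.01, 0.001]:
    loss = -torch.log_softmax(s / tau, dim=0)[0]
    print(f"tau={tau}: tau*loss={tau * loss.item():.4f}, limit={target}")
```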

As $\tau \rightarrow \infty$, Contrastive Loss essentially degenerates into Simple Loss, assigning the same weight to every negative.
So the smaller the temperature coefficient $\tau$, the more uniformly the sample features are distributed. But this is not purely a good thing, because potential positives (false negatives, i.e. semantically similar samples) are pushed away as well, as the sketch below illustrates.
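A small sketch of how $\tau$ redistributes the repelling weight $P_{i,j}$ among negatives (values made up; the first negative plays the role of a false negative that is semantically close to the anchor):

```python
import torch

s_pos = torch.tensor([0.6])                     # positive similarity s_{i,i}
s_neg = torch.tensor([0.8, 0.5, 0.2, -0.1])     # negatives; 0.8 is a likely false negative
for tau in [2.0, 0.5, 0.1, 0.05]:
    P = torch.softmax(torch.cat([s_pos, s_neg]) / tau, dim=0)[1:]
    w = P / P.sum()                             # relative repelling weight per negative
    print(f"tau={tau}: {[round(x, 3) for x in w.tolist()]}")
# As tau shrinks, nearly all of the repelling gradient lands on the most
# similar negative -- including false negatives.
```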

边栏推荐
- Varest blueprint settings JSON
- 擎创科技加入龙蜥社区,共建智能运维平台新生态
- 11. Reading rumors spread with deep learning
- Brpc source code analysis (VI) -- detailed explanation of basic socket
- Web APIs (get element event basic operation element)
- 银行理财子公司蓄力布局A股;现金管理类理财产品整改加速
- Oil monkey script link
- W5500 adjusts the brightness of LED light band through upper computer control
- pycharm连接远程服务器ssh -u 报错:No such file or directory
- 【IMX6ULL笔记】--内核底层驱动初步探究
猜你喜欢

硬件连接服务器 tcp通讯协议 gateway

Javescript loop

30 sets of Chinese style ppt/ creative ppt templates

Experimental reproduction of image classification (reasoning only) based on caffe resnet-50 network

winddows 计划任务执行bat 执行PHP文件 失败的解决办法

教你如何通过MCU将S2E配置为UDP的工作模式

Return and finally? Everyone, please look over here,

【GCN-CTR】DC-GNN: Decoupled GNN for Improving and Accelerating Large-Scale E-commerce Retrieval WWW22

Miidock Brief

Differences in usage between tostring() and new string()
随机推荐
程序员送给女孩子的精美礼物,H5立方体,唯美,精致,高清
brpc源码解析(六)—— 基础类socket详解
[leetcode brush questions]
W5500 upload temperature and humidity to onenet platform
【6篇文章串讲ScalableGNN】围绕WWW 2022 best paper《PaSca》
30 sets of Chinese style ppt/ creative ppt templates
[cloud co creation] what is the role of AI in mathematics? What will be the disruptive impact on the mathematical world in the future?
toString()与new String()用法区别
JS operator
奉劝那些刚参加工作的学弟学妹们:要想进大厂,这些并发编程知识是你必须要掌握的!完整学习路线!!(建议收藏)
Learning to Pre-train Graph Neural Networks(图预训练与微调差异)
'C:\xampp\php\ext\php_zip.dll' - %1 不是有效的 Win32 应用程序 解决
油猴脚本链接
JS中的函数
【高并发】高并发场景下一种比读写锁更快的锁,看完我彻底折服了!!(建议收藏)
Teach you how to configure S2E to UDP working mode through MCU
Make a reliable delay queue with redis
【GCN-RS】MCL: Mixed-Centric Loss for Collaborative Filtering (WWW‘22)
微星主板前面板耳机插孔无声音输出问题【已解决】
Wiznet embedded Ethernet technology training open class (free!!!)