
[Out-of-Distribution Detection] Deep Anomaly Detection with Outlier Exposure, ICLR '19

2022-06-22 06:55:00 chad_ lee

The paper trains anomaly detectors with an auxiliary dataset of outliers, a method called Outlier Exposure (OE). This enables the anomaly detector to generalize and detect unseen anomalies. In extensive experiments on natural language processing and on small- and large-scale vision tasks, the paper finds that Outlier Exposure significantly improves detection performance.

Outlier Exposure

So-called outlier exposure means showing anomalous data to the anomaly detector, letting the model learn from existing anomalies so that it can generalize to anomalies it has never seen.

The paper has only one formula, the optimization objective after the model incorporates OE:
$$\mathbb{E}_{(x, y) \sim \mathcal{D}_{\text{in}}}\left[\mathcal{L}(f(x), y) + \lambda\, \mathbb{E}_{x' \sim \mathcal{D}_{\text{out}}^{\text{OE}}}\left[\mathcal{L}_{\text{OE}}\left(f(x'), f(x), y\right)\right]\right]$$
The first term $\mathcal{L}$ is the optimization objective of the original model on the original task; the second term $\mathcal{L}_{\text{OE}}$ is the OE objective, which depends on the task and is defined separately for each experiment below.
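To make the structure of this objective concrete, here is a minimal PyTorch-style sketch. The function and argument names are illustrative, not from the paper; `task_loss` and `oe_loss` stand in for the task-specific $\mathcal{L}$ and $\mathcal{L}_{\text{OE}}$, and `lam` is the hyperparameter $\lambda$:

```python
def oe_training_step(model, x_in, y_in, x_out, task_loss, oe_loss, lam=0.5):
    """One evaluation of the OE objective (illustrative sketch).

    task_loss: the original objective L on the in-distribution batch.
    oe_loss:   the task-specific OE term L_OE evaluated on outliers x_out.
    lam:       the weighting hyperparameter lambda.
    """
    out_in = model(x_in)    # predictions on in-distribution batch
    out_oe = model(x_out)   # predictions on outlier (OE) batch
    return task_loss(out_in, y_in) + lam * oe_loss(out_oe)
```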

Data sets

IN-DISTRIBUTION DATASETS: SVHN, CIFAR, Tiny ImageNet, Places365, 20 Newsgroups, TREC, SST. When one of them serves as the ID dataset, similar datasets are used as the OOD test sets.

OUTLIER EXPOSURE DATASETS: 80 Million Tiny Images, ImageNet-22K, WikiText-2. Overlap with the ID datasets is removed, guaranteeing that $\mathcal{D}_{\text{out}}^{\text{OE}}$ and $\mathcal{D}_{\text{out}}^{\text{test}}$ are disjoint.

Task 1: Multiclass Classification

For a $k$-class classification task, the input is $x \in \mathcal{X}$ and the label is $y \in \mathcal{Y} = \{1, 2, \ldots, k\}$. The classifier is $f: \mathcal{X} \rightarrow \mathbb{R}^{k}$, and for any $x$, $\mathbf{1}^{\top} f(x) = 1$ and $f(x) \succeq 0$, i.e., $f(x)$ is a probability vector over the $k$ classes.

Maximum Softmax Probability (MSP)

The baseline. Given an input $x$, the OOD score is the maximum softmax probability $\max_{c} f_{c}(x)$; lower values indicate a likely OOD input.
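As a minimal sketch, assuming `logits` is the classifier's pre-softmax output for a batch:

```python
import torch.nn.functional as F

def msp_score(logits):
    # Maximum softmax probability per sample; lower values suggest OOD.
    return F.softmax(logits, dim=1).max(dim=1).values
```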

The fine-tuning objective is:
$$\mathbb{E}_{(x, y) \sim \mathcal{D}_{\text{in}}}\left[-\log f_{y}(x)\right] + \lambda\, \mathbb{E}_{x \sim \mathcal{D}_{\text{out}}^{\text{OE}}}\left[H(\mathcal{U}; f(x))\right]$$
where $H$ is cross-entropy and $\mathcal{U}$ is the uniform distribution over the $k$ classes.
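A minimal sketch of this objective, assuming `logits_in`/`logits_out` are the classifier outputs on the in-distribution and OE batches. Note that $H(\mathcal{U}; f(x)) = -\frac{1}{k}\sum_{c} \log f_{c}(x)$, i.e., the negated mean of the log-softmax over classes:

```python
import torch.nn.functional as F

def oe_finetune_loss(logits_in, targets, logits_out, lam=0.5):
    # Standard cross-entropy on the in-distribution batch.
    ce = F.cross_entropy(logits_in, targets)
    # Cross-entropy from the uniform distribution to f(x'):
    # H(U; f(x')) = -(1/k) * sum_c log f_c(x'), averaged over the batch.
    h_uniform = -F.log_softmax(logits_out, dim=1).mean()
    return ce + lam * h_uniform
```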

In fact, adding the OE regularization term from the start of training works even better; fine-tuning is chosen to save time and GPU memory.

With OE added, the MSP method improves on both vision and NLP tasks:

[Figures: OOD detection results for MSP with and without OE on vision and NLP benchmarks]

Confidence Branch

This is the method proposed in "Learning Confidence for Out-of-Distribution Detection in Neural Networks" (2018): learn a confidence branch that outputs an OOD score $b: \mathcal{X} \rightarrow [0,1]$ for each sample. Applying OE here means adding the following term to the original model's optimization objective:
$$0.5\, \mathbb{E}_{x \sim \mathcal{D}_{\text{out}}^{\text{OE}}}\left[\log b(x)\right]$$
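A sketch of this extra term, assuming `b_out` holds the branch outputs $b(x')$ on an OE batch; the clamp guarding against $\log 0$ is my addition, not from the paper:

```python
import torch

def confidence_oe_term(b_out):
    # Minimizing 0.5 * E[log b(x')] drives the predicted confidence
    # b(x') toward 0 on outlier-exposure samples.
    return 0.5 * torch.log(b_out.clamp_min(1e-12)).mean()
```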
Results:

[Figure: results for the confidence branch with and without OE]

Synthetic Outliers

The author also wanted to use OE against adversarial examples, so they tried perturbing images with noise and using the noisy images as the OE dataset. However, they found that although the classifier can memorize these noise features, it fails to recognize new OOD samples. The author then directly used the code from "Training Confidence-Calibrated Classifiers for Detecting Out-of-Distribution Samples", which fine-tunes with a GAN on top of MSP, assigning high OOD scores to GAN-generated samples. Fine-tuning this with OE improves performance further. I did not fully understand the implementation details of this part; the section is short and there is no appendix.

[Figure: results comparing GAN-based synthetic outliers with and without OE fine-tuning]

Task 2: Density Estimation

A density estimator learns the probability density function of the data distribution $\mathcal{D}_{\text{in}}$. Anomalous samples should have low probability density, because they rarely appear in $\mathcal{D}_{\text{in}}$.

PixelCNN++

The OOD score of a sample $x$ is its bits per pixel (BPP), expressed as nll(x)/num_pixels, where nll is the negative log-likelihood. Here, OE is implemented with a margin loss over the log-likelihood difference between in-distribution and anomalous examples. So for a sample $x_{\text{in}}$ from $\mathcal{D}_{\text{in}}$ and an outlier $x_{\text{out}}$ from $\mathcal{D}_{\text{out}}^{\text{OE}}$, the loss is:
$$\max\left\{0,\ \text{num\_pixels} + \operatorname{nll}(x_{\text{in}}) - \operatorname{nll}(x_{\text{out}})\right\}$$
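A sketch of the score and the margin loss, assuming `nll_in` and `nll_out` are per-sample negative log-likelihood tensors from the density model (function names are mine):

```python
import torch

def bits_per_pixel(nll, num_pixels):
    # OOD score: negative log-likelihood normalized by image size.
    return nll / num_pixels

def density_oe_loss(nll_in, nll_out, num_pixels):
    # Hinge on the likelihood gap: zero penalty once each outlier is at
    # least num_pixels (in nll units) less likely than the in-dist sample.
    return torch.clamp(num_pixels + nll_in - nll_out, min=0).mean()
```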
[Figure: PixelCNN++ OOD detection results with and without OE]

Language Modeling

A QRNN is used as the baseline OOD detector. The OOD score is bits per character (BPC) or bits per word (BPW), defined as nll(x)/sequence_length, where nll($x$) is the negative log-likelihood of the sequence $x$. OE is implemented by adding the cross-entropy to the uniform distribution on tokens from sequences in $\mathcal{D}_{\text{out}}^{\text{OE}}$ as an additional loss term.
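A sketch of this additional term, assuming `logits_out` are the model's next-token logits with shape (batch, seq_len, vocab_size) on tokens drawn from $\mathcal{D}_{\text{out}}^{\text{OE}}$ sequences:

```python
import torch.nn.functional as F

def lm_oe_term(logits_out):
    # Cross-entropy from the uniform distribution over the vocabulary to
    # the model's next-token distribution, averaged over batch and positions.
    return -F.log_softmax(logits_out, dim=-1).mean()
```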

[Figure: language-model OOD detection results with and without OE]

Summary

The author lists several advantages of OE: extensibility (it applies to many tasks), flexibility in the choice of $\mathcal{D}_{\text{out}}^{\text{OE}}$, and that OE can even improve the model's accuracy on the original task.

But I think the author's claim that "$\mathcal{D}_{\text{out}}^{\text{OE}}$ can inspire the model, so that it generalizes and recognizes an unseen $\mathcal{D}_{\text{out}}^{\text{test}}$" is a bit too mysterious; after all, such distributions cannot be precisely delineated. It feels a bit like "Transfer Unlearning".

In my view, the strength of this paper is its many thorough and precise experiments: it does not tell long stories, and every claim is backed experimentally.
