[Out-of-Distribution Detection] Deep Anomaly Detection with Outlier Exposure, ICLR '19
2022-06-22 06:55:00 【chad_ lee】
The paper trains anomaly detectors with an auxiliary dataset of outliers, a method it calls Outlier Exposure (OE). This enables the anomaly detector to generalize and detect unseen anomalies. In extensive experiments on natural language processing and on small- and large-scale vision tasks, the paper finds that Outlier Exposure significantly improves detection performance.
Outlier Exposure
Outlier Exposure introduces anomalous data to the anomaly detector, letting the model learn heuristics from the available outliers so that it can generalize to anomalies it has never seen.
The paper has essentially one formula. After introducing OE, the model's optimization objective becomes:

$$\mathbb{E}_{(x, y) \sim \mathcal{D}_{\text{in}}}\Big[\mathcal{L}(f(x), y)+\lambda\, \mathbb{E}_{x' \sim \mathcal{D}_{\text{out}}^{\text{OE}}}\big[\mathcal{L}_{\mathrm{OE}}\big(f(x'), f(x), y\big)\big]\Big]$$

The first term $\mathcal{L}$ is the original model's objective on the original task; the second term $\mathcal{L}_{\mathrm{OE}}$ is the OE objective, which depends on the task and is defined case by case in the experiments below.
Datasets
IN-DISTRIBUTION DATASETS: SVHN, CIFAR, Tiny ImageNet, Places365, 20 Newsgroups, TREC, SST. When one of these serves as the ID dataset, other, similar datasets are used as the OOD test sets.
OUTLIER EXPOSURE DATASETS: 80 Million Tiny Images, ImageNet-22K, WikiText-2. Overlap with the ID datasets is removed, guaranteeing that $\mathcal{D}_{\text{out}}^{\text{OE}}$ and $\mathcal{D}_{\text{out}}^{\text{test}}$ are disjoint.
Task 1: Multiclass classification
For a $k$-class classification task with input $x \in \mathcal{X}$, the classifier outputs $y \in \mathcal{Y}=\{1,2,\ldots,k\}$. The classifier is $f: \mathcal{X} \rightarrow \mathbb{R}^{k}$, and for any $x$, $1^{\top} f(x)=1$ and $f(x) \succeq 0$.
Maximum Softmax Probability (MSP)
The baseline. Given an input $x$, the OOD score is the maximum softmax probability $\max_{c} f_{c}(x)$ (low values indicate likely OOD inputs).
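As a minimal sketch (plain NumPy, not the paper's actual code; the function name is made up for illustration), the MSP score can be computed from a logit vector as:

```python
import numpy as np

def msp_score(logits):
    """Maximum softmax probability: high for confident (in-distribution)
    predictions, low for uncertain (potentially OOD) inputs."""
    z = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    probs = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    return probs.max(axis=-1)

# A peaked logit vector scores higher than a flat one.
confident = msp_score(np.array([10.0, 0.0, 0.0]))  # close to 1
uncertain = msp_score(np.array([1.0, 1.0, 1.0]))   # exactly 1/3
```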
The fine-tuning objective is:

$$\mathbb{E}_{(x, y) \sim \mathcal{D}_{\text{in}}}\left[-\log f_{y}(x)\right]+\lambda\, \mathbb{E}_{x \sim \mathcal{D}_{\text{out}}^{\text{OE}}}\left[H(\mathcal{U} ; f(x))\right]$$

where $H$ is the cross-entropy and $\mathcal{U}$ is the uniform distribution over the $k$ classes.
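A NumPy sketch of this objective (batch shapes, the λ value, and the function names are illustrative assumptions, not the paper's code):

```python
import numpy as np

def log_softmax(logits):
    """Numerically stable log-softmax along the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def oe_objective(in_logits, in_labels, out_logits, lam=0.5):
    """Cross-entropy on an in-distribution batch plus lam times the
    cross-entropy H(U; f(x)) from the uniform distribution to the
    softmax output on an Outlier Exposure batch."""
    logp_in = log_softmax(in_logits)
    ce_in = -logp_in[np.arange(len(in_labels)), in_labels].mean()
    # H(U; f(x)) = -(1/k) * sum_c log f_c(x), i.e. the mean over classes
    h_uniform = -log_softmax(out_logits).mean(axis=-1).mean()
    return ce_in + lam * h_uniform
```

The uniform-distribution term is minimized when the model's softmax output on outliers is flat, which is exactly what makes the MSP score low on OE-like inputs.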
In fact, training the model from scratch with the OE regularization term works even better; fine-tuning was chosen to save time and GPU memory.
With OE added, the MSP method improves on both CV and NLP tasks:


Confidence Branch
A method proposed in 《Learning confidence for out-of-distribution detection in neural networks》 (2018): learn a confidence branch $b: \mathcal{X} \rightarrow[0,1]$ that outputs an OOD score for each sample. Applying OE here means adding the following term to the original model's optimization objective:

$$0.5\, \mathbb{E}_{x \sim \mathcal{D}_{\text{out}}^{\text{OE}}}[\log b(x)]$$
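As a small illustrative sketch (hypothetical function name, assuming `b_out` holds the confidence-branch outputs on an OE batch), the added term can be written as:

```python
import numpy as np

def confidence_oe_term(b_out, weight=0.5):
    """OE term added to the confidence-branch loss. Since the total
    loss is minimized, this term pushes the predicted confidence
    b(x) toward 0 on outlier inputs."""
    eps = 1e-12  # guard against log(0)
    return weight * np.mean(np.log(np.clip(b_out, eps, 1.0)))
```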
Results:

Synthetic Outliers
The author also wanted to use OE against adversarial examples, and first tried perturbing images with noise and using the noisy images as the OE dataset. However, although the classifier could memorize these noise patterns, it still failed to recognize new OOD samples. The author then directly used the code of 《Training confidence-calibrated classifiers for detecting out-of-distribution samples》, which fine-tunes on top of MSP with a GAN, i.e. assigns high OOD scores to GAN-generated samples. Fine-tuning that model further with OE improved the results again. I did not follow the implementation details of this part; the section is short and there is no appendix.

Task 2: Density estimation
A density estimator learns the probability density function of the data distribution $\mathcal{D}_{\text{in}}$. Anomalous samples should have low probability density, since they rarely appear in $\mathcal{D}_{\text{in}}$.
PixelCNN++
The OOD score of a sample $x$ is its bits per pixel (BPP), expressed as nll(x)/num_pixels, where nll is the negative log-likelihood. Here OE is implemented with a margin loss over the log-likelihood difference between in-distribution and anomalous examples. For a sample $x_{\text{in}}$ from $\mathcal{D}_{\text{in}}$ and an outlier $x_{\text{out}}$ from $\mathcal{D}_{\text{out}}^{\text{OE}}$, the loss is:

$$\max \left\{0,\ \text{num\_pixels} + \operatorname{nll}\left(x_{\text{in}}\right) - \operatorname{nll}\left(x_{\text{out}}\right)\right\}$$
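A direct NumPy transcription of this margin loss (names assumed for illustration):

```python
import numpy as np

def oe_margin_loss(nll_in, nll_out, num_pixels):
    """Margin loss on the log-likelihood gap: zero once the outlier's
    negative log-likelihood exceeds the in-distribution sample's by at
    least num_pixels, otherwise it penalizes the shortfall."""
    return np.maximum(0.0, num_pixels + nll_in - nll_out)
```

Once the outlier is at least num_pixels nats less likely than the in-distribution sample, the loss vanishes and training no longer pushes the densities apart.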
Language Modeling
A QRNN is used as the baseline OOD detector. The OOD score is bits per character (BPC) or bits per word (BPW), defined as nll(x)/sequence_length, where nll(x) is the negative log-likelihood of the sequence $x$. OE is implemented by adding, as an extra loss term, the cross-entropy to the uniform distribution on tokens from sequences in $\mathcal{D}_{\text{out}}^{\text{OE}}$.
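A small sketch of the BPW score (assuming the model's per-token log-probabilities are in nats and converted to bits; the function name is hypothetical):

```python
import numpy as np

def bpw_score(target_log_probs):
    """Bits-per-word OOD score: the sequence's negative log-likelihood,
    converted from nats to bits, divided by the sequence length.
    target_log_probs: natural-log probability the model assigned to
    each target token of the sequence. Higher scores suggest OOD text."""
    nll_nats = -np.sum(target_log_probs)
    return nll_nats / np.log(2.0) / len(target_log_probs)
```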

Summary
The proposed OE method has several advantages; the author's conclusions are: extensibility (it applies to many tasks), flexibility in choosing $\mathcal{D}_{\text{out}}^{\text{OE}}$, and the fact that OE can even improve the model's own accuracy.
Still, I find the author's claim that "$\mathcal{D}_{\text{out}}^{\text{OE}}$ can inspire the model, so that it generalizes and recognizes the unseen $\mathcal{D}_{\text{out}}^{\text{test}}$" a bit too mysterious; after all, such distributions cannot be delineated precisely. It feels a bit like "Transfer Unlearning".
The strength of this paper, I think, is its large number of thorough and careful experiments; it does not tell many stories, and every point is supported experimentally.