[Out-of-distribution detection] Your Classifier Is Secretly an Energy Based Model and You Should Treat It Like One (ICLR '20)
2022-06-22 06:55:00 【chad_ lee】
https://arxiv.org/pdf/1912.03263v3.pdf
Commonly used classifiers model $p_{\theta}(y \mid \mathbf{x})$. This paper reinterprets the classification model from an energy-based perspective and obtains a hybrid of a generative model and a classifier. The hybrid models $p_{\theta}(y \mid \mathbf{x})$ and $p_{\theta}(\mathbf{x})$ at the same time, thereby improving both classification accuracy and the quality of generated samples.
This paper is also used as a baseline in OOD detection experiments.
Joint Energy-based Model(JEM)
First, an overview of the model structure:

Call the values that a neural network classifier feeds into the softmax function $f_{\theta}(\mathbf{x})$. A traditional classifier passes $f_{\theta}(\mathbf{x})$ through softmax to estimate $p(y \mid \mathbf{x})$; this paper also uses $f_{\theta}(\mathbf{x})$ to estimate $p(\mathbf{x}, y)$ and $p(\mathbf{x})$.
The method of this paper
EBM
Energy-based model:
$$p_{\theta}(\mathbf{x})=\frac{\exp \left(-E_{\theta}(\mathbf{x})\right)}{Z(\theta)} \tag{1}$$
where $E_{\theta}(\mathbf{x}): \mathbb{R}^{D} \rightarrow \mathbb{R}$ is the energy function and $Z(\theta)=\int_{\mathbf{x}} \exp \left(-E_{\theta}(\mathbf{x})\right)$ is the partition function (no need to worry about it here). To train the model, maximize the log-likelihood by taking its gradient with respect to $\theta$ (this is one of the paper's two losses):
$$\frac{\partial \log p_{\theta}(\mathbf{x})}{\partial \theta}=\mathbb{E}_{p_{\theta}\left(\mathbf{x}^{\prime}\right)}\left[\frac{\partial E_{\theta}\left(\mathbf{x}^{\prime}\right)}{\partial \theta}\right]-\frac{\partial E_{\theta}(\mathbf{x})}{\partial \theta} \tag{2}$$
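For readers who want the missing step, Equation (2) follows from Equation (1) by differentiating the log-partition function. This is the standard derivation, sketched here under the assumption that differentiation and integration can be exchanged:

$$
\begin{aligned}
\log p_{\theta}(\mathbf{x}) &= -E_{\theta}(\mathbf{x}) - \log Z(\theta), \\
\frac{\partial \log Z(\theta)}{\partial \theta} &= \frac{1}{Z(\theta)} \int_{\mathbf{x}^{\prime}} \exp\left(-E_{\theta}(\mathbf{x}^{\prime})\right)\left(-\frac{\partial E_{\theta}(\mathbf{x}^{\prime})}{\partial \theta}\right) = -\mathbb{E}_{p_{\theta}(\mathbf{x}^{\prime})}\left[\frac{\partial E_{\theta}(\mathbf{x}^{\prime})}{\partial \theta}\right], \\
\frac{\partial \log p_{\theta}(\mathbf{x})}{\partial \theta} &= -\frac{\partial E_{\theta}(\mathbf{x})}{\partial \theta} - \frac{\partial \log Z(\theta)}{\partial \theta} = \mathbb{E}_{p_{\theta}(\mathbf{x}^{\prime})}\left[\frac{\partial E_{\theta}(\mathbf{x}^{\prime})}{\partial \theta}\right] - \frac{\partial E_{\theta}(\mathbf{x})}{\partial \theta}.
\end{aligned}
$$

The expectation is taken over the model's own distribution, which is why training needs samples from $p_{\theta}$.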
The hard part is sampling from $p_{\theta}(\mathbf{x})$. Early EBM training used MCMC methods; this paper uses Stochastic Gradient Langevin Dynamics (SGLD):
$$\mathbf{x}_{0} \sim p_{0}(\mathbf{x}), \quad \mathbf{x}_{i+1}=\mathbf{x}_{i}-\frac{\alpha}{2} \frac{\partial E_{\theta}\left(\mathbf{x}_{i}\right)}{\partial \mathbf{x}_{i}}+\epsilon, \quad \epsilon \sim \mathcal{N}(0, \alpha) \tag{3}$$
This method is somewhat similar to PGD. Intuitively, the sampled $\mathbf{x}$ moves toward regions of low energy, and each training iteration runs $N$ sampling steps. Recent work shows that SGLD samples give a good approximation of the expectation in Equation (2).
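The SGLD update in Equation (3) can be illustrated on a toy problem. The following is a minimal sketch, not the paper's implementation: it assumes a scalar energy $E(x) = x^2/2$ (so the target density is a standard normal), and the function name `sgld_sample`, the step size 0.1, and the step count 200 are all hypothetical choices for illustration.

```python
import math
import random


def sgld_sample(grad_energy, x0, step_size, n_steps, rng):
    """Run n_steps of the Equation (3) update on a scalar x, starting at x0."""
    x = x0
    # epsilon ~ N(0, alpha): alpha is the variance, so the std is sqrt(alpha).
    noise_std = math.sqrt(step_size)
    for _ in range(n_steps):
        x = x - 0.5 * step_size * grad_energy(x) + rng.gauss(0.0, noise_std)
    return x


def grad_energy(x):
    """Gradient of the toy energy E(x) = x^2 / 2 (an assumption for this demo)."""
    return x


rng = random.Random(0)
# p_0 is taken to be uniform on [-1, 1]; each chain is run independently.
samples = [
    sgld_sample(grad_energy, rng.uniform(-1.0, 1.0), 0.1, 200, rng)
    for _ in range(2000)
]
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(f"sample mean = {mean:.3f}, sample variance = {var:.3f}")
```

The sample mean and variance should land near 0 and 1, with a small bias in the variance because the step size is finite.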
Proposed JEM
Consider a $K$-class classification problem. The network $f_{\theta}: \mathbb{R}^{D} \rightarrow \mathbb{R}^{K}$ maps each data point $\mathbf{x} \in \mathbb{R}^{D}$ to $K$ real values called logits. Through the softmax transfer function, these logits parameterize a categorical distribution:
$$p_{\theta}(y \mid \mathbf{x})=\frac{\exp \left(f_{\theta}(\mathbf{x})[y]\right)}{\sum_{y^{\prime}} \exp \left(f_{\theta}(\mathbf{x})\left[y^{\prime}\right]\right)} \tag{4}$$
where $f_{\theta}(\mathbf{x})[y]$ is the $y$-th component of the network's output vector. With these logits, and without changing the model, an energy-based model can be defined for the joint distribution of $\mathbf{x}$ and $y$:
$$p_{\theta}(\mathbf{x}, y)=\frac{\exp \left(f_{\theta}(\mathbf{x})[y]\right)}{Z(\theta)} \tag{5}$$
Marginalizing out $y$ (summing over it) gives an unnormalized density model for $\mathbf{x}$:
$$p_{\theta}(\mathbf{x})=\sum_{y} p_{\theta}(\mathbf{x}, y)=\frac{\sum_{y} \exp \left(f_{\theta}(\mathbf{x})[y]\right)}{Z(\theta)} \tag{6}$$
The energy of a data point $\mathbf{x}$ is:
$$E_{\theta}(\mathbf{x})=-\operatorname{LogSumExp}_{y}\left(f_{\theta}(\mathbf{x})[y]\right)=-\log \sum_{y} \exp \left(f_{\theta}(\mathbf{x})[y]\right) \tag{7}$$
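To make the relationship between Equations (4) to (7) concrete, here is a small sketch that takes one logit vector and computes both the class probabilities and the energy from it. The logit values are made up for illustration, and the helper names are hypothetical. It also checks numerically that dividing the unnormalized joint $\exp(f_{\theta}(\mathbf{x})[y])$ by the unnormalized marginal $\exp(-E_{\theta}(\mathbf{x}))$ recovers the softmax, since $Z(\theta)$ cancels.

```python
import math


def log_sum_exp(logits):
    """Numerically stable log(sum(exp(logits)))."""
    m = max(logits)
    return m + math.log(sum(math.exp(v - m) for v in logits))


def energy(logits):
    """Equation (7): E(x) = -LogSumExp_y f(x)[y]."""
    return -log_sum_exp(logits)


def class_probs(logits):
    """Equation (4): softmax over the logits."""
    lse = log_sum_exp(logits)
    return [math.exp(v - lse) for v in logits]


logits = [2.0, 0.5, -1.0]  # hypothetical f_theta(x) for K = 3
probs = class_probs(logits)
e = energy(logits)

# Equation (5) divided by Equation (6): Z(theta) cancels, leaving the softmax.
ratio = [math.exp(v) / math.exp(-e) for v in logits]
print("p(y|x) =", [round(p, 4) for p in probs])
print("E(x)   =", round(e, 4))
```

The same logits therefore serve two roles: their relative sizes decide the class, and their overall magnitude (through LogSumExp) decides the unnormalized density of the input.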
With the model defined, the next step is to optimize it. The objective is to maximize the likelihood $p(\mathbf{x}, y)$, which factorizes as:
$$\log p_{\theta}(\mathbf{x}, y)=\log p_{\theta}(\mathbf{x})+\log p_{\theta}(y \mid \mathbf{x}) \tag{8}$$
The objective is reached by optimizing the two terms on the right-hand side: $\log p_{\theta}(y \mid \mathbf{x})$ is optimized with the standard cross entropy, and $\log p_{\theta}(\mathbf{x})$ is optimized with Equation (2), using SGLD to draw the samples.
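One way to picture how the two terms of Equation (8) combine into a single training loss is sketched below. This is an assumption-laden illustration, not code from the paper: it uses the common surrogate $E_{\theta}(\mathbf{x}) - E_{\theta}(\mathbf{x}^{\prime})$, whose gradient with respect to $\theta$ (holding the SGLD sample $\mathbf{x}^{\prime}$ fixed) is the negative of a one-sample estimate of Equation (2). The function name `jem_surrogate_loss` and the logit values are hypothetical, and only the loss value is computed; a real implementation would rely on an autodiff framework.

```python
import math


def log_sum_exp(logits):
    """Numerically stable log(sum(exp(logits)))."""
    m = max(logits)
    return m + math.log(sum(math.exp(v - m) for v in logits))


def jem_surrogate_loss(logits_data, label, logits_sample):
    """Cross entropy on (x, y) plus the surrogate E(x) - E(x') for -log p(x).

    logits_data:   f_theta(x) for a labeled training point x
    label:         index of the true class y
    logits_sample: f_theta(x') for a sample x' (for example, drawn by SGLD)
    """
    lse_data = log_sum_exp(logits_data)
    cross_entropy = lse_data - logits_data[label]  # -log p(y | x), Equation (4)
    energy_data = -lse_data                        # E(x), Equation (7)
    energy_sample = -log_sum_exp(logits_sample)    # E(x'), Equation (7)
    return cross_entropy + (energy_data - energy_sample)


# Hypothetical logits: a confident prediction on data, flat logits on a sample.
loss = jem_surrogate_loss([2.0, 0.0, 0.0], 0, [0.0, 0.0, 0.0])
print("surrogate loss =", round(loss, 6))
```

Minimizing the second term lowers the energy of real data and raises the energy of the model's own samples, which matches the sign pattern in Equation (2).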
That is the proposed method. The formulas that matter in practice are (2), (3), and (8).
Applications
Besides classification, the hybrid model proposed in this paper has many other uses. Here are three main ones:
As a generative model
The model can generate samples:

After a long discussion with Jiaming and a look at the paper's code, my guess is that these images are produced by Equation (3); that is, they are the SGLD samples.
OOD detection
With an energy function available, using it for anomaly detection is natural. The paper does not use $E(\mathbf{x})$ directly for detection; instead it proposes the following score:
$$s_{\theta}(\mathbf{x})=-\left\|\frac{\partial \log p_{\theta}(\mathbf{x})}{\partial \mathbf{x}}\right\|_{2}$$
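The score above can be worked out by hand for a very simple model. The sketch below assumes linear logits $f_{\theta}(\mathbf{x}) = W\mathbf{x} + b$, which is a simplification chosen only so the gradient has a closed form; a real network would need automatic differentiation. Because $Z(\theta)$ does not depend on $\mathbf{x}$, the gradient of $\log p_{\theta}(\mathbf{x})$ equals the gradient of $\operatorname{LogSumExp}_{y} f_{\theta}(\mathbf{x})[y]$, which for linear logits is the softmax-weighted sum of the rows of $W$. The names `ood_score`, `W`, and `b` are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ood_score(W, b, x):
    """s(x) = -||d log p(x) / dx||_2 for linear logits f(x) = W x + b."""
    logits = [
        sum(w_d * x_d for w_d, x_d in zip(row, x)) + b_k
        for row, b_k in zip(W, b)
    ]
    probs = softmax(logits)
    # d LogSumExp(f(x)) / dx = sum_y softmax(f(x))[y] * W[y]
    grad = [sum(p * row[d] for p, row in zip(probs, W)) for d in range(len(x))]
    return -math.sqrt(sum(g * g for g in grad))


# Hypothetical two-class model on 2-D inputs.
W = [[1.0, 0.0], [-1.0, 0.0]]
b = [0.0, 0.0]
score_flat = ood_score(W, b, [0.0, 0.0])   # log-density is flat here
score_steep = ood_score(W, b, [3.0, 0.0])  # log-density is still changing here
print(score_flat, score_steep)
```

A score close to zero means the log-density is locally flat around $\mathbf{x}$, and a more negative score means it is changing quickly.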
Results:

Robustness
The SGLD process in Equation (3) is itself very similar to PGD: many unrealistic samples are drawn and take part in training, so it is not surprising that robustness improves.