Prototypical Networks for Few-shot Learning
2022-06-25 05:03:00 【MondayCat111】
Abstract
This paper proposes prototypical networks for few-shot classification, where a classifier must generalize to new classes not seen in the training set, given only a few examples of each new class. Prototypical networks learn a metric space in which classification can be performed by computing distances to a prototype representation of each class. Compared with other few-shot approaches, this method reflects a simpler inductive bias that is beneficial in the limited-data regime.
Introduction
Few-shot classification is a task in which a classifier must adapt to accommodate new classes not seen in training, given only a few examples of each. Naively retraining the model on the new data would lead to severe overfitting.
Matching networks: an attention mechanism over a learned embedding of the labeled examples (the support set) is used to predict classes for unlabeled points (the query set). Matching networks can be interpreted as a weighted nearest-neighbor classifier applied within an embedding space. Ravi and Larochelle take the idea of episodic training further and propose a meta-learning approach to few-shot learning. Their approach involves training an LSTM, within a given episode, to produce updates to a classifier such that it generalizes well to the test set. Here, rather than training a single model over multiple episodes, the LSTM meta-learner learns to train a custom model for each episode.
Problem addressed: overfitting in the few-shot regime.
Method: prototypical networks are based on the idea that there exists an embedding in which points cluster around a single prototype representation for each class. To do this, we learn a non-linear mapping of the input into an embedding space with a neural network, and take the mean of each class's support set in the embedding space as that class's prototype. An embedded query point is then classified by simply finding the nearest class prototype. The same approach handles zero-shot learning, where each class comes with meta-data giving a high-level description of the class rather than a small number of labeled examples; we therefore learn an embedding of the meta-data into the shared space to serve as each class's prototype.
Contributions: in this paper, we formulate prototypical networks for both the few-shot and zero-shot settings. We draw connections to matching networks in the one-shot setting and analyze the underlying distance function used in the model. In particular, we relate prototypical networks to clustering in order to justify the use of class means as prototypes when distances are computed with a Bregman divergence, such as squared Euclidean distance. We find empirically that the choice of distance is vital: Euclidean distance greatly outperforms the more commonly used cosine similarity.
Conclusion: we achieve state-of-the-art performance on several benchmark tasks. Prototypical networks are simpler and more efficient than recent meta-learning algorithms, making them an appealing approach to few-shot and zero-shot learning.
Prototypical Networks
Model
Prototypical networks compute an M-dimensional representation (prototype) $c_k \in \mathbb{R}^{M}$ of each class through an embedding function $f_{\phi}$ with learnable parameters $\phi$. Each prototype is the mean vector of the embedded support points belonging to its class:

$$c_k = \frac{1}{\left|S_k\right|} \sum_{(x_i, y_i) \in S_k} f_{\phi}(x_i) \qquad (1)$$
Given a distance function $d$, prototypical networks produce a distribution over classes for a query point based on a softmax over distances to the prototypes in the embedding space:
$$p_{\phi}(y = k \mid x) = \frac{\exp\left(-d\left(f_{\phi}(x), c_k\right)\right)}{\sum_{k'} \exp\left(-d\left(f_{\phi}(x), c_{k'}\right)\right)} \qquad (2)$$
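Equations (1) and (2) translate almost directly into code. The following is a minimal sketch assuming a PyTorch-style tensor API; the function names (`compute_prototypes`, `classify_queries`) and the tensors `support_emb`/`query_emb` are illustrative, not taken from the paper's released code:

```python
import torch
import torch.nn.functional as F

def compute_prototypes(support_emb: torch.Tensor, support_labels: torch.Tensor,
                       n_classes: int) -> torch.Tensor:
    """Eq. (1): prototype c_k is the mean of the embedded support points of class k."""
    # support_emb: [n_support, M]; support_labels: [n_support], values in [0, n_classes)
    return torch.stack([support_emb[support_labels == k].mean(dim=0)
                        for k in range(n_classes)])        # [n_classes, M]

def classify_queries(query_emb: torch.Tensor, prototypes: torch.Tensor) -> torch.Tensor:
    """Eq. (2): softmax over negative squared Euclidean distances to the prototypes."""
    dists = torch.cdist(query_emb, prototypes).pow(2)       # d(f(x), c_k) for every pair
    return F.softmax(-dists, dim=1)                         # p_phi(y = k | x), [n_query, n_classes]
```

A query is then assigned to the class of the nearest prototype, e.g. `classify_queries(q, c).argmax(dim=1)`.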
Learning proceeds by minimizing, via SGD, the negative log-probability of the true class $k$: $J(\phi) = -\log p_{\phi}(y = k \mid x)$. Training episodes are formed by randomly selecting a subset of classes from the training set, then choosing a subset of examples within each class to act as the support set and a subset of the remainder to serve as query points. Pseudocode for computing the loss of a training episode is as follows:
N: number of examples in the training set
K: number of classes in the training set
NC: number of classes per episode
NS: number of support examples per class
NQ: number of query examples per class
RandomSample(S, N): a set of N elements chosen uniformly at random from set S, without replacement
Input: training set D, where Dk is the subset of D belonging to class k
Output: the loss J for a randomly generated training episode
Computation: select the classes for the episode --> select a support set for each class --> select query points for each class --> compute each class prototype from its support set --> initialize the loss to zero --> accumulate the loss over the query points.
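A rough sketch of these steps (reusing `compute_prototypes` and the imports from the snippet above; `encoder` and `dataset_by_class` are hypothetical names for the embedding network and a dict mapping class labels to lists of example tensors):

```python
import random

def episode_loss(encoder, dataset_by_class, n_c, n_s, n_q):
    """Loss J for one randomly generated training episode."""
    classes = random.sample(sorted(dataset_by_class.keys()), n_c)   # choose N_C classes
    support, query = [], []
    for k in classes:
        examples = random.sample(dataset_by_class[k], n_s + n_q)    # without replacement
        support.extend(examples[:n_s])                               # N_S support examples
        query.extend(examples[n_s:])                                 # N_Q query examples

    support_emb = encoder(torch.stack(support))                      # [n_c * n_s, M]
    query_emb = encoder(torch.stack(query))                          # [n_c * n_q, M]
    support_labels = torch.arange(n_c).repeat_interleave(n_s)
    query_labels = torch.arange(n_c).repeat_interleave(n_q)

    prototypes = compute_prototypes(support_emb, support_labels, n_c)
    dists = torch.cdist(query_emb, prototypes).pow(2)
    log_p = F.log_softmax(-dists, dim=1)                             # log p_phi(y = k | x)
    return F.nll_loss(log_p, query_labels)                           # averaged negative log-probability J
```

The returned loss is minimized with SGD over many such episodes.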
Prototypical Networks as Mixture Density Estimation
For a particular class of distance functions known as regular Bregman divergences, the prototypical network algorithm is equivalent to performing mixture density estimation on the support set with an exponential family density.
Any regular exponential family distribution with parameters $\theta$ and cumulant function corresponds uniquely to a regular Bregman divergence, and inference of the cluster assignment $y$ for an unlabeled point $z$ becomes:

$$p(y = k \mid z) = \frac{\pi_k \exp\left(-d_{\varphi}\left(z, \mu(\theta_k)\right)\right)}{\sum_{k'} \pi_{k'} \exp\left(-d_{\varphi}\left(z, \mu(\theta_{k'})\right)\right)}$$
Prototypical networks therefore effectively perform mixture density estimation with an exponential family distribution determined by $d_{\varphi}$. The choice of distance thus specifies modeling assumptions about the class-conditional distribution of the data in the embedding space.
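For reference, a Bregman divergence generated by a differentiable, strictly convex function $\varphi$ has the form below; taking $\varphi(z) = \|z\|^{2}$ recovers the squared Euclidean distance, which is why that choice is compatible with the mixture-density view:

$$d_{\varphi}(z, z') = \varphi(z) - \varphi(z') - (z - z')^{\top} \nabla \varphi(z')$$

With $\varphi(z) = \|z\|^{2}$, this gives $\|z\|^{2} - \|z'\|^{2} - 2(z - z')^{\top} z' = \|z - z'\|^{2}$.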
Reinterpretation as a Linear Model
When the distance is squared Euclidean, $d(z, z') = \left\|z - z'\right\|^{2}$, the model in Equation (2) is equivalent to a linear model with respect to the embedding.
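The equivalence can be seen by expanding the squared distance inside Equation (2); the first term is constant across classes, so it cancels in the softmax, leaving a linear function of the embedded query:

$$-\left\|f_{\phi}(x) - c_k\right\|^{2} = -f_{\phi}(x)^{\top} f_{\phi}(x) + 2 c_k^{\top} f_{\phi}(x) - c_k^{\top} c_k$$

The class scores therefore reduce to $w_k^{\top} f_{\phi}(x) + b_k$ with $w_k = 2 c_k$ and $b_k = -c_k^{\top} c_k$.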
We focus primarily on squared Euclidean distance (corresponding to spherical Gaussian densities). Our results indicate that, despite the equivalence to a linear model, Euclidean distance is an effective choice. We hypothesize that this is because all of the required non-linearity can be learned within the embedding function; indeed, this is the approach that modern neural network classification systems currently take.
Comparison to Matching Networks
Similarity: the two models are equivalent in the one-shot setting. Since each class then has only one support point, $c_k = x_k$ and prototypical networks coincide with matching networks.
Difference: with squared Euclidean distance, prototypical networks produce a linear classifier, whereas matching networks produce a weighted nearest-neighbor classifier for a given support set.
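To make the contrast concrete, here is a minimal sketch of a matching-network-style classifier, which spreads attention over individual support points instead of class means (cosine similarity is used as the attention kernel for illustration; the names are hypothetical, and this omits matching networks' fully conditional embeddings):

```python
def matching_net_predict(query_emb, support_emb, support_labels, n_classes):
    """Weighted nearest-neighbor vote over all support points (support_labels: LongTensor)."""
    sims = F.cosine_similarity(query_emb.unsqueeze(1), support_emb.unsqueeze(0), dim=-1)
    attn = F.softmax(sims, dim=1)                            # attention a(x, x_i), [n_query, n_support]
    one_hot = F.one_hot(support_labels, n_classes).float()   # [n_support, n_classes]
    return attn @ one_hot                                    # p(y = k | x) = sum_i a(x, x_i) * 1[y_i = k]
```

With one support point per class (and the same distance function), this rule and the prototype-based rule coincide, which is the one-shot equivalence noted above.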
A natural question is whether it makes sense to use multiple prototypes per class instead of just one. If the number of prototypes per class were fixed and greater than one, a partitioning scheme would be needed to further cluster the support points within each class. Mensink et al. and Rippel et al. have proposed such methods, but both require a separate partitioning phase that is decoupled from the weight updates, whereas our approach is straightforward to learn with ordinary gradient descent.
Vinyals et al. propose a number of extensions, including decoupling the embedding functions of the support and query points, and using a second-level fully conditional embedding (FCE) that takes into account the specific points in each episode. These could also be incorporated into prototypical networks, but they increase the number of learnable parameters, and FCE uses a bidirectional LSTM that imposes an arbitrary ordering on the support set. Instead, we show that the same level of performance can be achieved with simple design choices, which we outline below.
Design Choices
Distance metric: both prototypical networks and matching networks permit any distance function, and we find that using squared Euclidean distance greatly improves the results of both. We conjecture this is primarily because cosine distance is not a Bregman divergence, so the equivalence to mixture density estimation discussed in Section 2.3 does not hold.
Episode composition: a straightforward way to construct episodes is to choose Nc classes and Ns support points per class to match the expected configuration at test time. That is, if we expect to perform 5-way 1-shot classification, training episodes could be composed with Nc = 5 and Ns = 1. However, we found it very beneficial to train with a higher Nc, or "way", than will be used at test time; in our experiments, the training Nc is tuned on a held-out validation set. Another consideration is whether to match Ns, or "shot", between training and testing. For prototypical networks, we found it best to train and test with the same "shot".
Zero-Shot Learning
Zero-shot learning differs from few-shot learning in that, instead of a support set of training points, we are given a class meta-data vector vk for each class. These vectors could be determined in advance, or they could be learned from, e.g., raw text. Modifying prototypical networks to deal with the zero-shot case is straightforward: we simply define the prototype to be a separate embedding of the meta-data vector. Figure 1 illustrates the few-shot and zero-shot prototypical networks. Because the meta-data vectors and query points come from different input domains, we found it helpful to fix the prototype embedding g to unit length, while placing no constraint on the query embedding f.
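A minimal sketch of the zero-shot variant, assuming a separate meta-data encoder `g` (a hypothetical module name) and the `classify_queries` helper from the earlier snippet: the prototype is the embedded meta-data vector constrained to unit length, while query embeddings are left unconstrained:

```python
def zero_shot_prototypes(g, meta_vectors: torch.Tensor) -> torch.Tensor:
    """Prototype c_k = g(v_k), normalized to unit length."""
    prototypes = g(meta_vectors)                  # [n_classes, M]
    return F.normalize(prototypes, p=2, dim=1)    # fix ||c_k|| = 1

# Queries are embedded with f as before and classified against the meta-data prototypes:
# probs = classify_queries(f(query_batch), zero_shot_prototypes(g, meta_vectors))
```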
Experiments
Conclusion
We have proposed a simple method for few-shot learning called prototypical networks. The basic idea is to represent each class by the mean of its examples in a representation space learned by a neural network. We train these networks with episodic training so that they perform well in the few-shot setting. The approach is simpler and more efficient than recent meta-learning methods and produces state-of-the-art results even without the sophisticated extensions developed for matching networks (although these could also be applied to prototypical networks). We show how performance can be greatly improved by carefully considering the choice of distance metric and by modifying the episodic learning procedure. We further show how to generalize prototypical networks to the zero-shot setting and achieve state-of-the-art results on the CUB-200 dataset. A natural direction for future work is to use Bregman divergences other than squared Euclidean distance, corresponding to class-conditional distributions beyond spherical Gaussians. We conducted preliminary explorations of this, including learning a per-dimension variance for each class; this did not lead to any empirical gains, suggesting that the embedding network itself has enough flexibility without additional fitted parameters per class. Overall, the simplicity and effectiveness of prototypical networks make them a promising approach to few-shot learning.
My Thinking
Advantage: each class prototype is the mean of that class's support examples in the embedding space, and a query point is classified by computing its distance to each prototype.
Directions for improvement:
① Classification model: besides squared Euclidean distance (which is equivalent to a linear classifier yet performs better than one), are there other metrics that could improve accuracy?
② Embedding space: are there other data augmentation methods that could extract more abstract input features so that, after clustering, the model generalizes better to unseen classes?
Related translation: https://blog.csdn.net/Smiler_/article/details/103133876?utm_medium=distribute.pc_relevant.none-task-blog-BlogCommendFromMachineLearnPai2-1.control&depth_1-utm_source=distribute.pc_relevant.none-task-blog-BlogCommendFromMachineLearnPai2-1.control