Meta Learning (Meta-Learning and Few-Shot Learning)
2022-07-25 12:02:00 [Shangshanxianger]
Meta-Learning
Although many models today achieve good results by brute-forcing compute and data, some data are hard to collect and large-scale labeling consumes too much manpower, so meta-learning has also attracted considerable research. In particular, humans often need only a small amount of data to learn quickly, and can even complete reasoning from a concept alone, without any data. This ability broadly falls under meta-learning, or, in machine learning terms, zero-shot and few-shot learning. First, the concepts:
- Meta-learning: learning how to learn. The "meta" corresponds to the basic knowledge and behavior patterns about the world that humans master in early childhood: an initial network with strong generalization, plus the ability to adapt quickly to new tasks. The goal of meta-learning is therefore to improve generalization and obtain good initial parameters, so that good results can be achieved with only a small amount of further computation.
At present, meta-learning mainly targets few-shot learning problems. Both training and testing are organized around small-sample tasks: each task has its own training set and test set, also called the support set and the query set, and only small-sample data are used in both the training and testing stages.
- Zero-shot learning. The support set is the training set, containing the seen classes; the query set is the test set, containing the unseen classes. Zero-shot learning identifies each unseen class through knowledge relating it to the seen classes. That is, if we know what a horse looks like and know that a zebra looks like a horse with stripes, then we can recognize a zebra even without ever having seen one.
- One-shot learning. When only one sample of a new, unseen category is available, the model is expected to predict the new category from the old categories it has already learned. Here meta-learning differs from traditional supervised learning: instead of summarizing the shared information and patterns within the distribution of a single class, it tries to learn the regularities that exist across the distribution of tasks (that is, how to learn).
- Few-shot learning. After learning from a large amount of data for certain categories, the model can quickly learn new, unseen classes from only a small number of samples.
- The C-way K-shot problem. From the many categories in the training set, select C categories and K samples from each category as the support set, then draw a batch of the remaining samples from those same C categories as the query (test) set.
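The C-way K-shot sampling step above can be sketched as a small episode sampler; this is a minimal illustration assuming the dataset is a dict mapping labels to sample lists, not any particular library's API:

```python
import random

def sample_episode(dataset, c_way, k_shot, n_query):
    """Sample one C-way K-shot episode from a {label: [samples]} dict.

    Returns a support set of C*K labelled samples and a query set of
    C*n_query samples drawn from the same C classes.
    """
    classes = random.sample(sorted(dataset), c_way)       # pick C classes
    support, query = [], []
    for label in classes:
        picks = random.sample(dataset[label], k_shot + n_query)
        support += [(x, label) for x in picks[:k_shot]]   # K per class
        query += [(x, label) for x in picks[k_shot:]]     # remainder as queries
    return support, query
```

Training and evaluation then loop over many such episodes, each acting as a miniature classification task.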
The main goal is to improve generalization and obtain good parameters so that good results can be achieved with a small amount of computation. The research methods can be divided into five directions:
- Metric-based learning. A metric is a function of the distance between two elements, also called a distance function, so metric learning is also called similarity learning: it computes the distance between two samples with a given distance function to measure their similarity. Methods in this direction therefore predict the category mainly from the similarity score given by a metric module.
- Initialization with strong generalization. The main representative is the MAML model (Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks), which optimizes the initialization through gradient descent on the losses of many tasks, so that the resulting parameters generalize well.
- Optimizer-based. Gradient-based optimization is generally believed to need many iterations to work well; methods in this direction learn the gradient-descent procedure itself to train the network, for example by introducing an LSTM.
- Based on additional external memory. Neural networks alone overfit easily, so this approach mainly uses external memory for updates, adjusting the model's structure and parameter space in time according to feedback signals, and then improving performance in new environments by accumulating experience.
- Data-augmentation-based. Generate virtual data to provide additional training signals for the model. For example, in an N-way classification task, add an extra (N+1)-th class indicating whether a sample is fake.
Next, some representative metric-learning papers:
Siamese Neural Networks
The Siamese ("twin") neural network is a similarity-metric model, useful for category identification when there are many categories but few samples per category. The main idea is to map inputs into a target space through an embedding function and compute similarity with a simple distance function, then during training minimize the distance between same-class sample pairs while maximizing the distance between different-class pairs.
The model structure is shown in the figure above. A CNN first extracts the feature embedding; then the distance between the two inputs is computed and the final probability that they are the same class is predicted (1 for same class, 0 for different), with cross-entropy as the loss. It is called "twin" because the two networks share one set of parameters and weights (that is, a single CNN extracts both features).
Now consider the test stage. For one-shot learning, since each category in the training set has only one sample, each test image is paired with every training sample and fed into the Siamese network in turn to obtain the distance for each pair; the label of the training sample with the smallest distance is taken as the test sample's category, completing the classification.
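The one-shot test procedure above can be sketched as follows; this is a toy illustration where `embed` stands in for the trained twin CNN (here just any vector-valued function), not the paper's actual network:

```python
import numpy as np

def one_shot_classify(embed, support, query):
    """Classify a query by pairing it with every one-shot support sample.

    `embed` is the shared ("twin") embedding function. The label of the
    support sample whose embedding is closest to the query's is returned.
    """
    q = embed(query)
    dists = {label: np.linalg.norm(embed(x) - q) for x, label in support}
    return min(dists, key=dists.get)    # smallest distance wins
```

In the real model the distance is fed through a learned layer to produce a same-class probability, but the nearest-pair decision rule is the same.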

Matching Networks
This is also metric learning, but it moves from one-to-one to one-to-many: as shown in the figure above, the input becomes several labelled samples plus an unlabelled sample. The purpose of the network is therefore to map the small labelled support set and the unlabelled sample to the corresponding label. The backbone is again a CNN; the new sample's similarity to each support embedding is computed and combined into a final score.
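The one-to-many scoring idea can be sketched with softmax attention over cosine similarities to the support samples; this is a simplified illustration with a stand-in `embed` function, omitting the full context-embedding machinery of the paper:

```python
import numpy as np

def matching_predict(embed, support, query):
    """Matching-network style prediction: softmax attention over cosine
    similarities to every support sample, accumulated per class label."""
    q = embed(query)
    embs = [embed(x) for x, _ in support]
    sims = np.array([
        float(np.dot(e, q)) / (np.linalg.norm(e) * np.linalg.norm(q) + 1e-8)
        for e in embs
    ])
    attn = np.exp(sims) / np.exp(sims).sum()       # softmax attention weights
    scores = {}
    for (_, label), a in zip(support, attn):
        scores[label] = scores.get(label, 0.0) + a  # weighted one-hot labels
    return max(scores, key=scores.get)
```

The prediction is a weighted sum of the support labels, so classes with more similar support samples receive higher total attention.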

Prototypical Networks
Similarly, there are Prototypical Networks, reminiscent of k-means. The authors assume every category has a prototype in the embedding space, that is, the class center point: the prototype is the mean of the mapped support samples of that class, and the training loss pulls same-class samples toward their prototype while pushing samples of other classes away. However, too small a sample size can bias the classification boundary, so semi-supervised ideas can be used as improvements:
- All unlabelled data belong to the categories of the labelled data; compute new prototypes from the unlabelled and labelled data together.
- Unlabelled data either belong to a labelled category or to another class, the distractor class. The distractor class starts with the origin (0,0) as its prototype, and the model learns the distractor class's radius.
- Unlabelled data either belong to a known category or are masked.
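The basic (fully supervised) prototype computation and nearest-prototype classification described above can be sketched like this; `embed` is again a stand-in for the trained embedding CNN:

```python
import numpy as np

def prototypes(embed, support):
    """Mean embedding ("prototype") per class over the support set."""
    by_class = {}
    for x, label in support:
        by_class.setdefault(label, []).append(embed(x))
    return {c: np.mean(vs, axis=0) for c, vs in by_class.items()}

def proto_classify(embed, support, query):
    """Assign the query to the class whose prototype is nearest."""
    protos = prototypes(embed, support)
    q = embed(query)
    return min(protos, key=lambda c: np.linalg.norm(protos[c] - q))
```

The semi-supervised variants above then refine `prototypes` by also folding in unlabelled points, weighted by their soft assignment to each class.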

Relation Networks
As pictured above, this is similar to Matching Networks, but a relation module replaces the fixed metric used to compute distance. The model is divided into two modules, an embedding module and a relation module: the embedding function f is still a CNN, and the relation module g is a similarity-comparison module, a small network with ReLU activations that outputs the similarity score of two samples.
- Zero-shot learning. Embed each category's semantic features into a vector v with a new embedding function, obtaining the feature map of that category.
- One-shot learning. Each category has one support sample x with embedding f(x). For a query sample y, compute the embedding f(y) as well; then C(f(x), f(y)) denotes the concatenation of the two vectors, which is fed into the relation module g to obtain the similarity score, completing the classification.
- Few-shot learning. For each class, sum the embeddings of its support samples to form the feature map of the whole category; the rest of the process is the same as one-shot learning.
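The concatenate-then-score pipeline above can be sketched as follows; here both `f` (the embedding module) and `g` (the learned relation module) are stand-in functions supplied by the caller, not the paper's trained networks:

```python
import numpy as np

def relation_classify(g, f, support, query):
    """Relation-network few-shot classification: sum embeddings per class,
    concatenate each class feature map with the query embedding, and let
    the relation module g score the pair; highest score wins."""
    feats = {}
    for x, label in support:
        feats[label] = feats.get(label, 0) + f(x)   # per-class embedding sum
    q = f(query)
    return max(feats, key=lambda c: g(np.concatenate([feats[c], q])))
```

With one support sample per class this reduces exactly to the one-shot case, since the per-class sum is just that sample's embedding.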

The next blog post covers an application:
Zero-Shot Image Retrieval (zero-shot cross-modal retrieval)