Meta-Learning (meta-learning and few-shot learning)
2022-07-25 12:02:00 【Shangshanxianger】
Meta-Learning
Although many current models achieve good results by brute-forcing compute and data, some data are hard to collect and large-scale labeling costs too much manpower, so meta-learning has also attracted a lot of research. Humans, in particular, often need only a small amount of data to learn quickly, and can even reason about things they have never seen purely from concepts. This ability falls broadly within the scope of meta-learning, or, in machine-learning terms, zero-shot and few-shot learning. First, the concepts:
- Meta-learning: learning how to learn. The "meta" corresponds to the basic knowledge and behavior patterns about the world that humans master in early childhood, that is, an initial network with strong generalization plus the ability to adapt quickly to new tasks. The goal of meta-learning is therefore to improve generalization: obtain good initial parameters from which good results can be reached with only a small amount of computation.
At present meta-learning mainly targets few-shot learning problems. Both meta-training and meta-testing are organized around small-sample tasks: each task has its own training set and test set, also called the support set and the query set, and only small-sample data are used in both the training and testing stages.
- Zero-shot learning. The support set serves as the training set and is labeled with seen classes; the query set serves as the test set and contains unseen classes. Zero-shot learning exploits the relations between classes you have never seen and classes you have seen. That is, if we know what a horse looks like, and we know that a zebra looks like a horse with stripes, we can recognize a zebra even though we have never seen one.
- One-shot learning. When there is only one sample of a new, unseen category, the model is expected to predict the new category from the old categories it has already learned. Here meta-learning no longer works like traditional supervised learning: instead of summarizing the shared information and patterns within the distribution of one class, it tries to learn the regularities that exist across the distribution of tasks (that is, how to learn).
- Few-shot learning. After learning from a large amount of data in some categories, the model can quickly learn new, unseen classes from only a few samples.
- The C-way K-shot problem. From the many categories in the training set, select C categories and K samples from each to form the support set; then draw the remaining batches from the same C categories as the query (test) set.
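For concreteness, here is a minimal sketch of C-way K-shot episode sampling; the `dataset` layout and the `query_per_class` parameter are illustrative assumptions, not from the original post.

```python
import random

def sample_episode(dataset, C=5, K=1, query_per_class=15):
    """dataset: dict mapping class label -> list of samples.
    Assumes every class has at least K + query_per_class samples."""
    classes = random.sample(list(dataset.keys()), C)  # pick C categories
    support, query = [], []
    for label in classes:
        samples = random.sample(dataset[label], K + query_per_class)
        support += [(x, label) for x in samples[:K]]  # K shots per class
        query += [(x, label) for x in samples[K:]]    # held-out queries
    return support, query
```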
The main goal, again, is to improve generalization: obtain good parameters so that good results can be reached with a small amount of computation. The research methods can be grouped into five directions:
- Metric-based learning. A metric is a function of the distance between two elements, also called a distance function, so metric learning is also known as similarity learning: a given distance function computes the distance between two samples as a measure of their similarity. Methods in this direction therefore predict the category from the similarity scores produced by a metric module.
- Initialization with strong generalization. Mainly the MAML model (Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks), which learns an initialization that adapts to each task's data through a few steps of gradient descent on the task loss, yielding better generalization (a minimal sketch follows this list).
- Optimizer-based. Gradient-based optimization is generally believed to need many iterations to work well, so methods in this direction learn the optimization procedure itself, for example by introducing an LSTM to produce the parameter updates.
- Based on additional external memory. A neural network alone overfits easily on small samples, so this approach mainly relies on external memory for updates: it adjusts its structure and parameter space according to feedback signals, and improves performance in new environments by accumulating experience.
- Data-augmentation-based methods. Generate virtual data to provide additional training signal for the model. For example, in an N-way classification task, add an extra (N+1)-th class indicating whether a sample is fake.
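A minimal sketch of the MAML inner/outer loop mentioned above, assuming a functional model `forward(params, x)` and episodes already split into support and query sets; the learning rates and helper names are illustrative, not from the paper.

```python
import torch

def maml_step(params, episodes, forward, loss_fn, inner_lr=0.01, outer_lr=1e-3):
    """One meta-update. params: list of tensors with requires_grad=True."""
    meta_grads = [torch.zeros_like(p) for p in params]
    for (xs, ys), (xq, yq) in episodes:
        # Inner loop: one gradient step on the task's support set.
        support_loss = loss_fn(forward(params, xs), ys)
        grads = torch.autograd.grad(support_loss, params, create_graph=True)
        adapted = [p - inner_lr * g for p, g in zip(params, grads)]
        # Outer loop: evaluate the adapted parameters on the query set.
        query_loss = loss_fn(forward(adapted, xq), yq)
        for mg, g in zip(meta_grads, torch.autograd.grad(query_loss, params)):
            mg += g
    # Meta-update the shared initialization with the averaged query gradient.
    return [(p - outer_lr * mg / len(episodes)).detach().requires_grad_()
            for p, mg in zip(params, meta_grads)]
```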
Next, some representative metric-learning papers:
Siamese Neural Networks
The Siamese (twin) neural network is a similarity-metric model, usable for category recognition when the number of categories is large but each category has few samples. The main idea is to map the inputs into a target space through an embedding function and compute their similarity with a simple distance function; during training, the distance between same-class pairs is minimized while the distance between different-class pairs is maximized.
The model structure is shown in the figure above: a CNN first extracts the embedding of each input; the distance between the two embeddings then gives the predicted probability that the inputs are the same class (1 for same class, 0 for different), with cross-entropy as the loss. It is called a twin network because the two branches share one set of parameters and weights (i.e., a single CNN extracts features for both inputs).
Now the test stage. For one-shot classification, since each category has only one sample in the training set, each test image is paired with every training sample in turn; the pairs are fed into the Siamese network to get a distance for each pair, and the label of the training sample with the smallest distance is taken as the test sample's category, completing the classification.
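A minimal sketch of this shared-encoder design, assuming grayscale inputs; the layer sizes are illustrative, not those of the original paper.

```python
import torch
import torch.nn as nn

class SiameseNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(          # one CNN shared by both inputs
            nn.Conv2d(1, 64, 3), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            nn.Linear(64 * 16, 128))
        self.out = nn.Linear(128, 1)           # distance -> same/different logit

    def forward(self, x1, x2):
        f1, f2 = self.encoder(x1), self.encoder(x2)   # shared weights
        return self.out(torch.abs(f1 - f2)).squeeze(-1)

# Training: binary cross-entropy on pair labels (1 = same class, 0 = different):
# loss = nn.BCEWithLogitsLoss()(model(x1, x2), pair_labels.float())
```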

Matching Networks
This is also metric learning; the difference from the above is that comparison moves from one-to-one to one-to-many. As shown in the figure above, the input becomes several labeled samples plus one unlabeled sample, so the network's purpose is to map the labeled few-shot data and the unlabeled sample to the unlabeled sample's label. The encoder is again a CNN; the new sample's embedding is compared with each support embedding, and the similarities yield the final score.
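A minimal sketch of the matching step only (the full model also uses context embeddings, omitted here); the shapes and names are assumptions.

```python
import torch
import torch.nn.functional as F

def match(query_emb, support_embs, support_labels, n_classes):
    """query_emb: (d,); support_embs: (n, d); support_labels: (n,) int64."""
    # Cosine similarity between the query and every labeled support sample.
    sims = F.cosine_similarity(query_emb.unsqueeze(0), support_embs, dim=-1)
    attn = F.softmax(sims, dim=0)                          # attention weights
    onehot = F.one_hot(support_labels, n_classes).float()  # (n, n_classes)
    return attn @ onehot                                   # class probabilities
```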

Prototypical Networks
Similarly, there are Prototypical Networks, which resemble k-means: the authors assume that every category has a prototype in embedding space, that is, the class center point. Each mapped sample is therefore compared against the mean (prototype) of each category, and the training loss pulls same-class samples close to their prototype while pushing different-class samples away (a minimal sketch follows the list below). But a very small sample size biases the classification boundary, so semi-supervised ideas can be used for some improvements:
- All unlabeled data are assumed to belong to some labeled class; new prototypes are computed from the unlabeled and labeled data together.
- Unlabeled data either belong to a labeled class or to an extra class, the distractor class (distractor class), whose prototype starts at the origin (0, 0); the model learns the distractor class's radius.
- Unlabeled data either belong to a known class or are masked out (masked).
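A minimal sketch of prototype classification as described above, assuming the embeddings have already been computed by the encoder:

```python
import torch

def proto_classify(support_embs, support_labels, query_embs, n_classes):
    """support_embs: (n, d); support_labels: (n,); query_embs: (q, d)."""
    # Prototype = mean embedding of each class's support samples.
    protos = torch.stack([support_embs[support_labels == c].mean(dim=0)
                          for c in range(n_classes)])
    # Negative squared Euclidean distance acts as the logit for each class.
    dists = torch.cdist(query_embs, protos) ** 2
    return (-dists).softmax(dim=-1)   # train with cross-entropy on these
```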

Relation Networks
As shown in the figure above, this is similar to Matching Networks, but it uses a learned relation module rather than a fixed metric to compute the distance. The model has two parts, an embedding module and a relation module: the embedding function f is still a CNN, while the relation module g is a similarity-comparison network (built from ReLU layers) that outputs the similarity score of two samples. A minimal sketch of the relation module follows the list below.
- Zero-shot learning. Each category's semantic features are embedded into a vector v through a separate embedding function to obtain that category's feature representation.
- One-shot learning. Each category has one support sample x with embedding f(x). For a query sample y, compute its embedding f(y), then form the concatenation C(f(x), f(y)) of the two vectors and feed it into the relation module g to obtain the similarity score, completing the classification.
- Few-shot learning. For each category's support samples, sum their embedding vectors to form the feature representation of the whole category; the rest of the procedure is the same as one-shot learning.
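A minimal sketch of the relation module g scoring the concatenation C(f(x), f(y)); the layer sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class RelationModule(nn.Module):
    def __init__(self, emb_dim=128, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * emb_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid())   # relation score in [0, 1]

    def forward(self, f_x, f_y):
        # Concatenate the class feature and the query embedding, then score.
        return self.net(torch.cat([f_x, f_y], dim=-1)).squeeze(-1)
```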

The next blog post sorts out an application:
Zero-Shot Image Retrieval (zero-shot cross-modal retrieval)