Meta-Learning (meta-learning and few-shot learning)
2022-07-25 12:02:00 【Shangshanxianger】
Meta-Learning
Although many current models achieve good results by brute-forcing compute and data, some data are hard to collect and large-scale labeling costs too much manpower, so meta-learning has also attracted a lot of research. Humans, in particular, often need only a small amount of data to learn quickly, and can even reason about things they have never seen purely from concepts. This ability falls broadly within the scope of meta-learning, or, in machine-learning terms, zero-shot and few-shot learning. First, the concepts:
- Meta-learning: learning how to learn. The "meta" corresponds to the basic knowledge and behavior patterns about the world that humans master in early childhood, that is, an initial network with strong generalization plus the ability to adapt quickly to new tasks. The goal of meta-learning is therefore to improve generalization: obtain good initial parameters from which good results can be reached with only a small amount of computation.
At present meta-learning mainly targets few-shot learning problems. Both meta-training and meta-testing are organized around small-sample tasks: each task has its own training set and test set, also called the support set and the query set, and only small-sample data are used in both the training and testing stages.
- Zero-shot learning. The support set serves as the training set and is labeled with seen classes; the query set serves as the test set and contains unseen classes. Zero-shot learning exploits the relations between classes you have never seen and classes you have seen. That is, if we know what a horse looks like, and we know that a zebra looks like a horse with stripes, we can recognize a zebra even though we have never seen one.
- One-shot learning. When there is only one sample of a new, unseen category, the model is expected to predict the new category from the old categories it has already learned. Here meta-learning no longer works like traditional supervised learning: instead of summarizing the shared information and patterns within the distribution of one class, it tries to learn the regularities that exist across the distribution of tasks (that is, how to learn).
- Few-shot learning. After learning from a large amount of data in some categories, the model can quickly learn new, unseen classes from only a few samples.
- The C-way K-shot problem. From the many categories in the training set, select C categories and K samples from each to form the support set; then draw the remaining batches from the same C categories as the query (test) set.
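For concreteness, here is a minimal sketch of C-way K-shot episode sampling; the `dataset` layout and the `query_per_class` parameter are illustrative assumptions, not from the original post.

```python
import random

def sample_episode(dataset, C=5, K=1, query_per_class=15):
    """dataset: dict mapping class label -> list of samples.
    Assumes every class has at least K + query_per_class samples."""
    classes = random.sample(list(dataset.keys()), C)  # pick C categories
    support, query = [], []
    for label in classes:
        samples = random.sample(dataset[label], K + query_per_class)
        support += [(x, label) for x in samples[:K]]  # K shots per class
        query += [(x, label) for x in samples[K:]]    # held-out queries
    return support, query
```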
The main goal, again, is to improve generalization: obtain good parameters so that good results can be reached with a small amount of computation. The research methods can be grouped into five directions:
- Metric-based learning. A metric is a function of the distance between two elements, also called a distance function, so metric learning is also known as similarity learning: a given distance function computes the distance between two samples as a measure of their similarity. Methods in this direction therefore predict the category from the similarity scores produced by a metric module.
- Initialization with strong generalization. Mainly the MAML model (Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks), which learns an initialization that adapts to each task's data through a few steps of gradient descent on the task loss, yielding better generalization (a minimal sketch follows this list).
- Optimizer-based. Gradient-based optimization is generally believed to need many iterations to work well, so methods in this direction learn the optimization procedure itself, for example by introducing an LSTM to produce the parameter updates.
- Based on additional external memory. A neural network alone overfits easily on small samples, so this approach mainly relies on external memory for updates: it adjusts its structure and parameter space according to feedback signals, and improves performance in new environments by accumulating experience.
- Data-augmentation-based methods. Generate virtual data to provide additional training signal for the model. For example, in an N-way classification task, add an extra (N+1)-th class indicating whether a sample is fake.
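A minimal sketch of the MAML inner/outer loop mentioned above, assuming a functional model `forward(params, x)` and episodes already split into support and query sets; the learning rates and helper names are illustrative, not from the paper.

```python
import torch

def maml_step(params, episodes, forward, loss_fn, inner_lr=0.01, outer_lr=1e-3):
    """One meta-update. params: list of tensors with requires_grad=True."""
    meta_grads = [torch.zeros_like(p) for p in params]
    for (xs, ys), (xq, yq) in episodes:
        # Inner loop: one gradient step on the task's support set.
        support_loss = loss_fn(forward(params, xs), ys)
        grads = torch.autograd.grad(support_loss, params, create_graph=True)
        adapted = [p - inner_lr * g for p, g in zip(params, grads)]
        # Outer loop: evaluate the adapted parameters on the query set.
        query_loss = loss_fn(forward(adapted, xq), yq)
        for mg, g in zip(meta_grads, torch.autograd.grad(query_loss, params)):
            mg += g
    # Meta-update the shared initialization with the averaged query gradient.
    return [(p - outer_lr * mg / len(episodes)).detach().requires_grad_()
            for p, mg in zip(params, meta_grads)]
```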
Next, some representative metric-learning papers:
Siamese Neural Networks
The Siamese (twin) neural network is a similarity-metric model, usable for category recognition when the number of categories is large but each category has few samples. The main idea is to map the inputs into a target space through an embedding function and compute their similarity with a simple distance function; during training, the distance between same-class pairs is minimized while the distance between different-class pairs is maximized.
The model structure is shown in the figure above: a CNN first extracts the embedding of each input; the distance between the two embeddings then gives the predicted probability that the inputs are the same class (1 for same class, 0 for different), with cross-entropy as the loss. It is called a twin network because the two branches share one set of parameters and weights (i.e., a single CNN extracts features for both inputs).
Now the test stage. For one-shot classification, since each category has only one sample in the training set, each test image is paired with every training sample in turn; the pairs are fed into the Siamese network to get a distance for each pair, and the label of the training sample with the smallest distance is taken as the test sample's category, completing the classification.
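A minimal sketch of this shared-encoder design, assuming grayscale inputs; the layer sizes are illustrative, not those of the original paper.

```python
import torch
import torch.nn as nn

class SiameseNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(          # one CNN shared by both inputs
            nn.Conv2d(1, 64, 3), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            nn.Linear(64 * 16, 128))
        self.out = nn.Linear(128, 1)           # distance -> same/different logit

    def forward(self, x1, x2):
        f1, f2 = self.encoder(x1), self.encoder(x2)   # shared weights
        return self.out(torch.abs(f1 - f2)).squeeze(-1)

# Training: binary cross-entropy on pair labels (1 = same class, 0 = different):
# loss = nn.BCEWithLogitsLoss()(model(x1, x2), pair_labels.float())
```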

Matching Networks
This is also metric learning; the difference from the above is that comparison moves from one-to-one to one-to-many. As shown in the figure above, the input becomes several labeled samples plus one unlabeled sample, so the network's purpose is to map the labeled few-shot data and the unlabeled sample to the unlabeled sample's label. The encoder is again a CNN; the new sample's embedding is compared with each support embedding, and the similarities yield the final score.
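A minimal sketch of the matching step only (the full model also uses context embeddings, omitted here); the shapes and names are assumptions.

```python
import torch
import torch.nn.functional as F

def match(query_emb, support_embs, support_labels, n_classes):
    """query_emb: (d,); support_embs: (n, d); support_labels: (n,) int64."""
    # Cosine similarity between the query and every labeled support sample.
    sims = F.cosine_similarity(query_emb.unsqueeze(0), support_embs, dim=-1)
    attn = F.softmax(sims, dim=0)                          # attention weights
    onehot = F.one_hot(support_labels, n_classes).float()  # (n, n_classes)
    return attn @ onehot                                   # class probabilities
```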

Prototypical Networks
Similarly, there are Prototypical Networks, which resemble k-means: the authors assume that every category has a prototype in embedding space, that is, the class center point. Each mapped sample is therefore compared against the mean (prototype) of each category, and the training loss pulls same-class samples close to their prototype while pushing different-class samples away (a minimal sketch follows the list below). But a very small sample size biases the classification boundary, so semi-supervised ideas can be used for some improvements:
- All unlabeled data are assumed to belong to some labeled class; new prototypes are computed from the unlabeled and labeled data together.
- Unlabeled data either belong to a labeled class or to an extra class, the distractor class (distractor class), whose prototype starts at the origin (0, 0); the model learns the distractor class's radius.
- Unlabeled data either belong to a known class or are masked out (masked).
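A minimal sketch of prototype classification as described above, assuming the embeddings have already been computed by the encoder:

```python
import torch

def proto_classify(support_embs, support_labels, query_embs, n_classes):
    """support_embs: (n, d); support_labels: (n,); query_embs: (q, d)."""
    # Prototype = mean embedding of each class's support samples.
    protos = torch.stack([support_embs[support_labels == c].mean(dim=0)
                          for c in range(n_classes)])
    # Negative squared Euclidean distance acts as the logit for each class.
    dists = torch.cdist(query_embs, protos) ** 2
    return (-dists).softmax(dim=-1)   # train with cross-entropy on these
```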

Relation Networks
As shown in the figure above, this is similar to Matching Networks, but it uses a learned relation module rather than a fixed metric to compute the distance. The model has two parts, an embedding module and a relation module: the embedding function f is still a CNN, while the relation module g is a similarity-comparison network (built from ReLU layers) that outputs the similarity score of two samples. A minimal sketch of the relation module follows the list below.
- Zero-shot learning. Each category's semantic features are embedded into a vector v through a separate embedding function to obtain that category's feature representation.
- One-shot learning. Each category has one support sample x with embedding f(x). For a query sample y, compute its embedding f(y), then form the concatenation C(f(x), f(y)) of the two vectors and feed it into the relation module g to obtain the similarity score, completing the classification.
- Few-shot learning. For each category's support samples, sum their embedding vectors to form the feature representation of the whole category; the rest of the procedure is the same as one-shot learning.
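A minimal sketch of the relation module g scoring the concatenation C(f(x), f(y)); the layer sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class RelationModule(nn.Module):
    def __init__(self, emb_dim=128, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * emb_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid())   # relation score in [0, 1]

    def forward(self, f_x, f_y):
        # Concatenate the class feature and the query embedding, then score.
        return self.net(torch.cat([f_x, f_y], dim=-1)).squeeze(-1)
```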

The next blog post sorts out an application:
Zero-Shot Image Retrieval (zero-shot cross-modal retrieval)