
Starting from few-shot learning, heading for the sea of stars

2022-06-26 05:04:00 Paddlepaddle


The theme of this talk is "Starting from few-shot learning, heading for the sea of stars." It has five parts:

  • Few-shot learning methods and their importance

  • Three classic scenarios for few-shot learning

  • Application areas of few-shot learning

  • Definition and challenges of few-shot learning

  • PaddleFSL: helping you implement few-shot learning

Wang Yaqing graduated in 2019 from the Department of Computer Science and Engineering at the Hong Kong University of Science and Technology. Her research area is machine learning, with a focus on few-shot learning. Her doctoral advisors were Professor Lionel M. Ni and Professor James T. Kwok.


WAVE SUMMIT+ 2021 Deep Learning Developer Summit

"Tech Innovation, Women's Power" Forum

Since her doctoral studies, she has published many papers at top conferences and journals, including ICML, NeurIPS, TheWebConf, EMNLP, and TIP. She wrote a survey on few-shot learning that is the most highly cited ACM Computing Surveys paper of 2019-2021, and it is also an ESI Highly Cited Paper this year.

She also leads the development of a few-shot learning toolkit, which has earned more than 1.1K stars on GitHub. Interested readers can find it here:

https://github.com/tata1661/FSL-Mate/tree/master/PaddleFSL

Since joining Baidu, Wang Yaqing has worked intensively on few-shot learning. Her main question is how to generalize quickly to new tasks that contain only a small amount of labeled data.

Figure 1

Few-shot learning methods and their importance

We tackle few-shot learning from three angles:

  • First, we study the underlying theory and learning foundations, such as meta-learning and graph learning.

  • Second, we put it into practice at Baidu, for example in new drug discovery, text classification, intent recognition, cold-start recommendation, and gesture recognition.

  • Finally, we help everyone get started with few-shot learning quickly and prototype few-shot methods rapidly. To this end we built a general-purpose few-shot learning toolkit on PaddlePaddle. It provides simple, easy-to-use, and stable implementations of classic few-shot learning methods, and it already covers classic CV and NLP applications.

Before turning to few-shot learning, let us first talk about deep learning. Since 2015, deep learning has achieved many breakthroughs, and AlphaGo defeated human Go champions. Starting with ResNet, machine learning models have achieved lower error than human annotators on large labeled datasets such as ImageNet. But the success of these deep learning models depends on large amounts of labeled data and on high-performance computing hardware.

For instance, AlphaGo was trained on a historical database of about 30 million positions from past games, and it could keep improving by playing against itself. ResNet was trained on ImageNet, a rare large dataset that contains millions of labeled images. In most scenarios, however, these two conditions ("large amounts of labeled data" and "high-performance computing hardware") are hard to satisfy. That is why few-shot learning is needed.

Figure 2

Three classic scenarios for few-shot learning

First, let us look at the three classic scenarios for few-shot learning.

1. Making artificial intelligence more human-like, with the ability to generalize from a single example. Take the leftmost picture in Figure 3. Shown a single unicycle, even a child can easily pick out which pictures in a pile also show unicycles. The unicycle may be tilted or flipped, or its pole made thicker and its wheel bigger, and the child can still tell that it is a unicycle.

Moreover, shown a unicycle, a bicycle, and a motorcycle, a child can easily see what the vehicles have in common: they all have wheels and handlebars. Artificial intelligence still lacks this ability to generalize from one example to many. This is why few-shot learning has long been a focus of academic research. Its goal is to narrow the gap between artificial and human intelligence.

Figure 3

2. Reducing the cost of data collection, labeling, processing, and computation. Today many developers face huge amounts of unlabeled data that also contain a lot of noise. This makes it very difficult to mine knowledge and information from them.

Usually one has to hire crowdsourced workers to label the data. Labeling takes a long time and requires several rounds of iteration between the two sides. Even then, the final data still carry some of the annotators' subjective judgments.

If few-shot learning can be applied, the cost of collecting and labeling data drops sharply. It is enough to collect a small dataset containing a few high-quality labeled samples and train a model on it for regression or classification.

3. Handling rare cases, such as those that are dangerous, involve privacy, or raise ethical concerns. A classic scenario is new drug discovery. The goal there is to find, among millions of compounds, those with desired properties such as low toxicity and high water solubility.

But new drug discovery is itself very time-consuming. It can take more than ten years and cost a great deal of money, for example to recruit subjects for trials. In the end, very few samples actually make it into the laboratory for testing, which makes new drug discovery a few-shot learning problem (see Figure 3).

Application areas of few-shot learning

Few-shot problems are so common that few-shot learning has now emerged in almost every industry and field. It first appeared in CV (computer vision), in tasks such as image classification, object recognition, and image segmentation.

It later emerged in NLP, in classic tasks such as relation extraction and NER. More recently, with the rise of pre-trained models, everyone wants to use them. These models are usually trained on large corpora and contain rich semantic information and prior knowledge.

A current research focus in NLP is how to fine-tune these models, or build prompt templates for them, so that they adapt to new tasks with only a small amount of labeled data.
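To make the template idea concrete, here is a minimal plain-Python sketch of a cloze-style prompt with a verbalizer, in the spirit of pattern-verbalizer methods such as PET. Everything in it is an assumption for illustration: the template string, the label words, and especially `toy_mask_scorer`. That function is a hypothetical keyword-based stand-in for the score a real pre-trained masked language model would assign to a word at the `[MASK]` position.

```python
import re

# Hypothetical stand-in for a pre-trained masked language model: returns a
# score for how plausible `word` is at the [MASK] position of `prompt`.
# A real system would read the pre-trained model's prediction at [MASK] instead.
def toy_mask_scorer(prompt: str, word: str) -> float:
    positive_cues = {"loved", "wonderful", "excellent"}
    negative_cues = {"hated", "boring", "awful"}
    tokens = set(re.findall(r"[a-z]+", prompt.lower()))
    polarity = len(tokens & positive_cues) - len(tokens & negative_cues)
    return polarity if word == "great" else -polarity

# Pattern (assumed): rewrites the input as a fill-in-the-blank sentence.
TEMPLATE = "{text} It was [MASK]."
# Verbalizer (assumed): maps each class label to a word the model can predict.
VERBALIZER = {"positive": "great", "negative": "terrible"}

def classify(text: str, scorer=toy_mask_scorer) -> str:
    prompt = TEMPLATE.format(text=text)
    scores = {label: scorer(prompt, word) for label, word in VERBALIZER.items()}
    # Choose the label whose verbalized word best fills the blank.
    return max(scores, key=scores.get)

print(classify("I loved this film."))
print(classify("A boring, awful plot."))
```

Casting classification as filling in a blank lets the task reuse what the pre-trained model already learned about language. This is one reason prompt templates help when only a handful of labeled examples are available.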

Beyond NLP, there are knowledge graphs, where few-shot learning can handle newly emerging entities and relations.

Figure 4

There are also the new drug discovery and robotics applications mentioned earlier. Teaching a robot dog to take two steps to the left is a few-shot learning problem. So is showing it just one or two gestures and having it understand what I want it to do.

Definition and challenges of few-shot learning

Below is a more rigorous definition of few-shot learning. It builds on Professor Tom Mitchell's classic 1997 definition of machine learning.

What is machine learning? A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance on T, as measured by P, improves with experience E.

Few-shot learning is a kind of machine learning. What makes it special is that experience E contains only a few supervised signals for the task. The most common supervised signal is a sample's label.
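In practice, few-shot experience E is often organized into N-way K-shot episodes. Each episode has N classes, K labeled "support" examples per class, and some "query" examples to evaluate on. The plain-Python sketch below shows one common way to sample such an episode. The function name, the dictionary-of-lists dataset layout, and the parameter names are assumptions for illustration, not the API of any particular library.

```python
import random

def sample_episode(dataset, n_way, k_shot, q_queries, seed=None):
    """Sample one N-way K-shot episode.

    dataset: dict mapping class label -> list of examples (hypothetical layout).
    Returns (support, query), each a list of (example, label) pairs.
    """
    rng = random.Random(seed)
    # Pick N distinct classes for this episode.
    classes = rng.sample(sorted(dataset), n_way)
    support, query = [], []
    for label in classes:
        # Draw K + Q distinct examples so support and query never overlap.
        examples = rng.sample(dataset[label], k_shot + q_queries)
        support += [(x, label) for x in examples[:k_shot]]
        query += [(x, label) for x in examples[k_shot:]]
    return support, query

# Toy dataset: 5 classes with 10 examples each.
toy = {f"class{c}": [f"class{c}_img{i}" for i in range(10)] for c in range(5)}
support, query = sample_episode(toy, n_way=3, k_shot=1, q_queries=2, seed=0)
print(len(support), len(query))
```

A 3-way 1-shot episode like this one gives the learner just three labeled examples, one per class. That is the "only a few supervised signals" setting described above.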

Figure 5

Ideally, learning aims to minimize the model's expected risk, that is, to predict well on whatever samples arrive in the future. But the joint distribution of the data, p(x, y), is generally unknown, so the expected risk has to be estimated.
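For reference, here is one standard way to write the two risks. The symbols are our assumption, chosen to match the sample count I and the hypothesis space H used later in the talk. Take a hypothesis h from H, a loss function ℓ, and training samples (x_i, y_i) for i = 1, ..., I:

```latex
R(h) = \int \ell\big(h(x), y\big)\, dp(x, y) = \mathbb{E}\big[\ell\big(h(x), y\big)\big],
\qquad
R_I(h) = \frac{1}{I} \sum_{i=1}^{I} \ell\big(h(x_i), y_i\big),
\qquad
h_I = \arg\min_{h \in \mathcal{H}} R_I(h).
```

Because p(x, y) is unknown, R(h) cannot be computed directly. Learning instead returns h_I, the minimizer of the empirical risk R_I, and how well h_I does on R depends on how large I is.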

In machine learning, we therefore usually minimize the empirical risk instead. However, as the formula in Figure 5 shows, the empirical risk is computed from the I samples in the training set. If the training set contains only a small amount of labeled data, that is, if I is very small, the empirical risk minimizer becomes very unreliable. This is the core difficulty of few-shot learning.
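A quick simulation makes the point concrete. The plain-Python sketch below assumes a fixed, hypothetical classifier whose true error rate (its expected 0-1 risk) is 0.2. It draws many training sets of size I and measures how much the empirical risk fluctuates from one training set to the next. All numbers are synthetic and chosen only for illustration.

```python
import random
import statistics

def empirical_risk(losses):
    """R_I(h): the average loss over the I training samples."""
    return sum(losses) / len(losses)

def spread_of_empirical_risk(num_samples, trials=2000, true_error=0.2, seed=0):
    """Mean and standard deviation of R_I(h) across many random training sets."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(trials):
        # 0-1 loss of a fixed hypothesis whose expected risk R(h) is true_error.
        losses = [1.0 if rng.random() < true_error else 0.0
                  for _ in range(num_samples)]
        estimates.append(empirical_risk(losses))
    return statistics.mean(estimates), statistics.stdev(estimates)

for i in (5, 50, 500):
    mean, std = spread_of_empirical_risk(i)
    print(f"I={i:4d}  mean R_I={mean:.3f}  std={std:.3f}")
```

The average stays near 0.2 for every I, but the spread shrinks roughly like 1/sqrt(I). With I = 5, a single training set reports zero error about a third of the time (0.8^5 ≈ 0.33). Minimizing such a noisy estimate can easily select the wrong hypothesis.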

This is not an unsolvable problem, however. The solution is to combine the labeled information in experience E with prior knowledge, such as the pre-trained NLP models just mentioned. With that prior knowledge, learning task T becomes feasible. There are generally three angles:

  1. Data: use prior knowledge to generate more labeled samples for training.

  2. Model: use prior knowledge to constrain the complexity of the hypothesis space.

  3. Algorithm: use prior knowledge to design an economical search strategy. For instance, in the large hypothesis space H, where should the search start, in which direction should it go, and at what speed? Answering these questions makes the search more economical and effective, so that good results can be obtained from only a few samples.
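As a toy illustration of the third angle, the sketch below learns where to start searching. It uses Reptile, a first-order meta-learning method closely related to MAML, swapped in here because it fits in a few lines. It is not code from the survey or from PaddleFSL. The task family (1-D linear regressions y = w·x with slopes w clustered around 3) and all step sizes are assumptions chosen for illustration.

```python
import random

def sample_task(rng):
    """Hypothetical task family: y = w * x, with slope w drawn around 3.0."""
    return rng.gauss(3.0, 0.5)

def sample_batch(w, k, rng):
    """A small support set of k labeled points for the task with slope w."""
    return [(x, w * x) for x in (rng.uniform(-1.0, 1.0) for _ in range(k))]

def mse(theta, batch):
    return sum((theta * x - y) ** 2 for x, y in batch) / len(batch)

def grad(theta, batch):
    """Derivative of mse with respect to theta for the model y_hat = theta * x."""
    return sum(2.0 * (theta * x - y) * x for x, y in batch) / len(batch)

def adapt(theta, batch, steps=5, lr=0.5):
    """Inner loop: a few gradient-descent steps on one task's support set."""
    for _ in range(steps):
        theta -= lr * grad(theta, batch)
    return theta

def reptile(meta_steps=1000, k=5, meta_lr=0.1, seed=0):
    """Outer loop: nudge the shared initialization toward each task's adapted weights."""
    rng = random.Random(seed)
    theta = 0.0  # uninformed starting point
    for _ in range(meta_steps):
        w = sample_task(rng)
        phi = adapt(theta, sample_batch(w, k, rng))
        theta += meta_lr * (phi - theta)
    return theta

init = reptile()
rng = random.Random(42)
new_task = sample_batch(2.5, 5, rng)  # a new task with only 5 labeled points
print(f"learned init: {init:.2f}")
print("loss after 1 step from learned init:", mse(adapt(init, new_task, steps=1), new_task))
print("loss after 1 step from zero init:   ", mse(adapt(0.0, new_task, steps=1), new_task))
```

From the meta-learned starting point, one gradient step on five points lands close to the new task's slope. Starting from zero leaves a much larger error. Prior knowledge about the task family has made the search cheaper, in the sense of item 3.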

These methods are summarized and organized in detail in the few-shot learning survey. It is the most highly cited ACM Computing Surveys paper of the past two years and also an ESI Highly Cited Paper this year.

PaddleFSL: helping you implement few-shot learning

Having introduced general few-shot learning methods, let us now see how to implement few-shot learning with the PaddleFSL toolkit.

Figure 6

PaddleFSL is a few-shot learning toolkit built on PaddlePaddle. It provides simple, easy-to-use, and stable implementations of classic few-shot learning methods, and it supports the development of new ones.

It also provides unified dataset processing, which makes model results easier to compare. Its detailed comments make it easy to add custom datasets. It already includes classic few-shot applications in CV and NLP and, backed by the thriving PaddlePaddle ecosystem, continues to expand into new areas.

The overall architecture diagram of PaddleFSL shows that it supports a range of tasks, such as image classification, relation extraction, and general natural language processing. It also includes classic datasets for these three tasks.

To handle different applications, it offers different feature extractors.

For example, CNNs are used for images, and all pre-trained models provided by PaddleNLP are supported. The model library also provides classic few-shot learning methods. Because PaddleFSL is built on PaddlePaddle, it also supports cross-platform deployment.

Next are the reproduced results for few-shot image classification. With PaddleFSL, four methods (ProtoNet, RelationNet, MAML, and ANIL) were run on two classic datasets, Omniglot and Mini-ImageNet. The reproduced results are better than, or at least comparable to, those reported in the original papers.
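To give a feel for one of the methods named above, here is a minimal plain-Python sketch of the classification step of ProtoNet (Prototypical Networks). Each class prototype is the mean embedding of that class's support examples. A query receives a softmax over the negative squared Euclidean distances to the prototypes. This is not PaddleFSL's implementation or API. The embeddings are assumed to come from some feature extractor, such as the CNN mentioned above, and here they are written out by hand as hypothetical 2-D vectors.

```python
import math

def squared_euclidean(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def compute_prototypes(support):
    """support: dict label -> list of embedding vectors (the K shots of that class)."""
    prototypes = {}
    for label, vectors in support.items():
        dim = len(vectors[0])
        prototypes[label] = [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]
    return prototypes

def classify(query, prototypes):
    """Softmax over negative squared distances to each class prototype."""
    labels = sorted(prototypes)
    logits = [-squared_euclidean(query, prototypes[c]) for c in labels]
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    probs = {c: e / total for c, e in zip(labels, exps)}
    return max(probs, key=probs.get), probs

# A 2-way 2-shot episode with hypothetical, precomputed 2-D embeddings.
support = {
    "unicycle": [[0.0, 1.0], [0.2, 0.8]],
    "bicycle": [[1.0, 0.0], [0.8, 0.2]],
}
protos = compute_prototypes(support)
label, probs = classify([0.15, 0.95], protos)
print(label, round(probs[label], 3))
```

In ProtoNet proper, the embedding network is trained episodically so that this nearest-prototype rule works well on unseen classes. Only the final classification step is shown here.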

To sum up, since joining Baidu Research, Wang Yaqing has focused mainly on few-shot learning.

On the theoretical side, her survey was published in ACM Computing Surveys, and further work was accepted at WWW. On the application side, particularly new drug discovery, a paper was accepted as a Spotlight paper at NeurIPS 2021 this year. A paper on few-shot short-text classification was accepted as a long paper at EMNLP. Work on intent recognition and cold-start recommendation is also progressing and is currently under review.

In addition, the work on small sample gesture recognition , It has obtained the general project support of the National Natural Science Foundation of China . Last but not least PaddleFSL, This package now has 1100 Much of the Star, as well as 1 Read more than ten thousand articles .

I hope students interested in few-shot learning will scan the QR code below to learn more and join us in cutting-edge research and practice.

Figure 7


This article was shared from the CSDN blog "PaddlePaddle".


Copyright notice

This article was created by [Paddlepaddle]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/02/202202180507173118.html