Start from small-sample learning and head for the sea of stars
2022-06-26 05:04:00 【Paddlepaddle】

The theme of this talk is: starting from small-sample learning (also known as few-shot learning), head for the sea of stars. It is divided into five parts:
Small-sample learning methods and their importance
Three classic scenarios for small-sample learning
Application fields of small-sample learning
Definition and problems of small-sample learning
PaddleFSL helps you implement small-sample learning
Wang Yaqing graduated in 2019 from the Department of Computer Science and Engineering at the Hong Kong University of Science and Technology, where her advisors were Professor Ni Mingxuan and Professor Guo Tianyou. Her research direction is machine learning, with a main focus on small-sample learning.

WAVE SUMMIT+ 2021 Deep Learning Developer Summit
【Technological Innovation, Female Power】 Forum
Since her doctoral studies, she has published many papers in top conferences and journals, including ICML, NeurIPS, TheWebConf, EMNLP, and TIP. Her survey of small-sample learning is the most-cited paper in ACM Computing Surveys for 2019-2021 and is also an ESI Highly Cited Paper this year.
In addition, she leads the development of a small-sample learning toolkit that has received more than 1.1K stars on GitHub. Interested readers can take a look at this link:
https://github.com/tata1661/FSL-Mate/tree/master/PaddleFSL
Since joining Baidu, Wang Yaqing has worked intensively on small-sample learning, focusing on how to generalize quickly to new tasks that contain only a small amount of labeled data.

Figure 1
Small-sample learning methods and their importance
We approach small-sample learning from three angles:
First, we study the relevant theoretical foundations, such as meta-learning and graph learning.
Second, we consider how to put it into practice at Baidu, for example in new drug discovery, text classification, intent recognition, cold-start recommendation, and gesture recognition.
Finally, to help everyone get started with small-sample learning quickly and prototype small-sample learning methods rapidly, we have implemented a general-purpose small-sample learning tool. It is built on PaddlePaddle, provides simple, easy-to-use, and stable implementations of classic small-sample learning methods, and already includes classic applications in CV and NLP.
Before turning to small-sample learning, let's talk about deep learning. Since 2015, deep learning has achieved many breakthroughs: AlphaGo defeated the human Go champion, and, starting with ResNet, machine learning models have made fewer labeling errors than human annotators on large datasets such as ImageNet. But the success of these deep learning models depends on large amounts of labeled data and high-performance computing devices.
Take AlphaGo: it was trained on a historical database containing 30 million records of Go play, and it could also keep playing against itself. ResNet was trained on ImageNet, a rare large dataset containing millions of labeled images. In most scenarios, however, these two conditions, "large amounts of labeled data" and "high-performance computing devices", are hard to satisfy, which is why small-sample learning is needed.

Figure 2
Three classic scenarios for small sample learning
First, let me introduce the three classic scenarios for small-sample learning.
1. Making artificial intelligence more human-like, so that it can generalize from a single example. Take the leftmost picture in Figure 3 as an example. Shown one unicycle, even a child can easily pick out which other pictures in a pile also show a unicycle. Whether the unicycle is tilted or flipped, or its pole is made thicker and its wheel bigger, the child can still tell that it is a unicycle.
Moreover, shown a unicycle, a bicycle, and a motorcycle, a child can easily see what the different vehicles have in common, such as wheels and handlebars. Artificial intelligence still lacks this ability to generalize from one example. Small-sample learning has therefore long been a focus of academic research, with the goal of narrowing the gap between artificial and human intelligence.

Figure 3
2. Reducing the cost of collecting, labeling, processing, and computing on data, which is a key scenario for small-sample learning. Today many developers face huge amounts of unlabeled data that also contains a lot of noise, so mining knowledge and information from this data is very difficult.
Generally, you need to find data crowdsourcing workers to help label the data. But labeling takes a long time and requires multiple rounds of iteration between the two sides, and the final data will still reflect some subjective factors of the annotators.
If small-sample learning can be applied, the cost of collecting and labeling data can be greatly reduced. You only need to collect a small dataset containing a small number of high-quality labeled samples to train a model for regression or classification.
3. Handling rare cases, for example those that are dangerous or involve privacy or ethics. A classic scenario is new drug discovery, where the hope is to find, among millions of compounds, those with the desired properties, such as low toxicity and high water solubility.
But new drug discovery is itself a very time-consuming process. It may take more than ten years and cost a great deal, including recruiting subjects for trials. In the end, the number of samples that actually reach the laboratory for testing is very small, which makes new drug discovery a small-sample learning problem (see Figure 3).
Application fields of small sample learning
Because the small-sample setting is so common, small-sample learning has now emerged in all kinds of industries and fields. The earliest was CV (computer vision), with tasks such as image classification, object recognition, and image segmentation.
It later emerged in NLP as well, for classic tasks such as relation extraction and NER. More recently, with the rise of pre-trained models, everyone wants to use them, because they are usually trained on large corpora and contain rich semantic information and prior knowledge.
How to fine-tune these models or build templates so that they can be adapted to new tasks, even ones with only a small amount of labeled data, is the latest research focus in NLP.
Beyond NLP, there are knowledge graphs: for example, handling newly emerging entities and relations can be done through small-sample learning.

Figure 4
There are also new drug discovery, mentioned just now, and robotics. For example, teaching a robot dog to take two steps to the left, or showing it just one or two gestures so that it knows what I want it to do, are both small-sample learning problems.
Definition and problems of small sample learning
The following gives a more rigorous definition of small-sample learning, based on the classic definition of machine learning that Professor Tom Mitchell gave in 1997.
What is machine learning? For a class of tasks T, if the performance of a computer program on T, as measured by P, improves with experience E, then the program is said to learn from experience E.
Small-sample learning is a kind of machine learning. What is special is that the experience E contains only a small number of supervisory signals. The most common supervisory signal is the label of a sample.

Figure 5
The ideal of learning is to minimize the model's expected risk, that is, to predict well no matter what samples arrive in the future. But the joint distribution that the expected risk depends on is generally unknown, so the expected risk has to be estimated.
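The formula from the talk's slide is not reproduced in the text, so the following is a sketch in standard notation, which is an assumption here: h is the hypothesis (model), \ell is the loss, p(x, y) is the unknown joint distribution of inputs and labels, and I is the number of training samples.

```latex
R(h) = \int \ell\bigl(h(x), y\bigr)\, dp(x, y)
     = \mathbb{E}\bigl[\ell(h(x), y)\bigr],
\qquad
R_I(h) = \frac{1}{I} \sum_{i=1}^{I} \ell\bigl(h(x_i), y_i\bigr).
```

R(h) is the expected risk, which cannot be computed because p(x, y) is unknown. R_I(h) is the empirical risk, an average over the I labeled samples (x_i, y_i) that stands in for it.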
In machine learning, one generally optimizes the empirical risk instead. However, in the formula for empirical risk, the estimate is determined by the number of samples I in the training set. If the training set contains only a small amount of labeled data, so that I is very small, the resulting empirical risk minimizer is very unreliable. This is what makes small-sample learning a genuinely hard problem.
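To make the unreliability concrete, here is a minimal simulation sketch. It is not from the talk: the fixed classifier, its assumed true error rate of 0.3 (playing the role of the expected risk), and all function names are hypothetical. It measures how much the empirical risk fluctuates across repeated draws of I samples.

```python
import random
import statistics


def empirical_risk(num_samples, rng, true_error_rate=0.3):
    """Average 0-1 loss of a fixed classifier over `num_samples` random points.

    Assumption for illustration: the classifier is wrong with probability
    `true_error_rate`, which plays the role of the expected risk.
    """
    losses = [1.0 if rng.random() < true_error_rate else 0.0
              for _ in range(num_samples)]
    return sum(losses) / num_samples


def spread_of_estimates(num_samples, trials=2000, seed=0):
    """Standard deviation of the empirical risk across repeated samplings."""
    rng = random.Random(seed)
    estimates = [empirical_risk(num_samples, rng) for _ in range(trials)]
    return statistics.pstdev(estimates)


small_i = spread_of_estimates(num_samples=5)    # small-sample regime
large_i = spread_of_estimates(num_samples=500)  # data-rich regime
print(f"I=5:   spread of empirical risk = {small_i:.3f}")
print(f"I=500: spread of empirical risk = {large_i:.3f}")
```

With only five samples the estimate swings widely from one draw to the next, so a model chosen by minimizing it may look good purely by chance.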
However, this is not unsolvable. The solution is to combine the labeled information in experience E with some prior knowledge, for example the NLP pre-trained models just mentioned. With this prior knowledge, learning task T becomes feasible. There are generally three angles:
1. Use prior knowledge to generate more labeled samples for training.
2. Use prior knowledge to constrain the complexity of the model's hypothesis space.
3. Use prior knowledge to design an economical search strategy. For example, in the hypothesis space H, at which point should the search start? In which direction? At what speed? Answering these makes the final search strategy more economical and effective, so that good results can be obtained with only a few samples.
These methods are summarized and organized in detail in the survey of small-sample learning, which is the most-cited paper in ACM Computing Surveys over the last two years and also an ESI Highly Cited Paper this year.
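The question of where to start the search can be illustrated with a toy sketch. It is not from the talk or the survey: the one-parameter model y = w * x, the three data points, and the prior starting value 1.8 (standing in for knowledge carried over from related tasks, loosely in the spirit of learned initializations such as MAML) are all assumptions made for illustration.

```python
def mse(w, data):
    """Mean squared error of the one-parameter model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)


def gradient_descent(w_init, data, steps, lr):
    """Run a fixed number of plain gradient steps on the MSE, starting at w_init."""
    w = w_init
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


# Hypothetical small-sample task: only three labeled points, roughly y = 2x.
few_shot_data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]

# Same tiny search budget for both runs; only the starting point differs.
w_scratch = gradient_descent(w_init=0.0, data=few_shot_data, steps=3, lr=0.01)
w_prior = gradient_descent(w_init=1.8, data=few_shot_data, steps=3, lr=0.01)

print(f"from scratch: w={w_scratch:.3f}, mse={mse(w_scratch, few_shot_data):.3f}")
print(f"from prior:   w={w_prior:.3f}, mse={mse(w_prior, few_shot_data):.3f}")
```

Both runs see the same three samples and take the same three steps, but the run that starts near a good solution ends with a much lower error. That is the sense in which prior knowledge makes the search economical.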
PaddleFSL helps you implement small-sample learning
I have just introduced general small-sample learning methods. Next, let's look at how to implement small-sample learning with the small-sample learning toolkit PaddleFSL.

Figure 6
PaddleFSL is a small-sample learning toolkit based on PaddlePaddle. It provides simple, easy-to-use, and stable implementations of classic small-sample learning methods, and supports the development of new ones.
It also provides unified dataset processing, which makes model results easier to compare, and very detailed comments, which make it easy to customize new datasets. It already includes classic small-sample applications in CV and NLP and, relying on the thriving PaddlePaddle ecosystem, continues to expand to new areas.
The overall framework diagram of PaddleFSL shows that it supports a series of tasks such as image classification, relation extraction, and general natural language processing, and includes some classic datasets for these three tasks.
To handle different applications, different feature extractors are available.
For example, CNNs are provided for extracting image features, and all pre-trained models provided by PaddleNLP are supported. The model zoo also provides classic small-sample learning methods. Because PaddleFSL is built on PaddlePaddle, it also supports cross-platform deployment.
Here are the reproduced results for small-sample image classification. Using PaddleFSL with four methods, ProtoNet, RelationNet, MAML, and ANIL, on two classic datasets, Omniglot and Mini-ImageNet, we can reproduce results that are better than, or at least comparable to, those reported in the original papers.
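For readers unfamiliar with these methods, the core idea of ProtoNet (prototypical networks) can be sketched in a few lines. This is plain Python for illustration only and does not use or imitate the PaddleFSL API. The two-dimensional embeddings are made up and stand in for the output of a feature extractor; all function names are hypothetical.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def squared_distance(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def build_prototypes(support_set):
    """Each class prototype is the mean embedding of its labeled examples."""
    return {label: mean_vector(embs) for label, embs in support_set.items()}


def class_probabilities(query, prototypes):
    """Softmax over negative squared distances from the query to each prototype."""
    logits = {label: -squared_distance(query, proto)
              for label, proto in prototypes.items()}
    max_logit = max(logits.values())  # subtract the max for numerical stability
    exps = {label: math.exp(value - max_logit) for label, value in logits.items()}
    total = sum(exps.values())
    return {label: value / total for label, value in exps.items()}


# Made-up embeddings: two classes with two labeled examples each.
support = {
    "unicycle": [[0.9, 0.1], [1.1, -0.1]],
    "bicycle": [[-1.0, 0.2], [-0.8, 0.0]],
}
prototypes = build_prototypes(support)
probs = class_probabilities([0.7, 0.0], prototypes)
predicted = max(probs, key=probs.get)
print(predicted, probs)
```

Because a new class only needs a handful of embeddings to form its prototype, no classifier weights have to be retrained for it, which is why this family of methods suits tasks with very few labeled samples.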
To summarize: since joining Baidu Research, Wang Yaqing has mainly worked on small-sample learning. On the theoretical side, her papers have been accepted by ACM Computing Surveys and WWW. On the practical side, her work on new drug discovery was accepted by NeurIPS 2021 as a Spotlight paper, and her paper on small-sample short text classification was accepted by EMNLP as a long paper. Work on intent recognition and cold-start recommendation is also progressing and is currently under review.
In addition, the work on small-sample gesture recognition has received support from the General Program of the National Natural Science Foundation of China. Last but not least is PaddleFSL: the toolkit now has more than 1,100 stars, and related articles have been read more than 10,000 times.
I would like to take this opportunity to invite students interested in small-sample learning to scan the QR code below to learn more, and to carry out cutting-edge research and practice together.

Figure 7
This article was shared from the blog "PaddlePaddle" (CSDN).