Active learning
2022-06-27 06:07:00 【Mango is very bright~】
Background
In real application scenarios, training an effective deep model depends on a large number of labeled samples, but accurately annotating large-scale data is often time-consuming and expensive. To reduce models' dependence on labeled data, unsupervised learning, semi-supervised learning, and weakly supervised learning have been proposed one after another. Among these methods, active learning is one of the main ways to reduce the cost of sample labeling.
Active learning: iteratively select the most valuable samples, label them, and add them to the training set, so as to improve model performance at the lowest annotation cost.
Sampling strategies
The strategy is designed so that the queried information helps improve the target model. Many active learning methods have been proposed. Some select the most informative samples to query; informativeness can be measured by different criteria, such as uncertainty or expected reduction in generalization error. However, such methods consider only what the model needs, which may cause the distribution of the selected samples to diverge from the true distribution of the dataset. Other methods query labels for representative samples, where representativeness is estimated from the clustering structure or density of the data. Such methods select the samples that best represent the overall sample distribution, but ignore what the model itself knows about its classification performance on each sample.
Uncertainty sampling strategies include least-confidence sampling, margin sampling, and entropy sampling. The core idea is to estimate a sample's uncertainty from the model's predicted posterior probabilities: the more balanced the model's predictions on an unlabeled sample (i.e., the probabilities of it belonging to several classes are almost equal), the harder it is to judge the sample's class, and adding it to the training set will more effectively improve the model's classification performance. (Geometrically, the closer a sample is to the decision boundary, the more uncertain it is and the more information it carries.) Taking binary classification as an example, the entropy sampling strategy usually chooses the sample whose posterior probability is closest to 0.5.
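As a minimal sketch of the three criteria, the functions below score unlabeled samples from an (n_samples, n_classes) array of predicted posteriors; the function names and the example array are illustrative, not from the paper.

```python
import numpy as np

def least_confidence(probs: np.ndarray) -> np.ndarray:
    # Higher score = lower confidence in the top predicted class.
    return 1.0 - probs.max(axis=1)

def margin(probs: np.ndarray) -> np.ndarray:
    # Higher score = smaller gap between the two most likely classes.
    part = np.sort(probs, axis=1)
    return 1.0 - (part[:, -1] - part[:, -2])

def entropy(probs: np.ndarray) -> np.ndarray:
    # Higher score = more balanced (more uncertain) posterior.
    return -(probs * np.log(probs + 1e-12)).sum(axis=1)

# In binary classification all three criteria peak for samples whose
# posterior is closest to 0.5, i.e. nearest the decision boundary.
probs = np.array([[0.95, 0.05], [0.55, 0.45], [0.50, 0.50]])
query_idx = entropy(probs).argmax()   # picks the third sample
```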
Methods that combine informativeness and representativeness fall into three categories:
- Serial combination: apply each selection strategy in turn to filter out "low-value" samples. A common practice is to first select the most informative batch of samples from the unlabeled set, then cluster this batch; the resulting cluster centers are the samples to query.
- Probabilistic selection: in each active learning iteration, the sampling strategy used in the current iteration is chosen according to a probability parameter.
- Parallel combination: the most popular way to combine active learning strategies. A mixed score is computed from the criteria of the different sampling strategies, via a weighted sum or multi-objective optimization; the unlabeled samples are ranked by this score and the highest-scoring ones are selected (see the sketch after this list).
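A hedged sketch of the parallel (weighted-sum) combination follows. The weight `beta`, the entropy-based informativeness term, and the density-based representativeness estimate are illustrative choices, not the paper's prescription.

```python
import numpy as np

def hybrid_scores(probs: np.ndarray, features: np.ndarray, beta: float = 0.5) -> np.ndarray:
    # Informativeness: prediction entropy of each unlabeled sample.
    info = -(probs * np.log(probs + 1e-12)).sum(axis=1)
    # Representativeness: average cosine similarity to the other unlabeled
    # samples, so samples in dense regions of the data score higher.
    norm = features / (np.linalg.norm(features, axis=1, keepdims=True) + 1e-12)
    density = (norm @ norm.T).mean(axis=1)
    return beta * info + (1.0 - beta) * density

# Usage: rank unlabeled samples by the mixed score and query the top b.
# query_idx = np.argsort(-hybrid_scores(probs, features))[:b]
```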
Instability sampling
The methods above try to estimate a sample's potential value for improving the model, but they evaluate unlabeled samples with the current model only, ignoring the information about prediction stability that the historical models contain. Across iteration rounds, the target model's performance on the same sample varies: the classification model's ability to recognize a given sample is unstable. By quantifying this variation and selecting the samples on which the model's predictions are most unstable, labeling and training on such samples can provide more effective information for improving the target model's generalization performance. Ignoring the potential value of historical models means an active learning strategy will not necessarily select the most valuable samples. Therefore, during active sampling, in addition to the current model's predictions on unlabeled samples, the differences among the predictions of previous models should also be considered.
An active learning method based on instability sampling measures the potential utility of an unlabeled sample for improving model performance by the differences among its predictions over the whole learning process. It computes the differences among the posterior probabilities that the N most recent models assign to each unlabeled sample to measure its instability, and selects the most unstable samples for labeling.

Definition of instability index
Problem setting and learning framework ( flow chart )

Except for the first active learning iteration, which uses random sampling to select samples, every subsequent iteration uses the N historical classification models closest to the current round, {M_{t-1}, M_{t-2}, …, M_{t-N}}, to predict each unlabeled sample x_j, obtaining N posterior probabilities. Instability sampling then estimates the instability of each unlabeled sample and selects the most unstable samples for labeling.
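The overall flow can be sketched as follows. This is a hedged sketch rather than the paper's exact procedure: it uses scikit-learn's LogisticRegression as the base model, assumes the initial labeled set covers every class, and stands in a simple L1 change between consecutive posteriors for the instability index defined below.

```python
from collections import deque

import numpy as np
from sklearn.linear_model import LogisticRegression

def active_learning(X, y_oracle, init_idx, rounds=200, b=1, N=3, seed=0):
    rng = np.random.default_rng(seed)
    labeled = set(init_idx)
    history = deque(maxlen=N)  # the N models closest to the current round
    for t in range(rounds):
        idx = sorted(labeled)
        model = LogisticRegression(max_iter=1000).fit(X[idx], y_oracle[idx])
        history.append(model)
        unlabeled = np.array([i for i in range(len(X)) if i not in labeled])
        if len(history) < 2:
            # First round: random sampling, as in the paper.
            query = rng.choice(unlabeled, size=b, replace=False)
        else:
            # Each historical model predicts posteriors for every unlabeled x_j,
            P = np.stack([m.predict_proba(X[unlabeled]) for m in history])
            # and the mean L1 change between consecutive posteriors stands in
            # for the instability measure defined in the next section.
            scores = np.abs(np.diff(P, axis=0)).sum(axis=2).mean(axis=0)
            query = unlabeled[np.argsort(-scores)[:b]]
        labeled.update(int(i) for i in query)
    return model
```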
Instability sampling strategy (method)
In active learning iteration t, if the previous models {M_{t-1}, M_{t-2}, …, M_{t-N}} give unstable predictions for an unlabeled sample x_j, this indicates that the target model's ability to recognize the sample is insufficient. The more unstable the prediction, the harder the sample is to recognize effectively. Therefore, the most unstable samples should be selected for querying whenever possible.
- Measuring recognition ability: the uncertainty of a model's prediction is measured by information entropy; the harder it is for the model to judge the sample's class, the lower the recognition ability.
- Measuring how much the recognition ability weakens: the difference between posterior probability distributions, which can be computed with KL divergence, JS divergence, or the Wasserstein distance (see the sketch after this list).
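As a concrete illustration, the sketch below combines the two ingredients: prediction entropy for recognition ability, and JS divergence between consecutive models' posteriors for the drift of that ability. The unweighted sum of the two terms is an assumption, since the exact combination is not reproduced here.

```python
import numpy as np
from scipy.spatial.distance import jensenshannon

def instability(posteriors: np.ndarray) -> float:
    """posteriors: (N, C) array, one posterior per historical model M_{t-k}."""
    # Recognition ability: entropy of each model's prediction (higher = weaker).
    ent = -(posteriors * np.log(posteriors + 1e-12)).sum(axis=1).mean()
    # Drift of the posterior across consecutive models, measured by JS
    # divergence (KL divergence or Wasserstein distance could be substituted).
    drift = np.mean([jensenshannon(posteriors[k], posteriors[k + 1]) ** 2
                     for k in range(len(posteriors) - 1)])
    return ent + drift   # illustrative combination; the weighting is an assumption

# Example: a sample whose predicted class flips between models scores high.
P = np.array([[0.8, 0.2], [0.3, 0.7], [0.7, 0.3]])
print(instability(P))
```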

Experiments with the active learning method based on instability sampling
Experimental setup

To further verify the effectiveness of the proposed method on both traditional and deep models, different base classification models are used: a logistic regression model, LeNet-5, and ResNet18. All experiments start from an untrained, randomly initialized model. Each dataset is split into 70% training samples and 30% test samples.
For traditional models: 5% of the training set is randomly sampled to initialize the labeled sample set. In each active learning iteration, the sampling strategy selects b = 1 unlabeled sample to be annotated and added to the labeled set; the total labeling budget is 200.
For deep models: the initial labeled training samples account for 0.5% and the total labeling budget is 500, except for SVHN, where 1% is randomly sampled to initialize the labeled set and the total labeling budget is 2,000. The deep models select b = 10 samples for labeling in each active learning iteration. The different labeling budgets are set according to when the experiments finally converge, so that the performance of the active learning sampling strategies can be observed.
The initial learning rate is set to 0.01 and the batch size to 64. For the MNIST and FashionMNIST datasets, the learning rate is reduced to 10% of its current value every 50 iterations; for SVHN, it is reduced to 90% every 20 iterations. Each experiment is repeated 5 times, the average accuracy of the target model in each active learning iteration is computed, and the curve of average accuracy versus the number of queried samples is plotted: the faster the curve rises, the better the sampling strategy performs.
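For reference, these decay schedules map directly onto PyTorch's StepLR; the model and optimizer below are placeholders, and stepping the scheduler once per iteration matches the description above.

```python
import torch

model = torch.nn.Linear(784, 10)   # stand-in for LeNet-5 / ResNet18
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# MNIST / FashionMNIST: every 50 iterations, lr becomes 10% of its value.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=50, gamma=0.1)
# SVHN: every 20 iterations, lr becomes 90% of its value.
# scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.9)

for step in range(200):
    optimizer.step()     # training step elided
    scheduler.step()
```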
Experimental results and analysis

- The instability sampling method achieves the best performance in most cases.
- Instability sampling is significantly better than random sampling in almost all cases.
- In almost all cases, instability sampling performs as well as or better than the uncertainty sampling methods (least-confidence sampling and maximum-entropy sampling).
In short, instability sampling can effectively select the samples most useful to the model and improve active learning performance. It also shows that accounting for the prediction instability of historical models captures more potential utility than selecting samples based on the current model alone.
The impact of the number of models
We further study the influence of the number of historical models N on the results, setting N = 2, 3, 5 and plotting the performance curves. When N = 5, the instability sampling method performs worse than with N = 2 and N = 3. A likely reason is that the method uses the N historical models closest to the current active learning round: as the number of iterations grows, the models obtained in much earlier rounds are weaker, and the posterior probabilities they predict are less accurate. Such samples then appear highly unstable for the wrong reason, so the selected data may not be the expected high-quality data. As a result, the method's effectiveness decreases as N increases.

Reference: An active learning method based on instability sampling