Active learning
2022-06-27 06:07:00 【Mango is very bright~】
Background
In real application scenarios, training an effective deep model depends on a large number of labeled samples, but accurately annotating large-scale data is often time-consuming and expensive. To reduce models' dependence on labeled data, unsupervised learning, semi-supervised learning, and weakly supervised learning have been proposed one after another. Among these methods, active learning is one of the main ways to reduce the cost of sample labeling.
Active learning: iteratively select the most valuable samples, label them, and add them to the training set, so as to improve model performance at the lowest annotation cost.
Sampling strategies
The strategy is designed so that the queried information helps improve the target model. Many active learning methods have been proposed. Some select the most informative samples to query; informativeness can be measured by different criteria, such as uncertainty or expected reduction in generalization error. However, such methods consider only what the model needs, which may cause the distribution of the selected samples to diverge from the true distribution of the dataset. Other methods query labels for representative samples, where representativeness is estimated from the clustering structure or density of the data. Such methods select the samples that best represent the overall sample distribution, but ignore what the model itself knows about its classification performance on each sample.
Uncertainty sampling strategies include least-confidence sampling, margin sampling, and entropy sampling. The core idea is to estimate a sample's uncertainty from the model's predicted posterior probabilities: the more balanced the model's predictions on an unlabeled sample (i.e., the probabilities of it belonging to several classes are almost equal), the harder it is to judge the sample's class, and adding it to the training set will more effectively improve the model's classification performance. (Geometrically, the closer a sample is to the decision boundary, the more uncertain it is and the more information it carries.) Taking binary classification as an example, the entropy sampling strategy usually chooses the sample whose posterior probability is closest to 0.5.
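As a minimal sketch of the three criteria, the functions below score unlabeled samples from an (n_samples, n_classes) array of predicted posteriors; the function names and the example array are illustrative, not from the paper.

```python
import numpy as np

def least_confidence(probs: np.ndarray) -> np.ndarray:
    # Higher score = lower confidence in the top predicted class.
    return 1.0 - probs.max(axis=1)

def margin(probs: np.ndarray) -> np.ndarray:
    # Higher score = smaller gap between the two most likely classes.
    part = np.sort(probs, axis=1)
    return 1.0 - (part[:, -1] - part[:, -2])

def entropy(probs: np.ndarray) -> np.ndarray:
    # Higher score = more balanced (more uncertain) posterior.
    return -(probs * np.log(probs + 1e-12)).sum(axis=1)

# In binary classification all three criteria peak for samples whose
# posterior is closest to 0.5, i.e. nearest the decision boundary.
probs = np.array([[0.95, 0.05], [0.55, 0.45], [0.50, 0.50]])
query_idx = entropy(probs).argmax()   # picks the third sample
```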
Methods that combine informativeness and representativeness fall into three categories:
- Serial combination: apply each selection strategy in turn to filter out "low-value" samples. A common practice is to first select the most informative batch of samples from the unlabeled set, then cluster this batch; the resulting cluster centers are the samples to query.
- Probabilistic selection: in each active learning iteration, the sampling strategy used in the current iteration is chosen according to a probability parameter.
- Parallel combination: the most popular way to combine active learning strategies. A mixed score is computed from the criteria of the different sampling strategies, via a weighted sum or multi-objective optimization; the unlabeled samples are ranked by this score and the highest-scoring ones are selected (see the sketch after this list).
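A hedged sketch of the parallel (weighted-sum) combination follows. The weight `beta`, the entropy-based informativeness term, and the density-based representativeness estimate are illustrative choices, not the paper's prescription.

```python
import numpy as np

def hybrid_scores(probs: np.ndarray, features: np.ndarray, beta: float = 0.5) -> np.ndarray:
    # Informativeness: prediction entropy of each unlabeled sample.
    info = -(probs * np.log(probs + 1e-12)).sum(axis=1)
    # Representativeness: average cosine similarity to the other unlabeled
    # samples, so samples in dense regions of the data score higher.
    norm = features / (np.linalg.norm(features, axis=1, keepdims=True) + 1e-12)
    density = (norm @ norm.T).mean(axis=1)
    return beta * info + (1.0 - beta) * density

# Usage: rank unlabeled samples by the mixed score and query the top b.
# query_idx = np.argsort(-hybrid_scores(probs, features))[:b]
```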
Instability sampling
The methods above try to estimate a sample's potential value for improving the model, but they evaluate unlabeled samples with the current model only, ignoring the information about prediction stability that the historical models contain. Across iteration rounds, the target model's performance on the same sample varies: the classification model's ability to recognize a given sample is unstable. By quantifying this variation and selecting the samples on which the model's predictions are most unstable, labeling and training on such samples can provide more effective information for improving the target model's generalization performance. Ignoring the potential value of historical models means an active learning strategy will not necessarily select the most valuable samples. Therefore, during active sampling, in addition to the current model's predictions on unlabeled samples, the differences among the predictions of previous models should also be considered.
An active learning method based on instability sampling measures the potential utility of an unlabeled sample for improving model performance by the differences among its predictions over the whole learning process. It computes the differences among the posterior probabilities that the N most recent models assign to each unlabeled sample to measure its instability, and selects the most unstable samples for labeling.

Definition of instability index
Problem setting and learning framework ( flow chart )

Except for the first active learning iteration, which uses random sampling to select samples, every subsequent iteration uses the N historical classification models closest to the current round, {M_{t-1}, M_{t-2}, …, M_{t-N}}, to predict each unlabeled sample x_j, obtaining N posterior probabilities. Instability sampling then estimates the instability of each unlabeled sample and selects the most unstable samples for labeling.
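The overall flow can be sketched as follows. This is a hedged sketch rather than the paper's exact procedure: it uses scikit-learn's LogisticRegression as the base model, assumes the initial labeled set covers every class, and stands in a simple L1 change between consecutive posteriors for the instability index defined below.

```python
from collections import deque

import numpy as np
from sklearn.linear_model import LogisticRegression

def active_learning(X, y_oracle, init_idx, rounds=200, b=1, N=3, seed=0):
    rng = np.random.default_rng(seed)
    labeled = set(init_idx)
    history = deque(maxlen=N)  # the N models closest to the current round
    for t in range(rounds):
        idx = sorted(labeled)
        model = LogisticRegression(max_iter=1000).fit(X[idx], y_oracle[idx])
        history.append(model)
        unlabeled = np.array([i for i in range(len(X)) if i not in labeled])
        if len(history) < 2:
            # First round: random sampling, as in the paper.
            query = rng.choice(unlabeled, size=b, replace=False)
        else:
            # Each historical model predicts posteriors for every unlabeled x_j,
            P = np.stack([m.predict_proba(X[unlabeled]) for m in history])
            # and the mean L1 change between consecutive posteriors stands in
            # for the instability measure defined in the next section.
            scores = np.abs(np.diff(P, axis=0)).sum(axis=2).mean(axis=0)
            query = unlabeled[np.argsort(-scores)[:b]]
        labeled.update(int(i) for i in query)
    return model
```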
Instability sampling strategy (method)
In active learning iteration t, if the previous models {M_{t-1}, M_{t-2}, …, M_{t-N}} give unstable predictions for an unlabeled sample x_j, this indicates that the target model's ability to recognize the sample is insufficient. The more unstable the prediction, the harder the sample is to recognize effectively. Therefore, the most unstable samples should be selected for querying whenever possible.
- Measuring recognition ability: the uncertainty of a model's prediction is measured by information entropy; the harder it is for the model to judge the sample's class, the lower the recognition ability.
- Measuring how much the recognition ability weakens: the difference between posterior probability distributions, which can be computed with KL divergence, JS divergence, or the Wasserstein distance (see the sketch after this list).
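As a concrete illustration, the sketch below combines the two ingredients: prediction entropy for recognition ability, and JS divergence between consecutive models' posteriors for the drift of that ability. The unweighted sum of the two terms is an assumption, since the exact combination is not reproduced here.

```python
import numpy as np
from scipy.spatial.distance import jensenshannon

def instability(posteriors: np.ndarray) -> float:
    """posteriors: (N, C) array, one posterior per historical model M_{t-k}."""
    # Recognition ability: entropy of each model's prediction (higher = weaker).
    ent = -(posteriors * np.log(posteriors + 1e-12)).sum(axis=1).mean()
    # Drift of the posterior across consecutive models, measured by JS
    # divergence (KL divergence or Wasserstein distance could be substituted).
    drift = np.mean([jensenshannon(posteriors[k], posteriors[k + 1]) ** 2
                     for k in range(len(posteriors) - 1)])
    return ent + drift   # illustrative combination; the weighting is an assumption

# Example: a sample whose predicted class flips between models scores high.
P = np.array([[0.8, 0.2], [0.3, 0.7], [0.7, 0.3]])
print(instability(P))
```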

Experiments with the active learning method based on instability sampling
Experimental setup

To further verify the effectiveness of the proposed method on both traditional and deep models, different base classification models are used: a logistic regression model, LeNet-5, and ResNet18. All experiments start from an untrained, randomly initialized model. Each dataset is split into 70% training samples and 30% test samples.
For traditional models: 5% of the training set is randomly sampled to initialize the labeled sample set. In each active learning iteration, the sampling strategy selects b = 1 unlabeled sample to be annotated and added to the labeled set; the total labeling budget is 200.
For deep models: the initial labeled training samples account for 0.5% and the total labeling budget is 500, except for SVHN, where 1% is randomly sampled to initialize the labeled set and the total labeling budget is 2,000. The deep models select b = 10 samples for labeling in each active learning iteration. The different labeling budgets are set according to when the experiments finally converge, so that the performance of the active learning sampling strategies can be observed.
The initial learning rate is set to 0.01 and the batch size to 64. For the MNIST and FashionMNIST datasets, the learning rate is reduced to 10% of its current value every 50 iterations; for SVHN, it is reduced to 90% every 20 iterations. Each experiment is repeated 5 times, the average accuracy of the target model in each active learning iteration is computed, and the curve of average accuracy versus the number of queried samples is plotted: the faster the curve rises, the better the sampling strategy performs.
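For reference, these decay schedules map directly onto PyTorch's StepLR; the model and optimizer below are placeholders, and stepping the scheduler once per iteration matches the description above.

```python
import torch

model = torch.nn.Linear(784, 10)   # stand-in for LeNet-5 / ResNet18
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# MNIST / FashionMNIST: every 50 iterations, lr becomes 10% of its value.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=50, gamma=0.1)
# SVHN: every 20 iterations, lr becomes 90% of its value.
# scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.9)

for step in range(200):
    optimizer.step()     # training step elided
    scheduler.step()
```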
Experimental results and analysis

- The instability sampling method achieves the best performance in most cases.
- Instability sampling is significantly better than random sampling in almost all cases.
- In almost all cases, instability sampling performs as well as or better than the uncertainty sampling methods (least-confidence sampling and maximum-entropy sampling).
In short, instability sampling can effectively select the samples most useful to the model and improve active learning performance. It also shows that accounting for the prediction instability of historical models captures more potential utility than selecting samples based on the current model alone.
The impact of the number of models
We further study the influence of the number of historical models N on the results, setting N = 2, 3, 5 and plotting the performance curves. When N = 5, the instability sampling method performs worse than with N = 2 and N = 3. A likely reason is that the method uses the N historical models closest to the current active learning round: as the number of iterations grows, the models obtained in much earlier rounds are weaker, and the posterior probabilities they predict are less accurate. Such samples then appear highly unstable for the wrong reason, so the selected data may not be the expected high-quality data. As a result, the method's effectiveness decreases as N increases.

Reference: An active learning method based on instability sampling