[Paper reading] Mean teachers are better role models
2022-07-24 13:08:00 【The next day is expected 1314】
1. Abstract
The recently proposed Temporal Ensembling has achieved state-of-the-art results on several semi-supervised learning benchmarks. It maintains an exponential moving average (EMA) of the label predictions for each training example and penalizes predictions that are inconsistent with this target. However, because the targets change only once per epoch, Temporal Ensembling becomes unwieldy when learning from large datasets. To overcome this problem, we propose Mean Teacher, a method that averages model weights rather than label predictions. As an additional benefit, Mean Teacher improves test accuracy and enables training with fewer labels than Temporal Ensembling.
2. Background knowledge
When I first read the abstract, much of it was lost on me, simply because I lacked background knowledge in this field. This section introduces the background I picked up while working through the paper.
2.1. Temporal Ensembling
Temporal Ensembling can be read literally as ensembling over time. The first sentence of the abstract credits this earlier work, which was the state of the art when it was proposed. To learn about the ideas in that paper, see the previous Blog.
2.2. EMA
EMA (exponential moving average) is a type of average commonly used in time series analysis. Put simply, EMA is a weighted average whose key property is that the weights of older observations decay exponentially over time. Equation (1) gives the recurrence formula for EMA; details can be found in the Blog.
$$
S_t = \begin{cases} S_0, & t = 1 \\ (1-\alpha)S_{t-1} + \alpha X_t, & t \geq 2 \end{cases} \tag{1}
$$
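As a small illustration of the recurrence in Equation (1), here is a minimal Python sketch. The function name `ema`, the sample series, and the choice of `s0` as the first observation are assumptions made for this example; they do not come from the paper.

```python
def ema(observations, alpha, s0):
    """Apply Equation (1): S_1 = s0, and for t >= 2
    S_t = (1 - alpha) * S_{t-1} + alpha * X_t."""
    smoothed = []
    for t, x in enumerate(observations, start=1):
        if t == 1:
            # Base case of the recurrence.
            smoothed.append(s0)
        else:
            # The old value is shrunk by (1 - alpha) at every step,
            # so the influence of early observations decays exponentially.
            smoothed.append((1 - alpha) * smoothed[-1] + alpha * x)
    return smoothed


# Made-up series, with s0 assumed to be the first observation.
series = [10.0, 20.0, 30.0]
result = ema(series, alpha=0.5, s0=series[0])
print(result)  # [10.0, 15.0, 22.5]
```

A larger `alpha` makes the average follow new observations more closely, while a smaller `alpha` gives more weight to history.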
3. Algorithm description

The algorithm proposed in this paper builds on the previous paper; the main change is the consistency cost, which can be understood as an unsupervised loss. In essence, the algorithm maintains two models, Teacher and Student: the same input is passed through both models, and the L2 norm of the difference between their outputs is used as the unsupervised loss. The Π model, in essence, maintains only one model and relies on Dropout. Temporal Ensembling also maintains only one model, but takes an EMA of the model's outputs across epochs. This paper is more direct: it maintains two models and applies the EMA to their parameters, so that the Teacher's parameters are an EMA of the Student's parameters. At a high level, this can be seen as one model passing its experience on to another, which the paper calls Mean Teacher.
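The two ingredients described above, the EMA over parameters and the consistency cost, can be sketched in plain Python. This is only a toy illustration under stated assumptions, not the paper's implementation: the "model" is a hypothetical element-wise linear map, the weight values are made up, and `decay` plays the role of $(1-\alpha)$ in Equation (1).

```python
def ema_update(teacher_weights, student_weights, decay):
    """Teacher <- decay * Teacher + (1 - decay) * Student, element-wise.
    `decay` corresponds to (1 - alpha) in Equation (1)."""
    return [decay * t + (1 - decay) * s
            for t, s in zip(teacher_weights, student_weights)]


def consistency_cost(student_output, teacher_output):
    """Squared L2 distance between the two models' outputs for the same input."""
    return sum((s - t) ** 2 for s, t in zip(student_output, teacher_output))


def predict(weights, x):
    # Hypothetical toy "model": one output per weight, for illustration only.
    return [w * x for w in weights]


# Both models start from the same (made-up) weights.
teacher = [1.0, 2.0]
student = [1.0, 2.0]

# Pretend a gradient step has changed the Student (values are made up).
student = [2.0, 4.0]

# The Teacher is not trained by gradient descent; it only tracks the Student.
teacher = ema_update(teacher, student, decay=0.5)

# Unsupervised loss: disagreement between the two models on the same input.
cost = consistency_cost(predict(student, 1.0), predict(teacher, 1.0))
print(teacher, cost)  # [1.5, 3.0] 1.25
```

In a real training loop, the consistency cost would be added to the supervised loss when updating the Student, and `ema_update` would run after each such update.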









