[MAE] Masked Autoencoders (masked autoencoder)
2022-06-23 17:34:00 【luemeon】
Contents
Asymmetric encoder-decoder architecture
Randomly mask patches of the input image and reconstruct the missing pixels
The MAE pre-trained model generalizes well
Asymmetric encoder-decoder architecture
The encoder's input is only the unmasked (visible) patches;
the decoder is lightweight
(the decoder is used only during pre-training for image reconstruction, so its design is independent of the encoder and can be kept flexible and lightweight).
The decoder's input is the output of the encoder together with the position information of the masked patches,
and its output is the pixel values of the missing patches to be reconstructed.
Method
It differs from the classical autoencoder in its asymmetric design:
the encoder depends only on the partial observation (it needs no mask-token information, whereas BERT does),
and the lightweight decoder reconstructs the original signal from the resulting latent representation together with the mask tokens.

Pipeline
- Cut the image into patches and randomly pick a fraction of them (e.g., 25% in the paper) as the network input;
- Pass them through the encoder to obtain the corresponding encoded patches;
- Restore the encoded patches to their original positions and fill the missing positions with masked patches (mask tokens);
- Feed the full set into the decoder; the decoder predicts the image pixels of each patch;
- Compute the MSE between the predicted pixels and the pixels of the original image as the loss (the loss function is MSE; note: as in BERT, the loss is computed only on the masked patches — a minimal sketch of this loss follows the list);
- Use the trained encoder as the base model for downstream tasks and fine-tune it on them.
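A minimal sketch of the masked-patch MSE loss mentioned above; the function name and tensor layout are assumptions for illustration, not the paper's code:

```python
import torch

def mae_loss(pred, target, mask):
    """MSE computed only on the masked patches (BERT-style).

    pred, target: (batch, num_patches, patch_dim) pixel values per patch
    mask: (batch, num_patches), 1 for masked (removed) patches, 0 for visible ones
    """
    loss = (pred - target) ** 2              # squared error per pixel
    loss = loss.mean(dim=-1)                 # mean over the pixels of each patch
    return (loss * mask).sum() / mask.sum()  # average over masked patches only
```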

Encoder
The encoder uses the ViT architecture but operates only on the visible, unmasked patches.
The image patches are first encoded by a linear projection,
position embeddings are added,
and the tokens are then fed into a stack of Transformer blocks.
The final (decoder) output is reshaped to form the reconstructed image;
the reconstruction target is the pixel values of each masked patch, which is how MAE rebuilds the original information.
Because the encoder processes only a small fraction of the patches (e.g., 25%) and uses no mask-token information,
very large encoders can be trained with only a fraction of the usual compute and memory.
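A minimal sketch of such an encoder in PyTorch. The class name `MAEEncoder`, the dimensions, and the use of `nn.TransformerEncoder` as the block stack are illustrative assumptions; the paper's actual ViT implementation differs in detail (e.g., it includes a class token).

```python
import torch
import torch.nn as nn

class MAEEncoder(nn.Module):
    """ViT-style encoder that runs only on the visible (unmasked) patches."""

    def __init__(self, img_size=224, patch_size=16, dim=768, depth=12, heads=12):
        super().__init__()
        # Linear projection of patches, implemented as a strided convolution
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=patch_size, stride=patch_size)
        num_patches = (img_size // patch_size) ** 2
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches, dim))
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, depth)

    def forward(self, imgs, visible_idx):
        # Patch embedding + position embedding for all patches
        x = self.patch_embed(imgs).flatten(2).transpose(1, 2)  # (B, N, dim)
        x = x + self.pos_embed
        # Keep only the visible patches before the Transformer blocks
        x = torch.gather(x, 1, visible_idx.unsqueeze(-1).expand(-1, -1, x.size(-1)))
        return self.blocks(x)                                  # (B, N_visible, dim)
```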
Decoder
The decoder's input is the full set of image patch tokens:
(1) the output of the encoder (the visible patches);
(2) mask tokens.
Each mask token is the same shared, learnable vector that indicates the presence of a missing patch to be predicted.
Position embeddings are added to all tokens in this full set, so the mask tokens also carry information about their location in the image.
The decoder likewise consists of a series of Transformer blocks.
Its last layer is a linear projection whose number of output channels equals the number of pixel values in each patch.
The MAE decoder is used only during pre-training for image reconstruction; the encoder alone is used to produce the image representations used for recognition.
Because the decoder design is independent of the encoder design, it can be chosen with a high degree of flexibility (small and lightweight).
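A corresponding sketch of the lightweight decoder, assuming it receives the encoder output and an `ids_restore` inverse permutation produced by the masking step (sketched further below). The widths, depth, and attribute names are illustrative assumptions rather than the paper's exact configuration.

```python
import torch
import torch.nn as nn

class MAEDecoder(nn.Module):
    """Mask tokens + unshuffle + Transformer blocks + linear projection to pixels."""

    def __init__(self, num_patches=196, enc_dim=768, dim=512, depth=8,
                 heads=16, patch_pixels=16 * 16 * 3):
        super().__init__()
        self.embed = nn.Linear(enc_dim, dim)                    # map encoder width to decoder width
        self.mask_token = nn.Parameter(torch.zeros(1, 1, dim))  # one shared, learnable vector
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches, dim))
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, depth)
        self.head = nn.Linear(dim, patch_pixels)                # output channels = pixels per patch

    def forward(self, enc_tokens, ids_restore):
        x = self.embed(enc_tokens)                              # (B, N_visible, dim)
        B, n_visible, dim = x.shape
        n_masked = ids_restore.size(1) - n_visible
        mask_tokens = self.mask_token.expand(B, n_masked, dim)  # one copy per missing patch
        x = torch.cat([x, mask_tokens], dim=1)
        # Unshuffle: put every token back at its original patch position
        x = torch.gather(x, 1, ids_restore.unsqueeze(-1).expand(-1, -1, dim))
        x = x + self.pos_embed                                  # position embedding on all tokens
        x = self.blocks(x)                                      # full sequence through Transformer blocks
        return self.head(x)                                     # (B, num_patches, patch_pixels)
```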
The reconstruction target is the normalized pixel values of each masked patch.
The mean and standard deviation of each patch are computed and used to normalize that patch; using the normalized pixels as the reconstruction target improves representation quality.
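A sketch of this per-patch normalization, assuming each patch has already been flattened into a vector of pixel values; the function name and epsilon are illustrative:

```python
import torch

def normalized_patch_target(patches, eps=1e-6):
    """Normalize each patch by its own mean and standard deviation.

    patches: (batch, num_patches, patch_dim) raw pixel values of each patch
    """
    mean = patches.mean(dim=-1, keepdim=True)
    var = patches.var(dim=-1, keepdim=True)
    return (patches - mean) / (var + eps) ** 0.5
```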
Simple implementation: MAE pre-training can be implemented very efficiently:
1. Generate a token for every input patch via linear projection, with a position embedding added;
2. Randomly shuffle the token sequence and, according to the masking ratio, remove the last portion of the tokens (sketched below);
3. Feed the remaining (unmasked) patch tokens into the encoder to obtain their representations;
4. Append mask tokens (shared, learnable vectors) to the encoder output and unshuffle (invert the random permutation) to recover the full-length sequence, so every token is aligned with its target; then feed this full sequence into the decoder, which operates on all tokens.
As described above, MAE requires no sparse operations. Moreover, shuffling and unshuffling are fast and introduce negligible computational overhead.
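A sketch of the shuffle/unshuffle masking in steps 2–4, assuming the tokens have already been projected and given position embeddings; the function name and return convention (`ids_restore` for the inverse permutation) are assumptions for illustration:

```python
import torch

def random_masking(tokens, mask_ratio=0.75):
    """Randomly keep a subset of patch tokens; return the indices needed to unshuffle.

    tokens: (batch, num_patches, dim) patch tokens after projection + position embedding
    """
    B, N, D = tokens.shape
    n_keep = int(N * (1 - mask_ratio))

    noise = torch.rand(B, N, device=tokens.device)   # random score per patch
    ids_shuffle = torch.argsort(noise, dim=1)        # random permutation (shuffle)
    ids_restore = torch.argsort(ids_shuffle, dim=1)  # inverse permutation (unshuffle)

    ids_keep = ids_shuffle[:, :n_keep]               # keep the first part, drop the rest
    visible = torch.gather(tokens, 1, ids_keep.unsqueeze(-1).expand(-1, -1, D))

    mask = torch.ones(B, N, device=tokens.device)    # 1 = masked / removed patch
    mask[:, :n_keep] = 0
    mask = torch.gather(mask, 1, ids_restore)        # back to original patch order
    return visible, mask, ids_restore
```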
Downstream evaluation uses ImageNet for classification, COCO for detection, and ADE20K for segmentation.

Partial Fine-tuning
The paper also adopts a Partial Fine-tuning protocol, which differs from the commonly used Linear Probing (only the final linear classifier's parameters are trained) and full Fine-tuning (the parameters of all layers are trained).
Partial Fine-tuning means training only the parameters of the last few layers of the model.
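A sketch of partial fine-tuning on a ViT-style model; the attribute names `blocks` and `head` and the choice of four trainable blocks are assumptions for illustration, not the paper's protocol details.

```python
def partial_finetune(vit, n_trainable_blocks=4):
    """Freeze everything, then unfreeze only the last few Transformer blocks and the head."""
    for p in vit.parameters():
        p.requires_grad = False
    for blk in vit.blocks[-n_trainable_blocks:]:   # last few Transformer blocks
        for p in blk.parameters():
            p.requires_grad = True
    for p in vit.head.parameters():                # classification head
        p.requires_grad = True
```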
Reference link: