【NeurIPS】ViTAE: Vision Transformer Advanced by Exploring Intrinsic Inductive Bias
2022-07-13 17:48:00 [AI Frontier Theory Group @ OUC]

Paper: https://openreview.net/forum?id=_WnAQKse_uK
Code: https://github.com/Annbless/ViTAE
1、Motivation
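The idea of this paper is very simple: combine CNNs with ViT, using convolutions in the shallow layers and ViT in the deep layers. In addition, a convolutional branch is added in parallel with the attention branch.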
2、Method
The overall network architecture is shown in the figure below. It consists of three Reduction Cells (RC) and several Normal Cells (NC).

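RC module
Compared with the Transformer block in ViT, the RC adds a pyramid reduction step: several dilated convolutions at different scales run in parallel, and their outputs are concatenated into a single feature. The shortcut path also contains three additional convolutions. Finally, a seq2img operation converts the token sequence back into a feature map.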
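The shape arithmetic behind the pyramid reduction can be sketched in a few lines of plain Python. This is only an illustration, not the authors' code: the function names, the "same-style" padding rule `dilation * (kernel - 1) // 2`, and the example values (kernel 7, stride 4, dilations 1 to 4, 64 channels per branch) are assumptions chosen for the sketch. It shows why parallel dilated convolutions can be concatenated along the channel axis: with this padding and an odd kernel, every branch yields the same spatial size.

```python
def dilated_conv_out_size(size, kernel, stride, dilation, padding):
    """Output length of a convolution along one spatial axis."""
    # A dilated kernel covers dilation * (kernel - 1) + 1 input positions.
    effective_kernel = dilation * (kernel - 1) + 1
    return (size + 2 * padding - effective_kernel) // stride + 1


def pyramid_reduction_shape(height, width, branch_channels, kernel, stride, dilations):
    """Shape (channels, height, width) after concatenating parallel dilated convs.

    Hypothetical helper: each branch is assumed to use the padding
    dilation * (kernel - 1) // 2 so that all branches align spatially.
    """
    sizes = set()
    for dilation in dilations:
        padding = dilation * (kernel - 1) // 2
        out_h = dilated_conv_out_size(height, kernel, stride, dilation, padding)
        out_w = dilated_conv_out_size(width, kernel, stride, dilation, padding)
        sizes.add((out_h, out_w))
    if len(sizes) != 1:
        raise ValueError("branches disagree on spatial size; cannot concatenate")
    out_h, out_w = sizes.pop()
    # Concatenation along the channel axis sums the per-branch channels.
    return len(dilations) * branch_channels, out_h, out_w


if __name__ == "__main__":
    # Assumed example configuration, not taken from the post.
    print(pyramid_reduction_shape(224, 224, 64, kernel=7, stride=4, dilations=(1, 2, 3, 4)))
```

Under these assumed settings, a 224x224 input is reduced to 56x56 in every branch, and the four branches together give 256 channels.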
NC module
The only difference from the Transformer block in ViT is that a convolutional branch is added alongside the attention computation.
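To make the parallel-branch idea concrete, here is a toy sketch in plain Python. It is a deliberately simplified illustration and not the ViTAE implementation: tokens are single scalars, attention uses identity projections (q = k = v), the convolution is a fixed 3x3 average standing in for learned weights, and fusing the branches by summation with a residual connection is an assumption of this sketch. All function names are hypothetical.

```python
import math


def seq2img(tokens, height, width):
    """Reshape a flat list of height * width tokens into a row-major grid."""
    if len(tokens) != height * width:
        raise ValueError("token count must equal height * width")
    return [tokens[r * width:(r + 1) * width] for r in range(height)]


def img2seq(grid):
    """Flatten a grid back into a row-major token list."""
    return [value for row in grid for value in row]


def self_attention(tokens):
    """Scalar-token self-attention with identity projections (q = k = v)."""
    out = []
    for q in tokens:
        scores = [q * k for k in tokens]
        peak = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        out.append(sum(w * v for w, v in zip(weights, tokens)) / total)
    return out


def conv3x3_mean(grid):
    """3x3 averaging convolution with zero padding (stand-in for learned weights)."""
    h, w = len(grid), len(grid[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            acc = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < h and 0 <= cc < w:
                        acc += grid[rr][cc]
            out[r][c] = acc / 9.0
    return out


def normal_cell_fusion(tokens, height, width):
    """Residual + global attention branch + local convolution branch (assumed sum fusion)."""
    attn = self_attention(tokens)
    conv = img2seq(conv3x3_mean(seq2img(tokens, height, width)))
    return [x + a + c for x, a, c in zip(tokens, attn, conv)]


if __name__ == "__main__":
    print(normal_cell_fusion([1.0, 2.0, 3.0, 4.0], 2, 2))
```

The point of the sketch is the data flow: the same tokens feed a global branch (attention over all positions) and a local branch (convolution over the seq2img grid), and the two results are merged token by token.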
3、Interesting points
Judging from the reviews on OpenReview, the reviewers recognized the following strong points:
- The idea of injecting multi-scale features is interesting and promising.
- The paper is well written and easy to follow.
At the same time, the reviewers also pointed out some weak points:
- The paper use an additional conv branch together with the self-attention branch to construct the new network architecture, it is obvious that the extra conv layers will help to improve the performance of the network. The proposed network modification looks a little bit incremental and not very interesting to me.
- There are no results on the downstream object detection and segmentation tasks, since this paper aims to introduce the inductive bias on the visual structure.
- The proposed method is mainly verified on small input images. Thus, I am a little bit concerned about its memory consumption and running speed when applied on large images (as segmentation or detection typically uses large image resolutions).