【NeurIPS】ViTAE: Vision Transformer Advanced by Exploring Intrinsic Inductive Bias
2022-07-16 06:25:00 【AI frontier theory group @ouc】

Paper: https://openreview.net/forum?id=_WnAQKse_uK
Code: https://github.com/Annbless/ViTAE
1、Motivation
The idea of this paper is simple: combine CNN and ViT, using CNN-style convolutions in the shallow layers and ViT in the deep layers. In addition, a convolution branch is added alongside the attention branch.
2、Method
The overall network architecture (shown in a figure in the original post) consists of three Reduction Cells (RC) and a number of Normal Cells (NC).

RC module
Compared with the Transformer block in ViT, the RC adds a pyramid reduction step: several dilated (atrous) convolutions at different scales run in parallel, and their outputs are concatenated into a single feature. In addition, the shortcut path contains three extra convolutions. Finally, an extra seq2img step converts the token sequence back into a feature map.
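The concatenation in the pyramid reduction only works if every dilated branch produces the same spatial size. The sketch below checks this with the standard convolution output-size formula. The function names, the "same grid" padding rule, and the numbers (224x224 input, 3x3 kernel, stride 4, dilations 1 to 4, 16 channels per branch) are illustrative assumptions, not the configuration reported in the paper.

```python
def conv_out_size(size, kernel, stride, dilation, padding):
    """Output length of a convolution along one spatial axis (standard formula)."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


def same_grid_padding(kernel, dilation):
    """Assumed padding rule: keeps all dilated branches on one output grid (odd kernels)."""
    return dilation * (kernel - 1) // 2


def pyramid_reduction_shape(height, width, branch_channels, kernel, stride, dilations):
    """Shape (C, H, W) after concatenating parallel dilated-conv branches on channels."""
    sizes = set()
    for d in dilations:
        p = same_grid_padding(kernel, d)
        sizes.add((conv_out_size(height, kernel, stride, d, p),
                   conv_out_size(width, kernel, stride, d, p)))
    if len(sizes) != 1:
        # Branches with different spatial sizes cannot be concatenated.
        raise ValueError(f"branches disagree on spatial size: {sorted(sizes)}")
    out_h, out_w = sizes.pop()
    return (branch_channels * len(dilations), out_h, out_w)


# Hypothetical setting: four branches, 16 channels each, on a 224x224 input.
print(pyramid_reduction_shape(224, 224, 16, 3, 4, (1, 2, 3, 4)))  # (64, 56, 56)
```

With this padding rule the dilation cancels out of the formula, so each branch sees a different receptive field but lands on the same 56x56 grid, and the channel count simply adds up across branches.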
NC module
The difference from the Transformer block in ViT is that an extra convolution branch runs alongside the attention computation.
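A toy sketch of the NC idea, a global attention branch plus a local convolution branch over the same tokens, might look like the following. It is heavily simplified and rests on assumptions that are not from the post: tokens form a 1D sequence (the real model works on 2D feature maps), attention is single-head with identity Q/K/V projections, the convolution is a fixed 1D depthwise kernel, and the two branches are fused with the input by element-wise sum. Layer normalization and the feed-forward network are omitted, and all function names are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Global branch: single-head attention with identity Q/K/V (assumption)."""
    dim = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[i] for w, v in zip(weights, tokens)) for i in range(dim)])
    return out


def local_conv(tokens, kernel):
    """Local branch: 1D depthwise convolution over the token axis, zero padded."""
    half = len(kernel) // 2
    n, dim = len(tokens), len(tokens[0])
    out = []
    for t in range(n):
        acc = [0.0] * dim
        for j, w in enumerate(kernel):
            src = t + j - half
            if 0 <= src < n:  # positions outside the sequence count as zeros
                for i in range(dim):
                    acc[i] += w * tokens[src][i]
        out.append(acc)
    return out


def normal_cell_mixer(tokens, kernel):
    """Residual + attention branch + convolution branch (sum fusion is an assumption)."""
    attn = self_attention(tokens)
    conv = local_conv(tokens, kernel)
    return [[x + a + c for x, a, c in zip(tx, ta, tc)]
            for tx, ta, tc in zip(tokens, attn, conv)]
```

Every attention output mixes all tokens, while every convolution output only mixes a token with its immediate neighbours, which is the locality bias the extra branch is meant to contribute.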
3、Points of interest
Judging from the reviews on OpenReview, the reviewers recognized the following strong points:
- The idea of injecting multi-scale features is interesting and promising.
- The paper is well written and easy to follow.
The reviewers also pointed out some weaknesses in the paper:
- The paper use an additional conv branch together with the self-attention branch to construct the new network architecture, it is obvious that the extra conv layers will help to improve the performance of the network. The proposed network modification looks a little bit incremental and not very interesting to me.
- There are no results on the downstream object detection and segmentation tasks, since this paper aims to introduce the inductive bias on the visual structure.
- The proposed method is mainly verified on small input images. Thus, I am a little bit concerned about its memory consumption and running speed when applied on large images (as segmentation or detection typically uses large image resolutions).