Boyu Deng

EIT: Efficiently Lead Inductive Biases to ViT

Mar 14, 2022

Rui Xia, Jingchao Wang, Chao Xue, Boyu Deng, Fang Wang

Abstract: Vision Transformer (ViT) depends on properties similar to the inductive biases inherent in Convolutional Neural Networks to perform better on datasets that are not ultra-large in scale. In this paper, we propose an architecture called Efficiently lead Inductive biases to ViT (EIT), which introduces inductive biases into both phases of ViT. In the Patches Projection phase, a convolutional max-pooling structure produces overlapping patches. In the Transformer Encoder phase, we design a novel structure for introducing inductive bias, called decreasing convolution, which runs in parallel with the multi-headed attention module and processes the different channels of the embedding separately. On four popular small-scale datasets, EIT improves accuracy over ViT by 12.6% on average while using fewer parameters and FLOPs. Compared with ResNet, EIT achieves higher accuracy with only 17.7% of the parameters and fewer FLOPs. Finally, ablation studies show that EIT is efficient and does not require position embedding. Code is coming soon: https://github.com/MrHaiPi/EIT

* 12 pages, 7 figures
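
The abstract says the Patches Projection phase uses a convolutional max-pooling structure to produce overlapping patches. As a rough illustration of why such a stack yields overlapping patches, the sketch below computes the token grid, receptive field, and effective stride for a convolution followed by max-pooling. The image size, kernel sizes, strides, and paddings are hypothetical values chosen for illustration and are not taken from the paper; the helper names are likewise assumptions.

```python
def output_size(size: int, kernel: int, stride: int, padding: int = 0) -> int:
    """Spatial output size of a conv or pooling layer (floor mode, no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1


def receptive_field(layers):
    """Return (receptive field, effective stride) of one output token.

    `layers` is a sequence of (kernel, stride) pairs applied in order.
    """
    rf, jump = 1, 1
    for kernel, stride in layers:
        rf += (kernel - 1) * jump  # each layer widens the field by (k - 1) input steps
        jump *= stride             # distance in pixels between neighbouring tokens
    return rf, jump


# Hypothetical settings (NOT from the paper): a 32x32 input image,
# a 7x7 stride-2 convolution, then a 3x3 stride-2 max-pooling.
IMAGE_SIZE = 32
CONV = {"kernel": 7, "stride": 2, "padding": 3}
POOL = {"kernel": 3, "stride": 2, "padding": 1}

after_conv = output_size(IMAGE_SIZE, **CONV)
after_pool = output_size(after_conv, **POOL)
num_tokens = after_pool ** 2

rf, jump = receptive_field([(CONV["kernel"], CONV["stride"]),
                            (POOL["kernel"], POOL["stride"])])

# Baseline for comparison: plain ViT-style non-overlapping 4x4 patches.
vit_grid = output_size(IMAGE_SIZE, kernel=4, stride=4)
vit_rf, vit_jump = receptive_field([(4, 4)])

print(f"conv + max-pool: {after_pool}x{after_pool} tokens, "
      f"each sees {rf}px, neighbours are {jump}px apart")
print(f"plain patches:   {vit_grid}x{vit_grid} tokens, "
      f"each sees {vit_rf}px, neighbours are {vit_jump}px apart")
```

Under these assumed settings both schemes yield the same number of tokens, but each token from the convolution plus max-pooling stack covers a region wider than the spacing between neighbouring tokens, so adjacent patches share pixels. With plain patching the two values are equal and patches do not overlap.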
