ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision

Feb 05, 2021
Wonjae Kim, Bokyung Son, Ildoo Kim

View paper on arXiv