Alexey Dosovitskiy

Simple Open-Vocabulary Object Detection with Vision Transformers

May 12, 2022
Matthias Minderer, Alexey Gritsenko, Austin Stone, Maxim Neumann, Dirk Weissenborn, Alexey Dosovitskiy, Aravindh Mahendran, Anurag Arnab, Mostafa Dehghani, Zhuoran Shen, Xiao Wang, Xiaohua Zhai, Thomas Kipf, Neil Houlsby

Scene Representation Transformer: Geometry-Free Novel View Synthesis Through Set-Latent Scene Representations

Nov 29, 2021
Mehdi S. M. Sajjadi, Henning Meyer, Etienne Pot, Urs Bergmann, Klaus Greff, Noha Radwan, Suhani Vora, Mario Lucic, Daniel Duckworth, Alexey Dosovitskiy, Jakob Uszkoreit, Thomas Funkhouser, Andrea Tagliasacchi

Conditional Object-Centric Learning from Video

Nov 24, 2021
Thomas Kipf, Gamaleldin F. Elsayed, Aravindh Mahendran, Austin Stone, Sara Sabour, Georg Heigold, Rico Jonschkowski, Alexey Dosovitskiy, Klaus Greff

Do Vision Transformers See Like Convolutional Neural Networks?

Aug 19, 2021
Maithra Raghu, Thomas Unterthiner, Simon Kornblith, Chiyuan Zhang, Alexey Dosovitskiy

MLP-Mixer: An all-MLP Architecture for Vision

May 17, 2021
Ilya Tolstikhin, Neil Houlsby, Alexander Kolesnikov, Lucas Beyer, Xiaohua Zhai, Thomas Unterthiner, Jessica Yung, Andreas Steiner, Daniel Keysers, Jakob Uszkoreit, Mario Lucic, Alexey Dosovitskiy

Differentiable Patch Selection for Image Recognition

Apr 07, 2021
Jean-Baptiste Cordonnier, Aravindh Mahendran, Alexey Dosovitskiy, Dirk Weissenborn, Jakob Uszkoreit, Thomas Unterthiner

Learning Object-Centric Video Models by Contrasting Sets

Nov 20, 2020
Sindy Löwe, Klaus Greff, Rico Jonschkowski, Alexey Dosovitskiy, Thomas Kipf

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

Oct 22, 2020
Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby
