Andreas Steiner

LocCa: Visual Pretraining with Location-aware Captioners

Mar 28, 2024
Bo Wan, Michael Tschannen, Yongqin Xian, Filip Pavetic, Ibrahim Alabdulmohsin, Xiao Wang, André Susano Pinto, Andreas Steiner, Lucas Beyer, Xiaohua Zhai

CLIP the Bias: How Useful is Balancing Data in Multimodal Learning?

Mar 07, 2024
Ibrahim Alabdulmohsin, Xiao Wang, Andreas Steiner, Priya Goyal, Alexander D'Amour, Xiaohua Zhai

Patch n' Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution

Jul 12, 2023
Mostafa Dehghani, Basil Mustafa, Josip Djolonga, Jonathan Heek, Matthias Minderer, Mathilde Caron, Andreas Steiner, Joan Puigcerver, Robert Geirhos, Ibrahim Alabdulmohsin, Avital Oliver, Piotr Padlewski, Alexey Gritsenko, Mario Lučić, Neil Houlsby

Image Captioners Are Scalable Vision Learners Too

Jun 13, 2023
Michael Tschannen, Manoj Kumar, Andreas Steiner, Xiaohua Zhai, Neil Houlsby, Lucas Beyer

Three Towers: Flexible Contrastive Learning with Pretrained Image Models

May 29, 2023
Jannik Kossen, Mark Collier, Basil Mustafa, Xiao Wang, Xiaohua Zhai, Lucas Beyer, Andreas Steiner, Jesse Berent, Rodolphe Jenatton, Efi Kokiopoulou

A Study of Autoregressive Decoders for Multi-Tasking in Computer Vision

Mar 30, 2023
Lucas Beyer, Bo Wan, Gagan Madan, Filip Pavetic, Andreas Steiner, Alexander Kolesnikov, André Susano Pinto, Emanuele Bugliarello, Xiao Wang, Qihang Yu, Liang-Chieh Chen, Xiaohua Zhai

Scaling Vision Transformers to 22 Billion Parameters

Feb 10, 2023
Mostafa Dehghani, Josip Djolonga, Basil Mustafa, Piotr Padlewski, Jonathan Heek, Justin Gilmer, Andreas Steiner, Mathilde Caron, Robert Geirhos, Ibrahim Alabdulmohsin, Rodolphe Jenatton, Lucas Beyer, Michael Tschannen, Anurag Arnab, Xiao Wang, Carlos Riquelme, Matthias Minderer, Joan Puigcerver, Utku Evci, Manoj Kumar, Sjoerd van Steenkiste, Gamaleldin F. Elsayed, Aravindh Mahendran, Fisher Yu, Avital Oliver, Fantine Huot, Jasmijn Bastings, Mark Patrick Collier, Alexey Gritsenko, Vighnesh Birodkar, Cristina Vasconcelos, Yi Tay, Thomas Mensink, Alexander Kolesnikov, Filip Pavetić, Dustin Tran, Thomas Kipf, Mario Lučić, Xiaohua Zhai, Daniel Keysers, Jeremiah Harmsen, Neil Houlsby

PaLI: A Jointly-Scaled Multilingual Language-Image Model

Sep 16, 2022
Xi Chen, Xiao Wang, Soravit Changpinyo, AJ Piergiovanni, Piotr Padlewski, Daniel Salz, Sebastian Goodman, Adam Grycner, Basil Mustafa, Lucas Beyer, Alexander Kolesnikov, Joan Puigcerver, Nan Ding, Keran Rong, Hassan Akbari, Gaurav Mishra, Linting Xue, Ashish Thapliyal, James Bradbury, Weicheng Kuo, Mojtaba Seyedhosseini, Chao Jia, Burcu Karagol Ayan, Carlos Riquelme, Andreas Steiner, Anelia Angelova, Xiaohua Zhai, Neil Houlsby, Radu Soricut

LiT: Zero-Shot Transfer with Locked-image Text Tuning

Nov 15, 2021
Xiaohua Zhai, Xiao Wang, Basil Mustafa, Andreas Steiner, Daniel Keysers, Alexander Kolesnikov, Lucas Beyer