Michael Tschannen

LocCa: Visual Pretraining with Location-aware Captioners

Mar 28, 2024
Bo Wan, Michael Tschannen, Yongqin Xian, Filip Pavetic, Ibrahim Alabdulmohsin, Xiao Wang, André Susano Pinto, Andreas Steiner, Lucas Beyer, Xiaohua Zhai

Towards Truly Zero-shot Compositional Visual Reasoning with LLMs as Programmers

Jan 03, 2024
Aleksandar Stanić, Sergi Caelles, Michael Tschannen

GIVT: Generative Infinite-Vocabulary Transformers

Dec 04, 2023
Michael Tschannen, Cian Eastwood, Fabian Mentzer

Finite Scalar Quantization: VQ-VAE Made Simple

Oct 12, 2023
Fabian Mentzer, David Minnen, Eirikur Agustsson, Michael Tschannen

Image Captioners Are Scalable Vision Learners Too

Jun 13, 2023
Michael Tschannen, Manoj Kumar, Andreas Steiner, Xiaohua Zhai, Neil Houlsby, Lucas Beyer

PaLI-X: On Scaling up a Multilingual Vision and Language Model

May 29, 2023
Xi Chen, Josip Djolonga, Piotr Padlewski, Basil Mustafa, Soravit Changpinyo, Jialin Wu, Carlos Riquelme Ruiz, Sebastian Goodman, Xiao Wang, Yi Tay, Siamak Shakeri, Mostafa Dehghani, Daniel Salz, Mario Lucic, Michael Tschannen, Arsha Nagrani, Hexiang Hu, Mandar Joshi, Bo Pang, Ceslee Montgomery, Paulina Pietrzyk, Marvin Ritter, AJ Piergiovanni, Matthias Minderer, Filip Pavetic, Austin Waters, Gang Li, Ibrahim Alabdulmohsin, Lucas Beyer, Julien Amelot, Kenton Lee, Andreas Peter Steiner, Yang Li, Daniel Keysers, Anurag Arnab, Yuanzhong Xu, Keran Rong, Alexander Kolesnikov, Mojtaba Seyedhosseini, Anelia Angelova, Xiaohua Zhai, Neil Houlsby, Radu Soricut

M2T: Masking Transformers Twice for Faster Decoding

Apr 14, 2023
Fabian Mentzer, Eirikur Agustsson, Michael Tschannen

Scaling Vision Transformers to 22 Billion Parameters

Feb 10, 2023
Mostafa Dehghani, Josip Djolonga, Basil Mustafa, Piotr Padlewski, Jonathan Heek, Justin Gilmer, Andreas Steiner, Mathilde Caron, Robert Geirhos, Ibrahim Alabdulmohsin, Rodolphe Jenatton, Lucas Beyer, Michael Tschannen, Anurag Arnab, Xiao Wang, Carlos Riquelme, Matthias Minderer, Joan Puigcerver, Utku Evci, Manoj Kumar, Sjoerd van Steenkiste, Gamaleldin F. Elsayed, Aravindh Mahendran, Fisher Yu, Avital Oliver, Fantine Huot, Jasmijn Bastings, Mark Patrick Collier, Alexey Gritsenko, Vighnesh Birodkar, Cristina Vasconcelos, Yi Tay, Thomas Mensink, Alexander Kolesnikov, Filip Pavetić, Dustin Tran, Thomas Kipf, Mario Lučić, Xiaohua Zhai, Daniel Keysers, Jeremiah Harmsen, Neil Houlsby

Image-and-Language Understanding from Pixels Only

Dec 15, 2022
Michael Tschannen, Basil Mustafa, Neil Houlsby
