Xiaohua Zhai

A Study of Autoregressive Decoders for Multi-Tasking in Computer Vision

Mar 30, 2023
Lucas Beyer, Bo Wan, Gagan Madan, Filip Pavetic, Andreas Steiner, Alexander Kolesnikov, André Susano Pinto, Emanuele Bugliarello, Xiao Wang, Qihang Yu, Liang-Chieh Chen, Xiaohua Zhai

Sigmoid Loss for Language Image Pre-Training

Mar 27, 2023
Xiaohua Zhai, Basil Mustafa, Alexander Kolesnikov, Lucas Beyer

Tuning computer vision models with task rewards

Feb 16, 2023
André Susano Pinto, Alexander Kolesnikov, Yuge Shi, Lucas Beyer, Xiaohua Zhai

Scaling Vision Transformers to 22 Billion Parameters

Feb 10, 2023
Mostafa Dehghani, Josip Djolonga, Basil Mustafa, Piotr Padlewski, Jonathan Heek, Justin Gilmer, Andreas Steiner, Mathilde Caron, Robert Geirhos, Ibrahim Alabdulmohsin, Rodolphe Jenatton, Lucas Beyer, Michael Tschannen, Anurag Arnab, Xiao Wang, Carlos Riquelme, Matthias Minderer, Joan Puigcerver, Utku Evci, Manoj Kumar, Sjoerd van Steenkiste, Gamaleldin F. Elsayed, Aravindh Mahendran, Fisher Yu, Avital Oliver, Fantine Huot, Jasmijn Bastings, Mark Patrick Collier, Alexey Gritsenko, Vighnesh Birodkar, Cristina Vasconcelos, Yi Tay, Thomas Mensink, Alexander Kolesnikov, Filip Pavetić, Dustin Tran, Thomas Kipf, Mario Lučić, Xiaohua Zhai, Daniel Keysers, Jeremiah Harmsen, Neil Houlsby

FlexiViT: One Model for All Patch Sizes

Dec 15, 2022
Lucas Beyer, Pavel Izmailov, Alexander Kolesnikov, Mathilde Caron, Simon Kornblith, Xiaohua Zhai, Matthias Minderer, Michael Tschannen, Ibrahim Alabdulmohsin, Filip Pavetic

PaLI: A Jointly-Scaled Multilingual Language-Image Model

Sep 16, 2022
Xi Chen, Xiao Wang, Soravit Changpinyo, AJ Piergiovanni, Piotr Padlewski, Daniel Salz, Sebastian Goodman, Adam Grycner, Basil Mustafa, Lucas Beyer, Alexander Kolesnikov, Joan Puigcerver, Nan Ding, Keran Rong, Hassan Akbari, Gaurav Mishra, Linting Xue, Ashish Thapliyal, James Bradbury, Weicheng Kuo, Mojtaba Seyedhosseini, Chao Jia, Burcu Karagol Ayan, Carlos Riquelme, Andreas Steiner, Anelia Angelova, Xiaohua Zhai, Neil Houlsby, Radu Soricut

Revisiting Neural Scaling Laws in Language and Vision

Sep 13, 2022
Ibrahim Alabdulmohsin, Behnam Neyshabur, Xiaohua Zhai

UViM: A Unified Modeling Approach for Vision with Learned Guiding Codes

May 27, 2022
Alexander Kolesnikov, André Susano Pinto, Lucas Beyer, Xiaohua Zhai, Jeremiah Harmsen, Neil Houlsby

Simple Open-Vocabulary Object Detection with Vision Transformers

May 12, 2022
Matthias Minderer, Alexey Gritsenko, Austin Stone, Maxim Neumann, Dirk Weissenborn, Alexey Dosovitskiy, Aravindh Mahendran, Anurag Arnab, Mostafa Dehghani, Zhuoran Shen, Xiao Wang, Xiaohua Zhai, Thomas Kipf, Neil Houlsby