Alert button
Picture for Xiaohua Zhai

Xiaohua Zhai

Alert button

LocCa: Visual Pretraining with Location-aware Captioners

Add code
Bookmark button
Alert button
Mar 28, 2024
Bo Wan, Michael Tschannen, Yongqin Xian, Filip Pavetic, Ibrahim Alabdulmohsin, Xiao Wang, André Susano Pinto, Andreas Steiner, Lucas Beyer, Xiaohua Zhai

Figure 1 for LocCa: Visual Pretraining with Location-aware Captioners
Figure 2 for LocCa: Visual Pretraining with Location-aware Captioners
Figure 3 for LocCa: Visual Pretraining with Location-aware Captioners
Figure 4 for LocCa: Visual Pretraining with Location-aware Captioners
Viaarxiv icon

CLIP the Bias: How Useful is Balancing Data in Multimodal Learning?

Add code
Bookmark button
Alert button
Mar 07, 2024
Ibrahim Alabdulmohsin, Xiao Wang, Andreas Steiner, Priya Goyal, Alexander D'Amour, Xiaohua Zhai

Figure 1 for CLIP the Bias: How Useful is Balancing Data in Multimodal Learning?
Figure 2 for CLIP the Bias: How Useful is Balancing Data in Multimodal Learning?
Figure 3 for CLIP the Bias: How Useful is Balancing Data in Multimodal Learning?
Figure 4 for CLIP the Bias: How Useful is Balancing Data in Multimodal Learning?
Viaarxiv icon

SILC: Improving Vision Language Pretraining with Self-Distillation

Add code
Bookmark button
Alert button
Oct 20, 2023
Muhammad Ferjad Naeem, Yongqin Xian, Xiaohua Zhai, Lukas Hoyer, Luc Van Gool, Federico Tombari

Viaarxiv icon

PaLI-3 Vision Language Models: Smaller, Faster, Stronger

Add code
Bookmark button
Alert button
Oct 17, 2023
Xi Chen, Xiao Wang, Lucas Beyer, Alexander Kolesnikov, Jialin Wu, Paul Voigtlaender, Basil Mustafa, Sebastian Goodman, Ibrahim Alabdulmohsin, Piotr Padlewski, Daniel Salz, Xi Xiong, Daniel Vlasic, Filip Pavetic, Keran Rong, Tianli Yu, Daniel Keysers, Xiaohua Zhai, Radu Soricut

Figure 1 for PaLI-3 Vision Language Models: Smaller, Faster, Stronger
Figure 2 for PaLI-3 Vision Language Models: Smaller, Faster, Stronger
Figure 3 for PaLI-3 Vision Language Models: Smaller, Faster, Stronger
Figure 4 for PaLI-3 Vision Language Models: Smaller, Faster, Stronger
Viaarxiv icon

Image Captioners Are Scalable Vision Learners Too

Add code
Bookmark button
Alert button
Jun 13, 2023
Michael Tschannen, Manoj Kumar, Andreas Steiner, Xiaohua Zhai, Neil Houlsby, Lucas Beyer

Figure 1 for Image Captioners Are Scalable Vision Learners Too
Figure 2 for Image Captioners Are Scalable Vision Learners Too
Figure 3 for Image Captioners Are Scalable Vision Learners Too
Figure 4 for Image Captioners Are Scalable Vision Learners Too
Viaarxiv icon

PaLI-X: On Scaling up a Multilingual Vision and Language Model

Add code
Bookmark button
Alert button
May 29, 2023
Xi Chen, Josip Djolonga, Piotr Padlewski, Basil Mustafa, Soravit Changpinyo, Jialin Wu, Carlos Riquelme Ruiz, Sebastian Goodman, Xiao Wang, Yi Tay, Siamak Shakeri, Mostafa Dehghani, Daniel Salz, Mario Lucic, Michael Tschannen, Arsha Nagrani, Hexiang Hu, Mandar Joshi, Bo Pang, Ceslee Montgomery, Paulina Pietrzyk, Marvin Ritter, AJ Piergiovanni, Matthias Minderer, Filip Pavetic, Austin Waters, Gang Li, Ibrahim Alabdulmohsin, Lucas Beyer, Julien Amelot, Kenton Lee, Andreas Peter Steiner, Yang Li, Daniel Keysers, Anurag Arnab, Yuanzhong Xu, Keran Rong, Alexander Kolesnikov, Mojtaba Seyedhosseini, Anelia Angelova, Xiaohua Zhai, Neil Houlsby, Radu Soricut

Figure 1 for PaLI-X: On Scaling up a Multilingual Vision and Language Model
Figure 2 for PaLI-X: On Scaling up a Multilingual Vision and Language Model
Figure 3 for PaLI-X: On Scaling up a Multilingual Vision and Language Model
Figure 4 for PaLI-X: On Scaling up a Multilingual Vision and Language Model
Viaarxiv icon

Three Towers: Flexible Contrastive Learning with Pretrained Image Models

Add code
Bookmark button
Alert button
May 29, 2023
Jannik Kossen, Mark Collier, Basil Mustafa, Xiao Wang, Xiaohua Zhai, Lucas Beyer, Andreas Steiner, Jesse Berent, Rodolphe Jenatton, Efi Kokiopoulou

Figure 1 for Three Towers: Flexible Contrastive Learning with Pretrained Image Models
Figure 2 for Three Towers: Flexible Contrastive Learning with Pretrained Image Models
Figure 3 for Three Towers: Flexible Contrastive Learning with Pretrained Image Models
Figure 4 for Three Towers: Flexible Contrastive Learning with Pretrained Image Models
Viaarxiv icon

Getting ViT in Shape: Scaling Laws for Compute-Optimal Model Design

Add code
Bookmark button
Alert button
May 22, 2023
Ibrahim Alabdulmohsin, Xiaohua Zhai, Alexander Kolesnikov, Lucas Beyer

Figure 1 for Getting ViT in Shape: Scaling Laws for Compute-Optimal Model Design
Figure 2 for Getting ViT in Shape: Scaling Laws for Compute-Optimal Model Design
Figure 3 for Getting ViT in Shape: Scaling Laws for Compute-Optimal Model Design
Figure 4 for Getting ViT in Shape: Scaling Laws for Compute-Optimal Model Design
Viaarxiv icon