
Xiaohua Zhai

FlexiViT: One Model for All Patch Sizes
Dec 15, 2022

PaLI: A Jointly-Scaled Multilingual Language-Image Model
Sep 16, 2022

Revisiting Neural Scaling Laws in Language and Vision
Sep 13, 2022

UViM: A Unified Modeling Approach for Vision with Learned Guiding Codes
May 27, 2022

Simple Open-Vocabulary Object Detection with Vision Transformers
May 12, 2022

Better plain ViT baselines for ImageNet-1k
May 03, 2022

A Simple Single-Scale Vision Transformer for Object Localization and Instance Segmentation
Dec 17, 2021

LiT: Zero-Shot Transfer with Locked-image Text Tuning
Nov 15, 2021

How to train your ViT? Data, Augmentation, and Regularization in Vision Transformers
Jun 18, 2021

Revisiting the Calibration of Modern Neural Networks
Jun 15, 2021