Michael Tschannen

PaliGemma: A versatile 3B VLM for transfer
Jul 10, 2024

LocCa: Visual Pretraining with Location-aware Captioners
Mar 28, 2024

Towards Truly Zero-shot Compositional Visual Reasoning with LLMs as Programmers
Jan 03, 2024

GIVT: Generative Infinite-Vocabulary Transformers
Dec 04, 2023

Finite Scalar Quantization: VQ-VAE Made Simple
Oct 12, 2023

Image Captioners Are Scalable Vision Learners Too
Jun 13, 2023

PaLI-X: On Scaling up a Multilingual Vision and Language Model
May 29, 2023

M2T: Masking Transformers Twice for Faster Decoding
Apr 14, 2023

Scaling Vision Transformers to 22 Billion Parameters
Feb 10, 2023

FlexiViT: One Model for All Patch Sizes
Dec 15, 2022