Matthias Minderer

PaliGemma: A versatile 3B VLM for transfer
Jul 10, 2024

Scene-Graph ViT: End-to-End Open-Vocabulary Visual Relationship Detection
Mar 21, 2024

Improving fine-grained understanding in image-text pre-training
Jan 18, 2024

Video OWL-ViT: Temporally-consistent open-world localization in video
Aug 22, 2023

Patch n' Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution
Jul 12, 2023

Scaling Open-Vocabulary Object Detection
Jun 16, 2023

PaLI-X: On Scaling up a Multilingual Vision and Language Model
May 29, 2023

Scaling Vision Transformers to 22 Billion Parameters
Feb 10, 2023

FlexiViT: One Model for All Patch Sizes
Dec 15, 2022

Decoder Denoising Pretraining for Semantic Segmentation
May 23, 2022