
Ibrahim Alabdulmohsin

No Filter: Cultural and Socioeconomic Diversity in Contrastive Vision-Language Models

May 22, 2024

LocCa: Visual Pretraining with Location-aware Captioners

Mar 28, 2024

CLIP the Bias: How Useful is Balancing Data in Multimodal Learning?

Mar 07, 2024

Fractal Patterns May Unravel the Intelligence in Next-Token Prediction

Feb 02, 2024

PaLI-3 Vision Language Models: Smaller, Faster, Stronger

Oct 17, 2023

Patch n' Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution

Jul 12, 2023

PaLI-X: On Scaling up a Multilingual Vision and Language Model

May 29, 2023

Getting ViT in Shape: Scaling Laws for Compute-Optimal Model Design

May 22, 2023

Scaling Vision Transformers to 22 Billion Parameters

Feb 10, 2023

Adapting to Latent Subgroup Shifts via Concepts and Proxies

Dec 21, 2022