Mustafa Shukor

VL-JEPA: Joint Embedding Predictive Architecture for Vision-language

Dec 11, 2025

Scaling Laws for Native Multimodal Models

Apr 10, 2025

Analyzing Fine-tuning Representation Shift for Multimodal LLMs Steering alignment

Jan 06, 2025

Multimodal Autoregressive Pre-training of Large Vision Encoders

Nov 21, 2024

Skipping Computations in Multimodal LLMs

Oct 12, 2024

A Concept-Based Explainability Framework for Large Multimodal Models

Jun 12, 2024

Zero-Shot Image Segmentation via Recursive Normalized Cut on Diffusion Features

Jun 05, 2024

Implicit Multimodal Alignment: On the Generalization of Frozen LLMs to Multimodal Inputs

May 26, 2024

What Makes Multimodal In-Context Learning Work?

Apr 25, 2024

FreeSeg-Diff: Training-Free Open-Vocabulary Segmentation with Diffusion Models

Mar 29, 2024