Afshin Dehghan

VideoFlexTok: Flexible-Length Coarse-to-Fine Video Tokenization

Apr 14, 2026

UniGen-1.5: Enhancing Image Generation and Editing through Reward Unification in Reinforcement Learning

Nov 18, 2025

AToken: A Unified Tokenizer for Vision

Sep 19, 2025

Language Models Improve When Pretraining Data Matches Target Tasks

Jul 16, 2025

Rooms from Motion: Un-posed Indoor 3D Object Detection as Localization and Mapping

May 29, 2025

UniGen: Enhanced Training & Test-Time Strategies for Unified Multimodal Understanding and Generation

May 20, 2025

StreamBridge: Turning Your Offline Video Large Language Model into a Proactive Streaming Assistant

May 08, 2025

SlowFast-LLaVA-1.5: A Family of Token-Efficient Video Large Language Models for Long-Form Video Understanding

Mar 27, 2025

MM-Spatial: Exploring 3D Spatial Understanding in Multimodal LLMs

Mar 17, 2025

FlexTok: Resampling Images into 1D Token Sequences of Flexible Length

Feb 19, 2025