Fahad Shahbaz Khan

A Culturally-diverse Multilingual Multimodal Video Benchmark & Model

Jun 08, 2025

TerraFM: A Scalable Foundation Model for Unified Multisensor Earth Observation

Jun 06, 2025

VideoMolmo: Spatio-Temporal Grounding Meets Pointing

Jun 05, 2025

Agent-X: Evaluating Deep Multimodal Reasoning in Vision-Centric Agentic Tasks

May 30, 2025

ThinkGeo: Evaluating Tool-Augmented Agents for Remote Sensing Tasks

May 29, 2025

One-Way Ticket: Time-Independent Unified Encoder for Distilling Text-to-Image Diffusion Models

May 28, 2025

GenZSL: Generative Zero-Shot Learning Via Inductive Variational Autoencoder

May 17, 2025

MAVOS-DD: Multilingual Audio-Video Open-Set Deepfake Detection Benchmark

May 16, 2025

Deep Learning in Concealed Dense Prediction

Apr 15, 2025

Mobile-VideoGPT: Fast and Accurate Video Understanding Language Model

Mar 27, 2025