Salman Khan

MAGNET: A Multi-agent Framework for Finding Audio-Visual Needles by Reasoning over Multi-Video Haystacks

Jun 08, 2025, via arXiv

A Culturally-diverse Multilingual Multimodal Video Benchmark & Model

Jun 08, 2025, via arXiv

TerraFM: A Scalable Foundation Model for Unified Multisensor Earth Observation

Jun 06, 2025, via arXiv

VideoMolmo: Spatio-Temporal Grounding Meets Pointing

Jun 05, 2025, via arXiv

VideoMathQA: Benchmarking Mathematical Reasoning via Multimodal Understanding in Videos

Jun 05, 2025, via arXiv

Agent-X: Evaluating Deep Multimodal Reasoning in Vision-Centric Agentic Tasks

May 30, 2025, via arXiv

ThinkGeo: Evaluating Tool-Augmented Agents for Remote Sensing Tasks

May 29, 2025, via arXiv

Fann or Flop: A Multigenre, Multiera Benchmark for Arabic Poetry Understanding in LLMs

May 26, 2025, via arXiv

CASS: Nvidia to AMD Transpilation with Data, Models, and Benchmark

May 22, 2025, via arXiv

ARB: A Comprehensive Arabic Multimodal Reasoning Benchmark

May 22, 2025, via arXiv