Renying Wang

ViSRA: A Video-based Spatial Reasoning Agent for Multi-modal Large Language Models

May 11, 2026
Via arXiv