Song-Chun Zhu

University of California, Los Angeles

SIV-Bench: A Video Benchmark for Social Interaction Understanding and Reasoning

Jun 05, 2025 (via arXiv)

Discrete Markov Bridge

May 26, 2025 (via arXiv)

Chain-of-Focus: Adaptive Visual Search and Zooming for Multimodal Reasoning via RL

May 21, 2025 (via arXiv)

Seek in the Dark: Reasoning via Test-Time Instance-Level Policy Gradient in Latent Space

May 19, 2025 (via arXiv)

Iterative Tool Usage Exploration for Multimodal Agents via Step-wise Preference Tuning

May 06, 2025 (via arXiv)

MetaScenes: Towards Automated Replica Creation for Real-world 3D Scans

May 05, 2025 (via arXiv)

Iterative Trajectory Exploration for Multimodal Agents

Apr 30, 2025 (via arXiv)

TongUI: Building Generalized GUI Agents by Learning from Multimodal Web Tutorials

Apr 17, 2025 (via arXiv)

Unveiling the Mist over 3D Vision-Language Understanding: Object-centric Evaluation with Chain-of-Analysis

Apr 01, 2025 (via arXiv)

Decompositional Neural Scene Reconstruction with Generative Diffusion Prior

Mar 19, 2025 (via arXiv)