Xiufeng Song

Ego to World: Collaborative Spatial Reasoning in Embodied Systems via Reinforcement Learning

Mar 16, 2026

Reading $\neq$ Seeing: Diagnosing and Closing the Typography Gap in Vision-Language Models

Mar 09, 2026

Advances and Innovations in the Multi-Agent Robotic System (MARS) Challenge

Jan 26, 2026

VIKI-R: Coordinating Embodied Multi-Agent Cooperation via Reinforcement Learning

Jun 10, 2025

Rethinking Vision-Language Model in Face Forensics: Multi-Modal Interpretable Forged Face Detector

Mar 26, 2025

UniSTD: Towards Unified Spatio-Temporal Learning across Diverse Disciplines

Mar 26, 2025

RoboFactory: Exploring Embodied Agent Collaboration with Compositional Constraints

Mar 20, 2025

On Learning Multi-Modal Forgery Representation for Diffusion Generated Video Detection

Oct 31, 2024