Weidi Xie

SoccerMaster: A Vision Foundation Model for Soccer Understanding

Dec 11, 2025
Via arXiv

Inferring Dynamic Physical Properties from Video Foundation Models

Oct 02, 2025
Via arXiv

Universal Video Temporal Grounding with Generative Multi-modal Large Language Models

Jun 23, 2025
Via arXiv

SpatialScore: Towards Unified Evaluation for Multimodal Spatial Understanding

May 22, 2025
Via arXiv

Multi-Agent System for Comprehensive Soccer Understanding

May 06, 2025
Via arXiv

ChestX-Reasoner: Advancing Radiology Foundation Models with Reasoning through Step-by-Step Verification

Apr 29, 2025
Via arXiv

Learning Streaming Video Representation via Multitask Training

Apr 28, 2025
Via arXiv

EgoExo-Gen: Ego-centric Video Prediction by Watching Exo-centric Videos

Apr 16, 2025
Via arXiv

Shot-by-Shot: Film-Grammar-Aware Training-Free Audio Description Generation

Apr 01, 2025
Via arXiv

Quantifying the Reasoning Abilities of LLMs on Real-world Clinical Cases

Mar 06, 2025
Via arXiv