Xiangyi Wei

Audio-VLA: Adding Contact Audio Perception to Vision-Language-Action Model for Robotic Manipulation

Nov 13, 2025
Via arXiv

TimeSoccer: An End-to-End Multimodal Large Language Model for Soccer Commentary Generation

Apr 24, 2025
Via arXiv