Chenxin Fang

Towards Effective and Efficient Long Video Understanding of Multimodal Large Language Models via One-shot Clip Retrieval

Dec 09, 2025
Via arXiv

Grounded Chain-of-Thought for Multimodal Large Language Models

Mar 17, 2025
Via arXiv