Picture for V. W.

V. W.

Evaluating Multimodal Large Language Models on Video Captioning via Monte Carlo Tree Search

Add code
Jun 11, 2025
Viaarxiv icon