Larissa Koch

Are Multimodal Large Language Models Pragmatically Competent Listeners in Simple Reference Resolution Tasks?

Jun 13, 2025
Via arXiv