Bridging Ears and Eyes: Analyzing Audio and Visual Large Language Models to Humans in Visible Sound Recognition and Reducing Their Sensory Gap via Cross-Modal Distillation

Add code
May 11, 2025

Share this with someone who'll enjoy it:

View paper onarxiv icon

Share this with someone who'll enjoy it: