Music Captioning


Rethinking Music Captioning with Music Metadata LLMs

Feb 03, 2026

Bagpiper: Solving Open-Ended Audio Tasks via Rich Captions

Feb 05, 2026

ConceptCaps -- a Distilled Concept Dataset for Interpretability in Music Models

Jan 20, 2026

Towards Effective Negation Modeling in Joint Audio-Text Models for Music

Jan 20, 2026

SLAM-LLM: A Modular, Open-Source Multimodal Large Language Model Framework and Best Practice for Speech, Language, Audio and Music Processing

Jan 14, 2026

Training-Efficient Text-to-Music Generation with State-Space Modeling

Jan 21, 2026

Pushing the Frontier of Audiovisual Perception with Large-Scale Multimodal Correspondence Learning

Dec 22, 2025

Music Flamingo: Scaling Music Understanding in Audio Language Models

Nov 13, 2025

FoleyBench: A Benchmark For Video-to-Audio Models

Nov 17, 2025

Do Joint Language-Audio Embeddings Encode Perceptual Timbre Semantics?

Oct 16, 2025