Music Captioning


Text2midi-InferAlign: Improving Symbolic Music Generation with Inference-Time Alignment

May 19, 2025

MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix

May 19, 2025

MusFlow: Multimodal Music Generation via Conditional Flow Matching

Apr 18, 2025

AudioX: Diffusion Transformer for Anything-to-Audio Generation

Mar 13, 2025

JamendoMaxCaps: A Large Scale Music-caption Dataset with Imputed Metadata

Feb 11, 2025

Audio Flamingo 2: An Audio-Language Model with Long-Audio Understanding and Expert Reasoning Abilities

Mar 06, 2025

Can Impressions of Music be Extracted from Thumbnail Images?

Jan 05, 2025

MMVA: Multimodal Matching Based on Valence and Arousal across Images, Music, and Musical Captions

Jan 02, 2025

Text2midi: Generating Symbolic Music from Captions

Dec 21, 2024

Do Captioning Metrics Reflect Music Semantic Alignment?

Nov 18, 2024