Music generation


Music generation is the task of producing music or music-like sounds with a model or algorithm.

Audio-Sync Video Generation with Multi-Stream Temporal Control

Jun 09, 2025

LoopGen: Training-Free Loopable Music Generation

Apr 08, 2025

Of All StrIPEs: Investigating Structure-informed Positional Encoding for Efficient Music Generation

Apr 07, 2025

Do Music Preferences Reflect Cultural Values? A Cross-National Analysis Using Music Embedding and World Values Survey

Jun 16, 2025

Generation of Musical Timbres using a Text-Guided Diffusion Model

Apr 12, 2025

Detecting Musical Deepfakes

May 03, 2025

U-SAM: An audio language Model for Unified Speech, Audio, and Music Understanding

May 20, 2025

JaccDiv: A Metric and Benchmark for Quantifying Diversity of Generated Marketing Text in the Music Industry

Apr 29, 2025

Kling-Foley: Multimodal Diffusion Transformer for High-Quality Video-to-Audio Generation

Jun 24, 2025

Semantics-Aware Human Motion Generation from Audio Instructions

May 29, 2025