Music Generation

Music generation is the task of producing music or music-like audio with a model or algorithm.
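As a minimal illustration of the "algorithm" end of this definition (not drawn from any paper below), the following sketch generates a short melody as a random walk over a pentatonic scale and renders it to a WAV file using only the Python standard library. All names here (`generate_melody`, `render`, the scale and tempo choices) are illustrative assumptions.

```python
# Minimal sketch of algorithmic music generation: a random-walk melody
# over a C-major pentatonic scale, rendered as decaying sine tones.
import math
import random
import struct
import wave

SAMPLE_RATE = 44100
PENTATONIC = [0, 2, 4, 7, 9]  # scale degrees in semitones


def note_freq(midi_note):
    """Convert a MIDI note number to frequency in Hz (A4 = note 69 = 440 Hz)."""
    return 440.0 * 2 ** ((midi_note - 69) / 12)


def generate_melody(n_notes=16, seed=0):
    """Random walk over two octaves of the pentatonic scale above middle C."""
    rng = random.Random(seed)
    degree = 0
    notes = []
    for _ in range(n_notes):
        # Step up, down, or stay, clamped to the 10 available scale degrees.
        degree = max(0, min(9, degree + rng.choice([-1, 0, 1])))
        octave, step = divmod(degree, 5)
        notes.append(60 + 12 * octave + PENTATONIC[step])  # 60 = middle C
    return notes


def render(notes, note_dur=0.25):
    """Render notes as 16-bit mono samples with a simple linear decay envelope."""
    samples = []
    n = int(SAMPLE_RATE * note_dur)
    for note in notes:
        f = note_freq(note)
        for i in range(n):
            env = 1.0 - i / n  # linear decay so notes don't click together
            samples.append(int(32767 * 0.5 * env *
                               math.sin(2 * math.pi * f * i / SAMPLE_RATE)))
    return samples


def write_wav(path, samples):
    """Write raw 16-bit mono samples to a WAV file."""
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(SAMPLE_RATE)
        w.writeframes(struct.pack("<%dh" % len(samples), *samples))


melody = generate_melody()
write_wav("melody.wav", render(melody))
```

Modern systems in the papers below replace the hand-written random walk with learned models (diffusion, autoregressive transformers), but the pipeline shape — sample a symbolic or latent sequence, then render it to audio — is the same.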

MAViS: A Multi-Agent Framework for Long-Sequence Video Storytelling

Aug 11, 2025

MixAssist: An Audio-Language Dataset for Co-Creative AI Assistance in Music Mixing

Jul 08, 2025

Direction of Arrival Estimation with Virtual Antenna Array Using FMCW Radar Simulated Data

Aug 11, 2025

SmoothSinger: A Conditional Diffusion Model for Singing Voice Synthesis with Multi-Resolution Architecture

Jun 26, 2025

A Reproducible, Scalable Pipeline for Synthesizing Autoregressive Model Literature

Aug 06, 2025

SonicMotion: Dynamic Spatial Audio Soundscapes with Latent Diffusion Models

Jul 09, 2025

Spatial-Temporal Graph Mamba for Music-Guided Dance Video Synthesis

Jul 09, 2025

Quantize More, Lose Less: Autoregressive Generation from Residually Quantized Speech Representations

Jul 16, 2025

DeSTA2.5-Audio: Toward General-Purpose Large Audio Language Model with Self-Generated Cross-Modal Alignment

Jul 03, 2025

Kling-Foley: Multimodal Diffusion Transformer for High-Quality Video-to-Audio Generation

Jun 24, 2025