Get our free extension to see links to code for papers anywhere online!

Chrome logo  Add to Chrome

Firefox logo Add to Firefox

"music generation": models, code, and papers

A Human-Computer Duet System for Music Performance

Sep 16, 2020
Yuen-Jen Lin, Hsuan-Kai Kao, Yih-Chih Tseng, Ming Tsai, Li Su

Virtual musicians have become a remarkable phenomenon in the contemporary multimedia arts. However, most of the virtual musicians nowadays have not been endowed with abilities to create their own behaviors, or to perform music with human musicians. In this paper, we firstly create a virtual violinist, who can collaborate with a human pianist to perform chamber music automatically without any intervention. The system incorporates the techniques from various fields, including real-time music tracking, pose estimation, and body movement generation. In our system, the virtual musician's behavior is generated based on the given music audio alone, and such a system results in a low-cost, efficient and scalable way to produce human and virtual musicians' co-performance. The proposed system has been validated in public concerts. Objective quality assessment approaches and possible ways to systematically improve the system are also discussed.

  

Music Generation using Three-layered LSTM

Jun 09, 2021
Vaishali Ingale, Anush Mohan, Divit Adlakha, Krishan Kumar, Mohit Gupta

This paper explores the idea of utilising Long Short-Term Memory neural networks (LSTMNN) for the generation of musical sequences in ABC notation. The proposed approach takes ABC notations from the Nottingham dataset and encodes it to be fed as input for the neural networks. The primary objective is to input the neural networks with an arbitrary note, let the network process and augment a sequence based on the note until a good piece of music is produced. Multiple calibrations have been done to amend the parameters of the network for optimal generation. The output is assessed on the basis of rhythm, harmony, and grammar accuracy.

  

Music Generation using Three layered LSTM

May 27, 2021
Vaishali Ingale, Anush Mohan, Divit Adlakha, Krishna Kumar, Mohit Gupta

This paper explores the idea of utilising Long Short-Term Memory neural networks (LSTMNN) for the generation of musical sequences in ABC notation. The proposed approach takes ABC notations from the Nottingham dataset and encodes it to beefed as input for the neural networks. The primary objective is to input the neural networks with an arbitrary note, let the network process and augment a sequence based on the note until a good piece of music is produced. Multiple tunings have been done to amend the parameters of the network for optimal generation. The output is assessed on the basis of rhythm, harmony, and grammar accuracy.

  

Signal-domain representation of symbolic music for learning embedding spaces

Sep 08, 2021
Mathieu Prang, Philippe Esling

A key aspect of machine learning models lies in their ability to learn efficient intermediate features. However, the input representation plays a crucial role in this process, and polyphonic musical scores remain a particularly complex type of information. In this paper, we introduce a novel representation of symbolic music data, which transforms a polyphonic score into a continuous signal. We evaluate the ability to learn meaningful features from this representation from a musical point of view. Hence, we introduce an evaluation method relying on principled generation of synthetic data. Finally, to test our proposed representation we conduct an extensive benchmark against recent polyphonic symbolic representations. We show that our signal-like representation leads to better reconstruction and disentangled features. This improvement is reflected in the metric properties and in the generation ability of the space learned from our signal-like representation according to music theory properties.

* The 2020 Joint Conference on AI Music Creativity, Oct 2020, Stockholm, Sweden 
  

Symbolic Music Generation with Diffusion Models

Mar 30, 2021
Gautam Mittal, Jesse Engel, Curtis Hawthorne, Ian Simon

Score-based generative models and diffusion probabilistic models have been successful at generating high-quality samples in continuous domains such as images and audio. However, due to their Langevin-inspired sampling mechanisms, their application to discrete and sequential data has been limited. In this work, we present a technique for training diffusion models on sequential data by parameterizing the discrete domain in the continuous latent space of a pre-trained variational autoencoder. Our method is non-autoregressive and learns to generate sequences of latent embeddings through the reverse process and offers parallel generation with a constant number of iterative refinement steps. We apply this technique to modeling symbolic music and show strong unconditional generation and post-hoc conditional infilling results compared to autoregressive language models operating over the same continuous embeddings.

  

MusPy: A Toolkit for Symbolic Music Generation

Aug 05, 2020
Hao-Wen Dong, Ke Chen, Julian McAuley, Taylor Berg-Kirkpatrick

In this paper, we present MusPy, an open source Python library for symbolic music generation. MusPy provides easy-to-use tools for essential components in a music generation system, including dataset management, data I/O, data preprocessing and model evaluation. In order to showcase its potential, we present statistical analysis of the eleven datasets currently supported by MusPy. Moreover, we conduct a cross-dataset generalizability experiment by training an autoregressive model on each dataset and measuring held-out likelihood on the others---a process which is made easier by MusPy's dataset management system. The results provide a map of domain overlap between various commonly used datasets and show that some datasets contain more representative cross-genre samples than others. Along with the dataset analysis, these results might serve as a guide for choosing datasets in future research. Source code and documentation are available at https://github.com/salu133445/muspy .

* Accepted by International Society for Music Information Retrieval Conference (ISMIR), 2020 
  

VirtualConductor: Music-driven Conducting Video Generation System

Jul 28, 2021
Delong Chen, Fan Liu, Zewen Li, Feng Xu

In this demo, we present VirtualConductor, a system that can generate conducting video from any given music and a single user's image. First, a large-scale conductor motion dataset is collected and constructed. Then, we propose Audio Motion Correspondence Network (AMCNet) and adversarial-perceptual learning to learn the cross-modal relationship and generate diverse, plausible, music-synchronized motion. Finally, we combine 3D animation rendering and a pose transfer model to synthesize conducting video from a single given user's image. Therefore, any user can become a virtual conductor through the system.

* Accepted by IEEE International Conference on Multimedia and Expo (ICME) 2021, demo track. Best demo 
  

Music-driven Dance Regeneration with Controllable Key Pose Constraints

Jul 08, 2022
Junfu Pu, Ying Shan

In this paper, we propose a novel framework for music-driven dance motion synthesis with controllable key pose constraint. In contrast to methods that generate dance motion sequences only based on music without any other controllable conditions, this work targets on synthesizing high-quality dance motion driven by music as well as customized poses performed by users. Our model involves two single-modal transformer encoders for music and motion representations and a cross-modal transformer decoder for dance motions generation. The cross-modal transformer decoder achieves the capability of synthesizing smooth dance motion sequences, which keeps a consistency with key poses at corresponding positions, by introducing the local neighbor position embedding. Such mechanism makes the decoder more sensitive to key poses and the corresponding positions. Our dance synthesis model achieves satisfactory performance both on quantitative and qualitative evaluations with extensive experiments, which demonstrates the effectiveness of our proposed method.

  

Multi-instrument Music Synthesis with Spectrogram Diffusion

Jun 11, 2022
Curtis Hawthorne, Ian Simon, Adam Roberts, Neil Zeghidour, Josh Gardner, Ethan Manilow, Jesse Engel

An ideal music synthesizer should be both interactive and expressive, generating high-fidelity audio in realtime for arbitrary combinations of instruments and notes. Recent neural synthesizers have exhibited a tradeoff between domain-specific models that offer detailed control of only specific instruments, or raw waveform models that can train on all of music but with minimal control and slow generation. In this work, we focus on a middle ground of neural synthesizers that can generate audio from MIDI sequences with arbitrary combinations of instruments in realtime. This enables training on a wide range of transcription datasets with a single model, which in turn offers note-level control of composition and instrumentation across a wide range of instruments. We use a simple two-stage process: MIDI to spectrograms with an encoder-decoder Transformer, then spectrograms to audio with a generative adversarial network (GAN) spectrogram inverter. We compare training the decoder as an autoregressive model and as a Denoising Diffusion Probabilistic Model (DDPM) and find that the DDPM approach is superior both qualitatively and as measured by audio reconstruction and Fr\'echet distance metrics. Given the interactivity and generality of this approach, we find this to be a promising first step towards interactive and expressive neural synthesis for arbitrary combinations of instruments and notes.

  

Dance Generation with Style Embedding: Learning and Transferring Latent Representations of Dance Styles

Apr 30, 2021
Xinjian Zhang, Yi Xu, Su Yang, Longwen Gao, Huyang Sun

Choreography refers to creation of dance steps and motions for dances according to the latent knowledge in human mind, where the created dance motions are in general style-specific and consistent. So far, such latent style-specific knowledge about dance styles cannot be represented explicitly in human language and has not yet been learned in previous works on music-to-dance generation tasks. In this paper, we propose a novel music-to-dance synthesis framework with controllable style embeddings. These embeddings are learned representations of style-consistent kinematic abstraction of reference dance clips, which act as controllable factors to impose style constraints on dance generation in a latent manner. Thus, the dance styles can be transferred to dance motions by merely modifying the style embeddings. To support this study, we build a large music-to-dance dataset. The qualitative and quantitative evaluations demonstrate the advantage of our proposed framework, as well as the ability of synthesizing diverse styles of dances from identical music via style embeddings.

* Submit to IJCAI-21 
  
<<
12
13
14
15
16
17
18
19
20
21
22
23
24
>>