"music": models, code, and papers

Bach2Bach: Generating Music Using A Deep Reinforcement Learning Approach

Dec 03, 2018
Nikhil Kotecha

A model of music needs to recall past details and have a clear, coherent understanding of musical structure. The paper details a deep reinforcement learning architecture that predicts and generates polyphonic music aligned with musical rules. The probabilistic model presented is a Bi-axial LSTM trained with a pseudo-kernel reminiscent of a convolutional kernel. To encourage exploration and impose greater global coherence on the generated music, a deep reinforcement learning approach, DQN, is adopted. When analyzed quantitatively and qualitatively, this approach performs well in composing polyphonic music.

* 42 pages

A holistic approach to polyphonic music transcription with neural networks

Oct 26, 2019
Miguel A. Román, Antonio Pertusa, Jorge Calvo-Zaragoza

We present a framework based on neural networks to extract music scores directly from polyphonic audio in an end-to-end fashion. Most previous Automatic Music Transcription (AMT) methods seek a piano-roll representation of the pitches, which can be further transformed into a score by incorporating tempo estimation, beat tracking, key estimation or rhythm quantization. Unlike these methods, our approach generates music notation directly from the input audio in a single stage. For this, we use a Convolutional Recurrent Neural Network (CRNN) with a Connectionist Temporal Classification (CTC) loss function, which does not require annotated alignments of audio frames with the rhythmic information of the score. We trained our model using as input Haydn, Mozart, and Beethoven string quartets and Bach chorales synthesized with different tempos and expressive performances. The output is a textual representation of four-voice music scores based on the **kern format. Although the proposed approach is evaluated in a simplified scenario, results show that this model can learn to transcribe scores directly from audio signals, opening a promising avenue towards complete AMT.

* Source code available at https://github.com/mangelroman/audio2score
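
The abstract above relies on CTC so that no frame-level alignment is needed. As a rough illustration of what that means at decoding time, here is a minimal sketch of the standard CTC collapse rule (merge consecutive repeats, then drop blanks). The blank symbol `-`, the function name `ctc_collapse`, and the note tokens are assumptions made for this example; they are not the paper's **kern vocabulary or its code.

```python
from itertools import groupby

BLANK = "-"  # hypothetical blank symbol, chosen for this sketch only


def ctc_collapse(frame_labels, blank=BLANK):
    """Collapse a per-frame label path into an output sequence."""
    # Step 1: merge runs of identical consecutive labels into one label.
    merged = [label for label, _ in groupby(frame_labels)]
    # Step 2: drop blanks; a blank between two equal labels keeps both.
    return [label for label in merged if label != blank]


# Ten audio frames, each assigned its most likely label (made-up values).
path = ["-", "C4", "C4", "-", "C4", "E4", "E4", "-", "-", "G4"]
decoded = ctc_collapse(path)
print(decoded)
```

Because many different frame-level paths collapse to the same output sequence, training can sum over all of them instead of requiring one annotated alignment.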

Toward Interpretable Music Tagging with Self-Attention

Jun 12, 2019
Minz Won, Sanghyuk Chun, Xavier Serra

Self-attention is an attention mechanism that learns a representation by relating different positions in the sequence. The transformer, which is a sequence model based solely on self-attention, and its variants have achieved state-of-the-art results in many natural language processing tasks. Since music composes its semantics based on the relations between components in sparse positions, adopting the self-attention mechanism to solve music information retrieval (MIR) problems can be beneficial. Hence, we propose a self-attention-based deep sequence model for music tagging. The proposed architecture consists of shallow convolutional layers followed by stacked Transformer encoders. Compared to conventional approaches using fully convolutional or recurrent neural networks, our model is more interpretable while reporting competitive results. We validate the performance of our model with the MagnaTagATune and the Million Song Dataset. In addition, we demonstrate the interpretability of the proposed architecture with a heat map visualization.

* 13 pages, 12 figures; code: https://github.com/minzwon/self-attention-music-tagging
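
To make "relating different positions in the sequence" concrete, the following is a minimal sketch of scaled dot-product self-attention in plain Python. It assumes identity projections (queries, keys, and values are all the input itself) and a toy three-step input; the paper's model uses learned projections inside stacked Transformer encoders, which this sketch does not reproduce. The names `softmax` and `self_attention` are chosen for the example.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x):
    """Scaled dot-product self-attention with Q = K = V = x (an assumption)."""
    d = len(x[0])
    outputs, weights = [], []
    for q in x:
        # Similarity of this position to every position, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in x]
        w = softmax(scores)
        weights.append(w)
        # Output is the attention-weighted average of all positions.
        outputs.append([sum(wj * v[i] for wj, v in zip(w, x)) for i in range(d)])
    return outputs, weights


# Toy sequence of three time steps with two features each (made-up values).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(x)
for row in weights:
    print([round(w, 3) for w in row])
```

Each row of `weights` is a distribution over positions, which is the kind of quantity one could plot as a heat map to inspect where a model attends.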

Multi-modal Conditional Bounding Box Regression for Music Score Following

May 10, 2021
Florian Henkel, Gerhard Widmer

This paper addresses the problem of sheet-image-based on-line audio-to-score alignment, also known as score following. Drawing inspiration from object detection, a conditional neural network architecture is proposed that directly predicts the (x, y) coordinates of the matching positions in a complete score sheet image at each point in time for a given musical performance. Experiments are conducted on a synthetic polyphonic piano benchmark dataset, and the new method is compared to several existing approaches from the literature for sheet-image-based score following as well as to an Optical Music Recognition baseline. The proposed approach achieves new state-of-the-art results and, furthermore, significantly improves the alignment performance on a set of real-world piano recordings by applying Impulse Responses as a data augmentation technique.

* Accepted for publication in the Proceedings of the 29th European Signal Processing Conference (EUSIPCO), Dublin, Ireland, 2021
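
Impulse-response augmentation, as mentioned in the abstract above, generally means convolving a clean signal with a recorded or simulated room response so that it sounds as if it were captured in that room. The sketch below shows only that core operation in plain Python; the sample values, the impulse response, and the function name `convolve` are invented for illustration and say nothing about the paper's actual data or pipeline.

```python
def convolve(signal, impulse_response):
    """Full discrete convolution of two sequences of floats."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            # Each input sample launches a scaled copy of the response.
            out[i + j] += s * h
    return out


dry = [1.0, 0.0, 0.0, 0.5]   # made-up "clean" audio samples
room_ir = [1.0, 0.6, 0.3]    # made-up decaying impulse response
wet = convolve(dry, room_ir)
print(wet)
```

In practice one would use an FFT-based convolution from a numerical library for real audio lengths; the nested loop is kept here only because it makes the definition visible.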

Polyphonic Music Composition with LSTM Neural Networks and Reinforcement Learning

Mar 03, 2019
Harish Kumar, Balaraman Ravindran

In the domain of algorithmic music composition, machine learning-driven systems eliminate the need for carefully hand-crafting rules for composition. In particular, the capability of recurrent neural networks to learn complex temporal patterns lends itself well to the musical domain. Promising results have been observed across a number of recent attempts at music composition using deep RNNs. These approaches generally aim at first training neural networks to reproduce subsequences drawn from existing songs. Subsequently, they are used to compose music either at the audio sample-level or at the note-level. We designed a representation that divides polyphonic music into a small number of monophonic streams. This representation greatly reduces the complexity of the problem and eliminates an exponential number of probably poor compositions. On top of our LSTM neural network that learnt musical sequences in this representation, we built an RL agent that learnt to find combinations of songs whose joint dominance produced pleasant compositions. We present Amadeus, an algorithmic music composition system that composes music that consists of intricate melodies, basic chords, and even occasional contrapuntal sequences.

Jointist: Joint Learning for Multi-instrument Transcription and Its Applications

Jun 28, 2022
Kin Wai Cheuk, Keunwoo Choi, Qiuqiang Kong, Bochen Li, Minz Won, Amy Hung, Ju-Chiang Wang, Dorien Herremans

In this paper, we introduce Jointist, an instrument-aware multi-instrument framework that is capable of transcribing, recognizing, and separating multiple musical instruments from an audio clip. Jointist consists of an instrument recognition module that conditions the other modules: a transcription module that outputs instrument-specific piano rolls, and a source separation module that utilizes instrument information and transcription results. The instrument conditioning is designed for explicit multi-instrument functionality, while the connection between the transcription and source separation modules is for better transcription performance. Our challenging problem formulation makes the model highly useful in the real world, given that modern popular music typically consists of multiple instruments. However, its novelty necessitates a new perspective on how to evaluate such a model. In our experiments, we assess the model from various aspects, providing a new evaluation perspective for multi-instrument transcription. We also argue that transcription models can be utilized as a preprocessing module for other music analysis tasks. In experiments on several downstream tasks, the symbolic representation provided by our transcription model turned out to be a helpful complement to spectrograms in solving downbeat detection, chord recognition, and key estimation.

* Submitted to ISMIR

Classical Music Prediction and Composition by means of Variational Autoencoders

Jun 21, 2019
Daniel Rivero, Enrique Fernandez-Blanco, Alejandro Pazos

This paper proposes a new model for music prediction based on Variational Autoencoders (VAEs). In this work, VAEs are used in a novel way to address two different problems: representing music in the latent space, and using this representation to predict the future values of the musical piece. The approach was trained on different songs by a classical composer. As a result, the system can represent the music in the latent space and make accurate predictions. Therefore, the system can be used to compose new music either from an existing piece or from a random starting point. An additional feature of this system is that only a small dataset was used for training. Nevertheless, results show that the system is able to return accurate representations and predictions on unseen data.

* 9 pages, 4 figures

Now Playing: Continuous low-power music recognition

Nov 29, 2017
Blaise Agüera y Arcas, Beat Gfeller, Ruiqi Guo, Kevin Kilgour, Sanjiv Kumar, James Lyon, Julian Odell, Marvin Ritter, Dominik Roblek, Matthew Sharifi, Mihajlo Velimirović

Existing music recognition applications require a connection to a server that performs the actual recognition. In this paper we present a low-power music recognizer that runs entirely on a mobile device and automatically recognizes music without user interaction. To reduce battery consumption, a small music detector runs continuously on the mobile device's DSP chip and wakes up the main application processor only when it is confident that music is present. Once woken, the recognizer on the application processor is provided with a few seconds of audio which is fingerprinted and compared to the stored fingerprints in the on-device fingerprint database of tens of thousands of songs. Our presented system, Now Playing, has a daily battery usage of less than 1% on average, respects user privacy by running entirely on-device and can passively recognize a wide range of music.

* Authors are listed in alphabetical order by last name

The Impact of Label Noise on a Music Tagger

Aug 14, 2020
Katharina Prinz, Arthur Flexer, Gerhard Widmer

We explore how much can be learned from noisy labels in audio music tagging. Our experiments show that carefully annotated labels result in the highest figures of merit, but even large amounts of noisy labels contain enough information for successful learning. Artificial corruption of curated data allows us to quantify this contribution of noisy labels.

* In Proceedings of the 13th International Workshop on Machine Learning and Music, European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases
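
One simple way to corrupt curated multi-label tags artificially is to flip each binary tag independently with a fixed probability. The sketch below assumes that scheme purely for illustration; the abstract does not say which corruption procedure the authors used, and the function name `corrupt_labels`, the `noise_rate` parameter, and the tag matrix are hypothetical.

```python
import random


def corrupt_labels(labels, noise_rate, seed=0):
    """Flip each binary tag independently with probability noise_rate."""
    rng = random.Random(seed)  # seeded so the corruption is reproducible
    return [
        [1 - tag if rng.random() < noise_rate else tag for tag in row]
        for row in labels
    ]


# Three clips, four binary tags each (made-up curated annotations).
clean = [
    [1, 0, 0, 1],
    [0, 1, 0, 0],
    [1, 1, 0, 0],
]
noisy = corrupt_labels(clean, noise_rate=0.25)
flipped = sum(c != n for cr, nr in zip(clean, noisy) for c, n in zip(cr, nr))
print(f"{flipped} of 12 tags flipped")
```

Training the same tagger on `clean` and on copies corrupted at increasing `noise_rate` values would then let one measure how performance degrades as label noise grows.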

CNN based music emotion classification

Apr 19, 2017
Xin Liu, Qingcai Chen, Xiangping Wu, Yan Liu, Yang Liu

Music emotion recognition (MER) is usually regarded as a multi-label tagging task, in which each segment of music can inspire specific emotion tags. Most researchers extract acoustic features from music and explore the relations between these features and their corresponding emotion tags. Given that the same music segment can inspire inconsistent emotions in different listeners, finding the key acoustic features that actually affect emotions is a challenging task. In this paper, we propose a novel MER method that applies a deep convolutional neural network (CNN) to music spectrograms, which contain both the original time and frequency domain information. With the proposed method, no additional effort on extracting specific features is required; this is left to the training procedure of the CNN model. Experiments are conducted on the standard CAL500 and CAL500exp datasets. Results show that, for both datasets, the proposed method outperforms state-of-the-art methods.

* 7 pages, 4 figures
