"music": models, code, and papers

Music Embedding: A Tool for Incorporating Music Theory into Computational Music Applications

Apr 24, 2021
SeyyedPooya HekmatiAthar, Mohd Anwar

Advances in digital technologies have enabled researchers to develop a variety of computational music applications. Such applications are required to capture, process, and generate data related to music. Therefore, it is important to represent music digitally in a concise, music-theoretic manner. Existing approaches to representing music make ineffective use of music theory. In this paper, we address the disconnect between music theory and computational music by developing an open-source representation tool based on music theory. Through a wide range of use cases, we analyze classical music pieces to show the usefulness of the developed music embedding.

General Theory of Music by Icosahedron 3: Musical invariant and Melakarta raga

Sep 26, 2021
Yusuke Imai

Raga is a central musical concept in South Asia, especially India. We investigate connections between Western classical music and Melakarta raga, which is a raga in Karnatak (South Indian) classical music, through musical icosahedra. In our previous study, we introduced several kinds of musical icosahedra connecting various musical concepts in Western music: chromatic/whole tone musical icosahedra, Pythagorean/whole tone musical icosahedra, and exceptional musical icosahedra. In this paper, we first introduce musical icosahedra that connect the above musical icosahedra through two kinds of permutations of the 12 tones, inter-permutations and intra-permutations; we call them intermediate musical icosahedra. Next, we define a neighboring number as the number of pairs of neighboring tones in a given scale that also neighbor each other on a given musical icosahedron, and we define a musical invariant as a linear combination of the neighboring numbers. We find that there exists a pair of a musical invariant and scales that is constant for some musical icosahedra, and we analyze their mathematical structure. Last, we define an extension of a given scale by the inter-permutations of a given musical icosahedron: the permutation-extension. We then show that the permutation-extension of the C major scale by the Melakarta raga musical icosahedra, which are four of the intermediate musical icosahedra from the type 1 chromatic/whole tone musical icosahedron to the type 1' Pythagorean/whole tone musical icosahedron, is the set of all the scales included in Melakarta raga. There exists a musical invariant that is constant for all the musical icosahedra corresponding to the scales of Melakarta raga, and we obtain a diagram representation of those scales characterizing the musical invariant.

* 31 pages, 34 figures 
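
The neighboring number defined in the abstract above can be illustrated with a small sketch. Everything specific here is an assumption for illustration: the vertex numbering, the placement of tones on vertices (the paper's actual musical icosahedra use particular placements that are not reproduced here), and the choice to treat the last and first scale tones as neighbors.

```python
def icosahedron_edges():
    """Return the 30 edges of an icosahedron graph as frozensets of vertex ids.

    Vertex layout (an arbitrary labeling): 0 is the top vertex, 1-5 form the
    upper ring, 6-10 form the lower ring, and 11 is the bottom vertex.
    """
    top, bottom = 0, 11
    upper = [1, 2, 3, 4, 5]
    lower = [6, 7, 8, 9, 10]
    edges = set()
    for k in range(5):
        nxt = (k + 1) % 5
        edges.add(frozenset((top, upper[k])))         # top to upper ring
        edges.add(frozenset((bottom, lower[k])))      # bottom to lower ring
        edges.add(frozenset((upper[k], upper[nxt])))  # around the upper ring
        edges.add(frozenset((lower[k], lower[nxt])))  # around the lower ring
        edges.add(frozenset((upper[k], lower[k])))    # zigzag between rings
        edges.add(frozenset((upper[k], lower[nxt])))
    return edges


def neighboring_number(scale, tone_to_vertex, edges):
    """Count consecutive scale tones that sit on adjacent vertices.

    Assumption: the scale is treated as cyclic, so the last tone is also
    paired with the first.
    """
    count = 0
    for i, tone in enumerate(scale):
        next_tone = scale[(i + 1) % len(scale)]
        pair = frozenset((tone_to_vertex[tone], tone_to_vertex[next_tone]))
        if pair in edges:
            count += 1
    return count


# Hypothetical placement: pitch class n (C = 0, ..., B = 11) on vertex n.
PLACEMENT = {tone: tone for tone in range(12)}
C_MAJOR = [0, 2, 4, 5, 7, 9, 11]
EDGES = icosahedron_edges()
print(neighboring_number(C_MAJOR, PLACEMENT, EDGES))
```

A musical invariant in the abstract's sense would then be a linear combination of such counts; the coefficients are not given in the abstract and are left out here.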

Video Background Music Generation with Controllable Music Transformer

Nov 16, 2021
Shangzhe Di, Zeren Jiang, Si Liu, Zhaokai Wang, Leyan Zhu, Zexin He, Hongming Liu, Shuicheng Yan

In this work, we address the task of video background music generation. Some previous works achieve effective music generation but are unable to generate melodious music tailored to a particular video, and none of them considers the video-music rhythmic consistency. To generate the background music that matches the given video, we first establish the rhythmic relations between video and background music. In particular, we connect timing, motion speed, and motion saliency from video with beat, simu-note density, and simu-note strength from music, respectively. We then propose CMT, a Controllable Music Transformer that enables local control of the aforementioned rhythmic features and global control of the music genre and instruments. Objective and subjective evaluations show that the generated background music has achieved satisfactory compatibility with the input videos, and at the same time, impressive music quality. Code and models are available at

* Accepted to ACM Multimedia 2021. Project website at 

Foley Music: Learning to Generate Music from Videos

Jul 21, 2020
Chuang Gan, Deng Huang, Peihao Chen, Joshua B. Tenenbaum, Antonio Torralba

In this paper, we introduce Foley Music, a system that can synthesize plausible music for a silent video clip of people playing musical instruments. We first identify two key intermediate representations for a successful video-to-music generator: body keypoints from videos and MIDI events from audio recordings. We then formulate music generation from videos as a motion-to-MIDI translation problem. We present a Graph-Transformer framework that can accurately predict MIDI event sequences in accordance with the body movements. The MIDI events can then be converted to realistic music using an off-the-shelf music synthesizer tool. We demonstrate the effectiveness of our models on videos containing a variety of music performances. Experimental results show that our model outperforms several existing systems in generating music that is pleasant to listen to. More importantly, the MIDI representations are fully interpretable and transparent, thus enabling us to perform music editing flexibly. We encourage readers to watch the demo video with audio turned on to experience the results.

* ECCV 2020. Project page: 

Music-robust Automatic Lyrics Transcription of Polyphonic Music

Apr 07, 2022
Xiaoxue Gao, Chitralekha Gupta, Haizhou Li

Lyrics transcription of polyphonic music is challenging because singing vocals are corrupted by the background music. To improve the robustness of lyrics transcription to the background music, we propose a strategy of combining two sets of features: features that emphasize the singing vocals, i.e., music-removed features that represent features of the extracted singing vocals, and features that capture the singing vocals as well as the background music, i.e., music-present features. We show that these two sets of features complement each other, and their combination performs better than either used alone, thus improving the robustness of the acoustic model to the background music. Furthermore, language model interpolation between a general-purpose language model and an in-domain lyrics-specific language model provides further improvement in transcription results. Our experiments show that our proposed strategy outperforms the existing lyrics transcription systems for polyphonic music. Moreover, we find that our proposed music-robust features especially improve lyrics transcription performance in the metal genre, where the background music is loud and dominant.

* 7 pages, 2 figures, accepted by 2022 Sound and Music Computing 
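
The language model interpolation mentioned in the abstract above can be sketched as a weighted mix of two word distributions. This is a minimal illustration assuming plain linear interpolation over unigram probabilities; the abstract does not state the interpolation scheme or weight, and the function name, toy probabilities, and `weight` parameter are hypothetical.

```python
def interpolate_language_models(p_general, p_lyrics, weight):
    """Linearly interpolate two word-probability tables.

    weight is the share given to the in-domain lyrics model; the remaining
    1 - weight goes to the general-purpose model. Words missing from a
    table are treated as having probability 0 in that table.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    vocabulary = set(p_general) | set(p_lyrics)
    return {
        word: weight * p_lyrics.get(word, 0.0)
        + (1.0 - weight) * p_general.get(word, 0.0)
        for word in vocabulary
    }


# Toy distributions (made up for illustration only).
general = {"love": 0.2, "the": 0.8}
lyrics = {"love": 0.6, "the": 0.4}
mixed = interpolate_language_models(general, lyrics, weight=0.5)
```

Because each input table sums to 1, the interpolated table also sums to 1 for any weight in the allowed range.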

Scorpiano -- A System for Automatic Music Transcription for Monophonic Piano Music

Aug 24, 2021
Bojan Sofronievski, Branislav Gerazov

Music transcription is the process of transcribing music audio into music notation. It is a field in which machines still cannot beat human performance. The main motivation for automatic music transcription is to make it possible for anyone playing a musical instrument to generate the notation for a piece of music quickly and accurately, whether the person is a beginner who struggles to find the score by searching, or an expert who heard a live jazz improvisation and would like to reproduce it without spending time on manual transcription. We propose Scorpiano -- a system that can automatically generate a music score for simple monophonic piano melody tracks using digital signal processing. The system integrates multiple digital audio processing methods: note onset detection, tempo estimation, beat detection, pitch detection, and finally generation of the music score. The system has proven to give good results for simple piano melodies, comparable to commercially available neural-network-based systems.
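
Two of the steps listed in the abstract above, pitch detection and tempo estimation, end in simple arithmetic that can be sketched directly. This is not Scorpiano's implementation: the function names are hypothetical, the pitch mapping assumes equal temperament with A4 = 440 Hz (MIDI note 69), and the tempo estimate assumes the detected onsets fall one per beat.

```python
import math
import statistics

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def frequency_to_midi(freq_hz, a4_hz=440.0):
    """Map a detected fundamental frequency to the nearest MIDI note number."""
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    return round(69 + 12 * math.log2(freq_hz / a4_hz))


def midi_to_name(midi_number):
    """Name a MIDI note using the convention in which note 60 is C4."""
    octave = midi_number // 12 - 1
    return f"{NOTE_NAMES[midi_number % 12]}{octave}"


def estimate_tempo_bpm(onset_times_s):
    """Estimate tempo from the median gap between consecutive onsets.

    Assumption: one onset per beat, so the median inter-onset interval
    equals the beat period in seconds.
    """
    gaps = [b - a for a, b in zip(onset_times_s, onset_times_s[1:])]
    if not gaps:
        raise ValueError("need at least two onsets")
    return 60.0 / statistics.median(gaps)


print(midi_to_name(frequency_to_midi(261.63)))
print(estimate_tempo_bpm([0.0, 0.5, 1.0, 1.5]))
```

The median is used instead of the mean so that a single missed or spurious onset does not skew the estimate much.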

General Theory of Music by Icosahedron 2: Analysis of musical pieces by the exceptional musical icosahedra

Aug 17, 2021
Yusuke Imai

We propose a new approach to the analysis of musical pieces using the exceptional musical icosahedra, in which all the major/minor triads are represented by golden triangles and golden gnomons. First, we introduce the concept of the golden neighborhood, which characterizes the golden triangles/gnomons that neighbor a given golden triangle or gnomon. Then, we investigate the relation between the exceptional musical icosahedra and neo-Riemannian theory, and find that the golden neighborhoods and the icosahedron symmetry relate any major/minor triad to any other major/minor triad. Second, we show how the exceptional musical icosahedra are applied to analyzing harmonies built from four or more tones. We introduce two concepts, golden decomposition and golden singular. The golden decomposition is a decomposition of a given harmony into harmonies that together construct the given harmony and are each represented by a golden figure (a golden triangle, a golden gnomon, or a golden rectangle). A harmony is golden singular if and only if it has no golden decomposition. We show results of the golden analysis (analysis by the golden decomposition) of the tertian seventh chords and the mystic chords. While the dominant seventh chord is golden singular in the type 1[star] and the type 4[star] exceptional musical icosahedra, the half-diminished seventh chord is golden singular in the type 2[star] and the type 3[star] exceptional musical icosahedra. Last, we apply the golden analysis to the famous Prelude in C major by Johann Sebastian Bach (BWV 846). We found that 7 combinations of the golden figures on the type 2[star] or the type 3[star] exceptional musical icosahedron dually represent all the measures of BWV 846.

* 33 pages, 51 figures 

Multilingual Music Genre Embeddings for Effective Cross-Lingual Music Item Annotation

Sep 16, 2020
Elena V. Epure, Guillaume Salha, Romain Hennequin

Annotating music items with music genres is crucial for music recommendation and information retrieval, yet challenging given that music genres are subjective concepts. Recently, in order to explicitly consider this subjectivity, the annotation of music items was modeled as a translation task: predict for a music item its music genres within a target vocabulary or taxonomy (tag system) from a set of music genre tags originating from other tag systems. However, without a parallel corpus, previous solutions could not handle tag systems in other languages, being limited to English only. Here, by learning multilingual music genre embeddings, we enable cross-lingual music genre translation without relying on a parallel corpus. First, we apply compositionality functions on pre-trained word embeddings to represent multi-word tags. Second, we adapt the tag representations to the music domain by leveraging multilingual music genre graphs with a modified retrofitting algorithm. Experiments show that our method: 1) is effective in translating music genres across tag systems in multiple languages (English, French and Spanish); 2) outperforms the previous baseline in an English-language multi-source translation task. We publicly release the new multilingual data and code.

* 21st International Society for Music Information Retrieval Conference (ISMIR 2020) 
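
The first step in the abstract above, composing word embeddings into a representation for a multi-word tag, can be sketched with the simplest such function: an unweighted average. The abstract does not say which compositionality functions the authors use, so the averaging choice, the function name, and the toy two-dimensional vectors are all assumptions.

```python
def embed_tag(tag, word_vectors):
    """Represent a (possibly multi-word) tag as the mean of its word vectors.

    Words without a pre-trained vector are skipped; returns None when no
    word of the tag is covered.
    """
    words = tag.lower().split()
    vectors = [word_vectors[w] for w in words if w in word_vectors]
    if not vectors:
        return None
    dimension = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dimension)]


# Toy "pre-trained" vectors, made up for illustration only.
toy_vectors = {"hard": [1.0, 0.0], "rock": [0.0, 1.0]}
hard_rock = embed_tag("Hard Rock", toy_vectors)
```

Averaging ignores word order, so "rock hard" and "hard rock" would receive the same vector under this sketch.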

Pop Music Transformer: Generating Music with Rhythm and Harmony

Feb 01, 2020
Yu-Siang Huang, Yi-Hsuan Yang

The task of automatic music composition entails generative modeling of music in symbolic formats such as musical scores. By serializing a score as a sequence of MIDI-like events, recent work has demonstrated that state-of-the-art sequence models with self-attention work well for this task, especially for composing music with long-range coherence. In this paper, we show that sequence models can do even better when we improve the way a musical score is converted into events. The new event set, dubbed "REMI" (REvamped MIDI-derived events), provides sequence models a metric context for modeling the rhythmic patterns of music, while allowing for local tempo changes. Moreover, it explicitly sets up a harmonic structure and makes chord progression controllable. It also facilitates coordinating different tracks of a musical piece, such as the piano, bass and drums. With this new approach, we build a Pop Music Transformer that composes Pop piano music with a more plausible rhythmic structure than prior art does. The code, data and pre-trained model are publicly available.
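
The idea of giving a sequence model a metric context, as described in the abstract above, can be sketched by serializing notes into bar and position events instead of raw time offsets. This is a simplified illustration, not the REMI event set itself: the token spellings, the 16-steps-per-bar grid, and the omission of tempo, chord, and velocity events are assumptions made here.

```python
POSITIONS_PER_BAR = 16  # assumed grid: sixteen equal steps per bar


def notes_to_events(notes):
    """Serialize notes into a flat list of bar/position/pitch/duration tokens.

    Each note is a tuple (start_step, midi_pitch, duration_steps), with
    start_step counted in grid steps from the beginning of the piece.
    """
    events = []
    current_bar = -1
    for start_step, pitch, duration in sorted(notes):
        bar, position = divmod(start_step, POSITIONS_PER_BAR)
        # Emit one "Bar" token per bar boundary crossed, including empty bars.
        while current_bar < bar:
            events.append("Bar")
            current_bar += 1
        events.append(f"Position_{position + 1}/{POSITIONS_PER_BAR}")
        events.append(f"Pitch_{pitch}")
        events.append(f"Duration_{duration}")
    return events


melody = [(0, 60, 4), (4, 64, 4), (16, 67, 8)]
for token in notes_to_events(melody):
    print(token)
```

Explicit bar and position tokens let a model see where in the measure each note falls, which is the kind of metric context the abstract refers to.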