Advancements in digital technologies have enabled researchers to develop a variety of computational music applications. Such applications must capture, process, and generate data related to music. Therefore, it is important to digitally represent music in a concise, music-theoretic manner. Existing approaches for representing music make little use of music theory. In this paper, we address this disconnect between music theory and computational music by developing an open-source representation tool grounded in music theory. Through a wide range of use cases, we analyze classical music pieces to show the usefulness of the developed music embedding.
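As a rough illustration of what a music-theoretic digital representation can look like, here is a minimal sketch in Python; the Note class and interval helper below are hypothetical and are not the paper's actual embedding tool.

```python
# Minimal sketch of a theory-aware note representation (hypothetical;
# not the paper's actual tool).
from dataclasses import dataclass

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

@dataclass(frozen=True)
class Note:
    pitch_class: int  # 0 = C, 1 = C#, ..., 11 = B
    octave: int

    def midi(self) -> int:
        # MIDI convention: C4 (middle C) = 60
        return 12 * (self.octave + 1) + self.pitch_class

def interval_semitones(a: Note, b: Note) -> int:
    """Directed interval from a to b in semitones."""
    return b.midi() - a.midi()

c4 = Note(pitch_class=0, octave=4)   # C4
e4 = Note(pitch_class=4, octave=4)   # E4
print(interval_semitones(c4, e4))    # 4 -> a major third
```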
Raga is a central musical concept in South Asia, especially India. We investigate connections between Western classical music and Melakarta raga, a raga system in Karnatak (South Indian) classical music, through musical icosahedra. In our previous study, we introduced several kinds of musical icosahedra connecting various musical concepts in Western music: chromatic/whole-tone musical icosahedra, Pythagorean/whole-tone musical icosahedra, and exceptional musical icosahedra. In this paper, we first introduce musical icosahedra that connect the above musical icosahedra through two kinds of permutations of the 12 tones, inter-permutations and intra-permutations; we call them intermediate musical icosahedra. Next, we define the neighboring number as the number of pairs of tones that are adjacent in a given scale and also adjacent on a given musical icosahedron, and we define a musical invariant as a linear combination of neighboring numbers. We find a pair of a musical invariant and scales that is constant for some of the musical icosahedra and analyze its mathematical structure. Last, we define an extension of a given scale by the inter-permutations of a given musical icosahedron: the permutation-extension. We then show that the permutation-extension of the C major scale by the Melakarta raga musical icosahedra, which are four of the intermediate musical icosahedra from the type 1 chromatic/whole-tone musical icosahedron to the type 1' Pythagorean/whole-tone musical icosahedron, is the set of all the scales included in the Melakarta ragas. There exists a musical invariant that is constant for all the musical icosahedra corresponding to the scales of the Melakarta ragas, and we obtain a diagram representation of those scales characterizing this musical invariant.
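A minimal sketch of how a neighboring number could be computed, assuming the scale is given as pitch classes and the icosahedron as an edge set over those pitch classes; the placeholder edges below are not the paper's actual tone-to-vertex assignment.

```python
# Sketch of the "neighboring number" of a scale on a musical icosahedron.
# The edge set is a placeholder; the real tone adjacencies depend on the
# specific musical icosahedron defined in the paper.
from typing import List, Set, Tuple

def neighboring_number(scale: List[int], edges: Set[Tuple[int, int]]) -> int:
    """Count pairs of consecutive scale tones (pitch classes mod 12)
    that are also joined by an edge of the given musical icosahedron."""
    undirected = {frozenset(e) for e in edges}
    count = 0
    for i in range(len(scale)):
        a, b = scale[i], scale[(i + 1) % len(scale)]  # wrap around the octave
        if frozenset((a, b)) in undirected:
            count += 1
    return count

c_major = [0, 2, 4, 5, 7, 9, 11]        # C D E F G A B
toy_edges = {(0, 2), (2, 4), (4, 5)}    # placeholder edges only
print(neighboring_number(c_major, toy_edges))
```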
In this work, we address the task of video background music generation. Some previous works achieve effective music generation but are unable to generate melodious music tailored to a particular video, and none of them considers video-music rhythmic consistency. To generate background music that matches a given video, we first establish the rhythmic relations between video and background music. In particular, we connect timing, motion speed, and motion saliency from video with beat, simu-note density, and simu-note strength from music, respectively. We then propose CMT, a Controllable Music Transformer that enables local control of the aforementioned rhythmic features and global control of the music genre and instruments. Objective and subjective evaluations show that the generated background music achieves satisfactory compatibility with the input videos while maintaining impressive music quality. Code and models are available at https://github.com/wzk1015/video-bgm-generation.
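The following sketch illustrates the general idea of turning a video's motion speed into a discrete note-density control signal; the exact features, quantization, and conditioning used by CMT are defined in the paper and released code, so the helpers below are assumptions.

```python
# Illustrative mapping from motion speed to a note-density control token,
# in the spirit of CMT's rhythmic relations (not CMT's actual feature code).
import numpy as np

def motion_speed(frames: np.ndarray) -> np.ndarray:
    """frames: (T, H, W) grayscale video; returns per-frame motion magnitude."""
    diffs = np.abs(np.diff(frames.astype(np.float32), axis=0))
    return diffs.mean(axis=(1, 2))

def to_density_tokens(speed: np.ndarray, n_levels: int = 4) -> np.ndarray:
    """Quantize motion speed into discrete density levels (0 = sparse notes,
    n_levels - 1 = dense notes) that a music transformer can condition on."""
    lo, hi = speed.min(), speed.max()
    norm = (speed - lo) / (hi - lo + 1e-8)
    return np.minimum((norm * n_levels).astype(int), n_levels - 1)

video = np.random.rand(32, 64, 64)            # stand-in for real frames
print(to_density_tokens(motion_speed(video)))
```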
In this paper, we introduce Foley Music, a system that can synthesize plausible music for a silent video clip of people playing musical instruments. We first identify two key intermediate representations for a successful video-to-music generator: body keypoints from videos and MIDI events from audio recordings. We then formulate music generation from videos as a motion-to-MIDI translation problem. We present a Graph-Transformer framework that can accurately predict MIDI event sequences in accordance with the body movements. The MIDI events can then be converted to realistic music using an off-the-shelf music synthesizer tool. We demonstrate the effectiveness of our models on videos containing a variety of music performances. Experimental results show that our model outperforms several existing systems in generating music that is pleasant to listen to. More importantly, the MIDI representations are fully interpretable and transparent, enabling flexible music editing. We encourage readers to watch the demo video with audio turned on to experience the results.
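As an illustration of the final stage of such a pipeline, here is a minimal sketch that renders MIDI note events with the off-the-shelf pretty_midi library; the hard-coded events stand in for the Graph-Transformer's output and are not real predictions.

```python
# Sketch: turning predicted MIDI note events into a playable file with
# pretty_midi. The notes below are stand-ins for model output.
import pretty_midi

events = [  # (pitch, start_sec, end_sec, velocity) -- hypothetical predictions
    (60, 0.0, 0.5, 90),
    (64, 0.5, 1.0, 90),
    (67, 1.0, 1.5, 95),
]

pm = pretty_midi.PrettyMIDI()
piano = pretty_midi.Instrument(program=0)  # acoustic grand piano
for pitch, start, end, velocity in events:
    piano.notes.append(pretty_midi.Note(velocity=velocity, pitch=pitch,
                                        start=start, end=end))
pm.instruments.append(piano)
pm.write("predicted.mid")                  # or render audio: pm.synthesize(fs=44100)
```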
Lyrics transcription of polyphonic music is challenging because the singing vocals are corrupted by the background music. To improve the robustness of lyrics transcription to background music, we propose a strategy of combining features that emphasize the singing vocals, i.e., music-removed features extracted from the separated singing vocals, with features that capture both the singing vocals and the background music, i.e., music-present features. We show that these two sets of features complement each other, and that their combination performs better than either used alone, thus improving the robustness of the acoustic model to background music. Furthermore, language model interpolation between a general-purpose language model and an in-domain, lyrics-specific language model provides a further improvement in transcription results. Our experiments show that the proposed strategy outperforms existing lyrics transcription systems for polyphonic music. Moreover, we find that the proposed music-robust features especially improve lyrics transcription performance on songs in the metal genre, where the background music is loud and dominant.
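The language-model interpolation step amounts to a weighted mixture of the two models' probabilities; a minimal sketch follows, assuming a fixed interpolation weight lambda (the actual weight and language models are determined by the paper's experiments).

```python
# Linear interpolation of a general-purpose LM and a lyrics-specific LM.
# The probabilities and lambda below are placeholders.
def interpolated_prob(p_general: float, p_lyrics: float, lam: float = 0.5) -> float:
    """P(w | h) = lam * P_general(w | h) + (1 - lam) * P_lyrics(w | h)"""
    return lam * p_general + (1.0 - lam) * p_lyrics

print(interpolated_prob(p_general=0.02, p_lyrics=0.10, lam=0.3))  # 0.076
```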
Music transcription is the process of transcribing music audio into music notation. It is a field in which machines still cannot match human performance. The main motivation for automatic music transcription is to enable anyone playing a musical instrument to generate the notation for a piece of music quickly and accurately, whether a beginner who struggles to find the score by searching or an expert who heard a live jazz improvisation and would like to reproduce it without spending time on manual transcription. We propose Scorpiano -- a system that can automatically generate a music score for simple monophonic piano melody tracks using digital signal processing. The system integrates multiple digital audio processing methods: note onset detection, tempo estimation, beat detection, pitch detection, and finally generation of the music score. The system has proven to give good results for simple piano melodies, comparable to commercially available neural-network-based systems.
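A rough analogue of such a pipeline can be assembled from standard librosa building blocks, as sketched below; Scorpiano's own DSP implementations may differ, and "melody.wav" is a placeholder input path.

```python
# Rough analogue of the onset/tempo/pitch stages using librosa.
import librosa

y, sr = librosa.load("melody.wav")

onset_times = librosa.onset.onset_detect(y=y, sr=sr, units="time")
tempo, beat_frames = librosa.beat.beat_track(y=y, sr=sr)
f0, voiced, _ = librosa.pyin(y,
                             fmin=librosa.note_to_hz("A0"),
                             fmax=librosa.note_to_hz("C8"))

print("Estimated tempo (BPM):", tempo)
print("Detected note onsets:", len(onset_times))
# Mapping onsets and f0 to note names and beat-relative durations would be
# the final score-generation step.
```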
We propose a new approach to the analysis of musical pieces using the exceptional musical icosahedra, in which all the major/minor triads are represented by golden triangles and golden gnomons. First, we introduce the concept of the golden neighborhood, which characterizes the golden triangles/gnomons that neighbor a given golden triangle or gnomon. We then investigate the relation between the exceptional musical icosahedra and neo-Riemannian theory, and find that the golden neighborhoods and the icosahedral symmetry relate any major/minor triad to any other major/minor triad. Second, we show how the exceptional musical icosahedra can be applied to analyzing harmonies constructed from four or more tones. We introduce two concepts: golden decomposition and golden singularity. The golden decomposition of a given harmony is a decomposition into harmonies that constitute it and are represented by golden figures (a golden triangle, a golden gnomon, or a golden rectangle). A harmony is golden singular if and only if it has no golden decomposition. We present the results of the golden analysis (analysis by golden decomposition) of the tertian seventh chords and the mystic chords. While the dominant seventh chord is golden singular on the type 1* and type 4* exceptional musical icosahedra, the half-diminished seventh chord is golden singular on the type 2* and type 3* exceptional musical icosahedra. Last, we apply the golden analysis to the famous Prelude in C major by Johann Sebastian Bach (BWV 846). We find that 7 combinations of golden figures on the type 2* or type 3* exceptional musical icosahedron dually represent all the measures of BWV 846.
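For background only (this is standard icosahedral geometry, not the paper's specific tone-to-vertex assignment), the golden figures arise from the golden ratio underlying the regular icosahedron:

\[
  \varphi = \frac{1+\sqrt{5}}{2} \approx 1.618, \qquad
  \text{vertices of a regular icosahedron: } (0,\pm 1,\pm\varphi),\ (\pm 1,\pm\varphi,0),\ (\pm\varphi,0,\pm 1).
\]

A golden triangle is an isosceles triangle with leg-to-base ratio \(\varphi : 1\) (apex angle 36°), a golden gnomon has base-to-leg ratio \(\varphi : 1\) (apex angle 108°), and the three coordinate rectangles above are golden rectangles.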
Annotating music items with music genres is crucial for music recommendation and information retrieval, yet challenging given that music genres are subjective concepts. Recently, in order to explicitly account for this subjectivity, the annotation of music items was modeled as a translation task: predict, for a music item, its music genres within a target vocabulary or taxonomy (tag system) from a set of music genre tags originating from other tag systems. However, without a parallel corpus, previous solutions could not handle tag systems in other languages, being limited to English only. Here, by learning multilingual music genre embeddings, we enable cross-lingual music genre translation without relying on a parallel corpus. First, we apply compositionality functions to pre-trained word embeddings to represent multi-word tags. Second, we adapt the tag representations to the music domain by leveraging multilingual music genre graphs with a modified retrofitting algorithm. Experiments show that our method: 1) is effective in translating music genres across tag systems in multiple languages (English, French, and Spanish); 2) outperforms the previous baseline in an English-language multi-source translation task. We publicly release the new multilingual data and code.
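A minimal sketch of the two ingredients, assuming a simple averaging compositionality function and the standard retrofitting update of Faruqui et al.; the paper's modified retrofitting on multilingual genre graphs is not reproduced here.

```python
# (1) Compose a multi-word tag embedding by averaging its word vectors.
# (2) One step of standard retrofitting toward graph neighbors.
import numpy as np

def compose_tag(words, word_vectors):
    """Average the available word vectors of a multi-word tag."""
    vecs = [word_vectors[w] for w in words if w in word_vectors]
    return np.mean(vecs, axis=0)

def retrofit_step(q, q_hat, neighbors, alpha=1.0, beta=1.0):
    """q: current tag vectors, q_hat: original (pre-trained) vectors,
    neighbors: dict node -> list of adjacent nodes in the genre graph."""
    new_q = {}
    for i, nbrs in neighbors.items():
        num = alpha * q_hat[i] + beta * sum(q[j] for j in nbrs)
        new_q[i] = num / (alpha + beta * len(nbrs))
    return new_q
```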
The task of automatic music composition entails generative modeling of music in symbolic formats such as musical scores. By serializing a score as a sequence of MIDI-like events, recent work has demonstrated that state-of-the-art sequence models with self-attention work well for this task, especially for composing music with long-range coherence. In this paper, we show that sequence models can do even better when we improve the way a musical score is converted into events. The new event set, dubbed "REMI" (REvamped MIDI-derived events), provides sequence models with a metric context for modeling the rhythmic patterns of music, while allowing for local tempo changes. Moreover, it explicitly sets up a harmonic structure and makes chord progressions controllable. It also facilitates coordinating different tracks of a musical piece, such as the piano, bass, and drums. With this new approach, we build a Pop Music Transformer that composes pop piano music with a more plausible rhythmic structure than prior work does. The code, data and pre-trained model are publicly available.\footnote{\url{https://github.com/YatingMusic/remi}}
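To convey the flavor of the event set, below is an illustrative REMI-style token sequence for the opening of one bar; token names and values are schematic assumptions, and the exact vocabulary is defined in the released code linked above.

```python
# Schematic REMI-style token stream: bar/position markers carry the metric
# context, and tempo/chord tokens interleave with note events.
remi_events = [
    "Bar",
    "Position_1/16", "Tempo_Class_mid", "Tempo_Value_30",
    "Position_1/16", "Chord_C_maj",
    "Position_1/16", "Note_Velocity_20", "Note_On_60", "Note_Duration_8",
    "Position_9/16", "Note_Velocity_20", "Note_On_64", "Note_Duration_8",
]
print(len(remi_events), "tokens for the first half of the bar")
```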