Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Curtis Hawthorne

Encoding Musical Style with Transformer Autoencoders

Dec 10, 2019

Kristy Choi, Curtis Hawthorne, Ian Simon, Monica Dinculescu, Jesse Engel

Figure 1 for Encoding Musical Style with Transformer Autoencoders

Figure 2 for Encoding Musical Style with Transformer Autoencoders

Figure 3 for Encoding Musical Style with Transformer Autoencoders

Figure 4 for Encoding Musical Style with Transformer Autoencoders

Abstract:We consider the problem of learning high-level controls over the global structure of sequence generation, particularly in the context of symbolic music generation with complex language models. In this work, we present the Transformer autoencoder, which aggregates encodings of the input data across time to obtain a global representation of style from a given performance. We show it is possible to combine this global embedding with other temporally distributed embeddings, enabling improved control over the separate aspects of performance style and and melody. Empirically, we demonstrate the effectiveness of our method on a variety of music generation tasks on the MAESTRO dataset and a YouTube dataset with 10,000+ hours of piano performances, where we achieve improvements in terms of log-likelihood and mean listening scores as compared to relevant baselines.

Via

Access Paper or Ask Questions

The Bach Doodle: Approachable music composition with machine learning at scale

Jul 14, 2019

Cheng-Zhi Anna Huang, Curtis Hawthorne, Adam Roberts, Monica Dinculescu, James Wexler, Leon Hong, Jacob Howcroft

Figure 1 for The Bach Doodle: Approachable music composition with machine learning at scale

Figure 2 for The Bach Doodle: Approachable music composition with machine learning at scale

Figure 3 for The Bach Doodle: Approachable music composition with machine learning at scale

Figure 4 for The Bach Doodle: Approachable music composition with machine learning at scale

Abstract:To make music composition more approachable, we designed the first AI-powered Google Doodle, the Bach Doodle, where users can create their own melody and have it harmonized by a machine learning model Coconet (Huang et al., 2017) in the style of Bach. For users to input melodies, we designed a simplified sheet-music based interface. To support an interactive experience at scale, we re-implemented Coconet in TensorFlow.js (Smilkov et al., 2019) to run in the browser and reduced its runtime from 40s to 2s by adopting dilated depth-wise separable convolutions and fusing operations. We also reduced the model download size to approximately 400KB through post-training weight quantization. We calibrated a speed test based on partial model evaluation time to determine if the harmonization request should be performed locally or sent to remote TPU servers. In three days, people spent 350 years worth of time playing with the Bach Doodle, and Coconet received more than 55 million queries. Users could choose to rate their compositions and contribute them to a public dataset, which we are releasing with this paper. We hope that the community finds this dataset useful for applications ranging from ethnomusicological studies, to music education, to improving machine learning models.

* Proceedings of the 18th International Society for Music Information Retrieval Conference, ISMIR 2019

Via

Access Paper or Ask Questions

Enabling Factorized Piano Music Modeling and Generation with the MAESTRO Dataset

Oct 30, 2018

Curtis Hawthorne, Andriy Stasyuk, Adam Roberts, Ian Simon, Cheng-Zhi Anna Huang, Sander Dieleman, Erich Elsen, Jesse Engel, Douglas Eck

Figure 1 for Enabling Factorized Piano Music Modeling and Generation with the MAESTRO Dataset

Figure 2 for Enabling Factorized Piano Music Modeling and Generation with the MAESTRO Dataset

Figure 3 for Enabling Factorized Piano Music Modeling and Generation with the MAESTRO Dataset

Figure 4 for Enabling Factorized Piano Music Modeling and Generation with the MAESTRO Dataset

Abstract:Generating musical audio directly with neural networks is notoriously difficult because it requires coherently modeling structure at many different timescales. Fortunately, most music is also highly structured and can be represented as discrete note events played on musical instruments. Herein, we show that by using notes as an intermediate representation, we can train a suite of models capable of transcribing, composing, and synthesizing audio waveforms with coherent musical structure on timescales spanning six orders of magnitude (~0.1 ms to ~100 s), a process we call Wave2Midi2Wave. This large advance in the state of the art is enabled by our release of the new MAESTRO (MIDI and Audio Edited for Synchronous TRacks and Organization) dataset, composed of over 172 hours of virtuosic piano performances captured with fine alignment (~3 ms) between note labels and audio waveforms. The networks and the dataset together present a promising approach toward creating new expressive and interpretable neural models of music.

* Examples available at https://goo.gl/magenta/maestro-examples

Via

Access Paper or Ask Questions

Music Transformer

Oct 10, 2018

Cheng-Zhi Anna Huang, Ashish Vaswani, Jakob Uszkoreit, Noam Shazeer, Ian Simon, Curtis Hawthorne, Andrew M. Dai, Matthew D. Hoffman, Monica Dinculescu, Douglas Eck

Abstract:Music relies heavily on repetition to build structure and meaning. Self-reference occurs on multiple timescales, from motifs to phrases to reusing of entire sections of music, such as in pieces with ABA structure. The Transformer (Vaswani et al., 2017), a sequence model based on self-attention, has achieved compelling results in many generation tasks that require maintaining long-range coherence. This suggests that self-attention might also be well-suited to modeling music. In musical composition and performance, however, relative timing is critically important. Existing approaches for representing relative positional information in the Transformer modulate attention based on pairwise distance (Shaw et al., 2018). This is impractical for long sequences such as musical compositions since their memory complexity is quadratic in the sequence length. We propose an algorithm that reduces the intermediate memory requirements to linear in the sequence length. This enables us to demonstrate that a Transformer with our modified relative attention mechanism can generate minute-long (thousands of steps) compositions with compelling structure, generate continuations that coherently elaborate on a given motif, and in a seq2seq setup generate accompaniments conditioned on melodies. We evaluate the Transformer with our relative attention mechanism on two datasets, JSB Chorales and Piano-e-competition, and obtain state-of-the-art results on the latter.

* Rewrote many sections to clarify the work, and extended relative attention to the local case. Previous title is "An Improved Relative Self-Attention Mechanism for Transformer with Application to Music Generation"

Via

Access Paper or Ask Questions

A Hierarchical Latent Vector Model for Learning Long-Term Structure in Music

Jul 30, 2018

Adam Roberts, Jesse Engel, Colin Raffel, Curtis Hawthorne, Douglas Eck

Figure 1 for A Hierarchical Latent Vector Model for Learning Long-Term Structure in Music

Figure 2 for A Hierarchical Latent Vector Model for Learning Long-Term Structure in Music

Figure 3 for A Hierarchical Latent Vector Model for Learning Long-Term Structure in Music

Figure 4 for A Hierarchical Latent Vector Model for Learning Long-Term Structure in Music

Abstract:The Variational Autoencoder (VAE) has proven to be an effective model for producing semantically meaningful latent representations for natural data. However, it has thus far seen limited application to sequential data, and, as we demonstrate, existing recurrent VAE models have difficulty modeling sequences with long-term structure. To address this issue, we propose the use of a hierarchical decoder, which first outputs embeddings for subsequences of the input and then uses these embeddings to generate each subsequence independently. This structure encourages the model to utilize its latent code, thereby avoiding the "posterior collapse" problem which remains an issue for recurrent VAEs. We apply this architecture to modeling sequences of musical notes and find that it exhibits dramatically better sampling, interpolation, and reconstruction performance than a "flat" baseline model. An implementation of our "MusicVAE" is available online at http://g.co/magenta/musicvae-code.

* ICML 2018
* ICML Camera Ready Version

Via

Access Paper or Ask Questions

Onsets and Frames: Dual-Objective Piano Transcription

Jun 05, 2018

Curtis Hawthorne, Erich Elsen, Jialin Song, Adam Roberts, Ian Simon, Colin Raffel, Jesse Engel, Sageev Oore, Douglas Eck

Figure 1 for Onsets and Frames: Dual-Objective Piano Transcription

Figure 2 for Onsets and Frames: Dual-Objective Piano Transcription

Figure 3 for Onsets and Frames: Dual-Objective Piano Transcription

Figure 4 for Onsets and Frames: Dual-Objective Piano Transcription

Abstract:We advance the state of the art in polyphonic piano music transcription by using a deep convolutional and recurrent neural network which is trained to jointly predict onsets and frames. Our model predicts pitch onset events and then uses those predictions to condition framewise pitch predictions. During inference, we restrict the predictions from the framewise detector by not allowing a new note to start unless the onset detector also agrees that an onset for that pitch is present in the frame. We focus on improving onsets and offsets together instead of either in isolation as we believe this correlates better with human musical perception. Our approach results in over a 100% relative improvement in note F1 score (with offsets) on the MAPS dataset. Furthermore, we extend the model to predict relative velocities of normalized audio which results in more natural-sounding transcriptions.

* Examples available at https://goo.gl/magenta/onsets-frames-examples

Via

Access Paper or Ask Questions

Learning a Latent Space of Multitrack Measures

Jun 01, 2018

Ian Simon, Adam Roberts, Colin Raffel, Jesse Engel, Curtis Hawthorne, Douglas Eck

Figure 1 for Learning a Latent Space of Multitrack Measures

Figure 2 for Learning a Latent Space of Multitrack Measures

Figure 3 for Learning a Latent Space of Multitrack Measures

Figure 4 for Learning a Latent Space of Multitrack Measures

Abstract:Discovering and exploring the underlying structure of multi-instrumental music using learning-based approaches remains an open problem. We extend the recent MusicVAE model to represent multitrack polyphonic measures as vectors in a latent space. Our approach enables several useful operations such as generating plausible measures from scratch, interpolating between measures in a musically meaningful way, and manipulating specific musical attributes. We also introduce chord conditioning, which allows all of these operations to be performed while keeping harmony fixed, and allows chords to be changed while maintaining musical "style". By generating a sequence of measures over a predefined chord progression, our model can produce music with convincing long-term structure. We demonstrate that our latent space model makes it possible to intuitively control and generate musical sequences with rich instrumentation (see https://goo.gl/s2N7dV for generated audio).

Via

Access Paper or Ask Questions