
Ivor Simpson


MusIAC: An extensible generative framework for Music Infilling Applications with multi-level Control

Feb 11, 2022
Rui Guo, Ivor Simpson, Chris Kiefer, Thor Magnusson, Dorien Herremans


We present a novel music generation framework for music infilling with a user-friendly interface. Infilling refers to the task of generating musical sections given the surrounding multi-track music. The proposed transformer-based framework is extensible to new control tokens, such as the tonal tension per bar and track polyphony level added in this work. We explore the effects of including several musically meaningful control tokens, and evaluate the results using objective metrics related to pitch and rhythm. Our results demonstrate that adding control tokens helps to generate music with stronger stylistic similarities to the original music. It also gives the user more control over properties such as the music texture and tonal tension in each bar, compared to previous research, which only provided control over track density. We present the model in a Google Colab notebook to enable interactive generation.

* preprint for The 11th International Conference on Artificial Intelligence in Music, Sound, Art and Design (EvoMUSART) 2022 

The GAN that Warped: Semantic Attribute Editing with Unpaired Data

Nov 30, 2018
Garoe Dorta, Sara Vicente, Neill D. F. Campbell, Ivor Simpson


Deep neural networks have recently been used to edit images with great success. However, they are often limited to working at a restricted range of resolutions. They are also so flexible that semantic face edits can often result in an unwanted loss of identity. This work proposes a model that learns how to perform semantic image edits through the application of smooth warp fields. A warp field can be efficiently predicted at a reasonably low resolution and then resampled and applied at arbitrary resolutions. Previous approaches that attempted to use warping for semantic edits required paired data, that is, example images of the same object with different semantic characteristics. In contrast, we employ recent advances in Generative Adversarial Networks that allow our model to be effectively trained with unpaired data. We demonstrate the efficacy of our method for editing face images at very high resolutions (4k images) with an efficient single forward pass of a deep network at a lower resolution. We illustrate how the extent of our edits can be trivially reduced or exaggerated by scaling the predicted warp field, and we also show that our edits are substantially better at maintaining the subject's identity.
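
As a rough illustration of the resampling and scaling described in this abstract, the following NumPy sketch upsamples a low-resolution displacement field and applies it to a larger image with an adjustable edit strength. It is not the authors' implementation: the function names (`bilinear_sample`, `upsample_warp`, `apply_warp`), the `(dy, dx)` pixel-displacement convention, and the use of bilinear interpolation are all assumptions made for this example.

```python
import numpy as np

def bilinear_sample(img, ys, xs):
    """Sample img (H, W, C) at float coordinates ys, xs using bilinear interpolation.
    Coordinates outside the image are clamped to the border."""
    h, w = img.shape[:2]
    ys = np.clip(ys, 0, h - 1)
    xs = np.clip(xs, 0, w - 1)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[..., None]
    wx = (xs - x0)[..., None]
    top = img[y0, x0] * (1 - wx) + img[y0, x1] * wx
    bot = img[y1, x0] * (1 - wx) + img[y1, x1] * wx
    return top * (1 - wy) + bot * wy

def upsample_warp(flow, out_h, out_w):
    """Resample a low-res displacement field (h, w, 2) to (out_h, out_w, 2).
    Displacements are assumed to be in pixels, so they are rescaled by the
    resolution ratio (an approximation that ignores border alignment)."""
    h, w = flow.shape[:2]
    ys = np.linspace(0, h - 1, out_h)
    xs = np.linspace(0, w - 1, out_w)
    grid_y, grid_x = np.meshgrid(ys, xs, indexing="ij")
    up = bilinear_sample(flow, grid_y, grid_x)
    up[..., 0] *= out_h / h  # dy component
    up[..., 1] *= out_w / w  # dx component
    return up

def apply_warp(img, flow, strength=1.0):
    """Warp img by reading each output pixel from (y + dy, x + dx).
    strength < 1 reduces the edit, strength > 1 exaggerates it."""
    h, w = img.shape[:2]
    grid_y, grid_x = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    return bilinear_sample(
        img,
        grid_y + strength * flow[..., 0],
        grid_x + strength * flow[..., 1],
    )
```

Because the field is smooth and defined independently of the image, the same low-resolution prediction can in principle be reused at any output size, and `strength` gives a single scalar for reducing or exaggerating the edit.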


Training VAEs Under Structured Residuals

Jul 31, 2018
Garoe Dorta, Sara Vicente, Lourdes Agapito, Neill D. F. Campbell, Ivor Simpson


Variational auto-encoders (VAEs) are a popular and powerful class of deep generative models. Previous works on VAEs have assumed a factorized likelihood model, whereby the output uncertainty of each pixel is assumed to be independent. This approximation is clearly limited, as demonstrated by observing a residual image from a VAE reconstruction, which often possesses a high level of structure. This paper demonstrates a novel scheme to incorporate a structured Gaussian likelihood prediction network within the VAE that allows the residual correlations to be modeled. Our novel architecture, with minimal increase in complexity, incorporates the covariance matrix prediction within the VAE. We also propose a new mechanism for allowing structured uncertainty on color images. Furthermore, we provide a scheme for effectively training this model, and include some suggestions for improving performance in terms of efficiency or modeling longer-range correlations.

* Simplified training methodology, added more results 

Structured Uncertainty Prediction Networks

Mar 23, 2018
Garoe Dorta, Sara Vicente, Lourdes Agapito, Neill D. F. Campbell, Ivor Simpson


This paper is the first work to propose a network to predict a structured uncertainty distribution for a synthesized image. Previous approaches have been mostly limited to predicting diagonal covariance matrices. Our novel model learns to predict a full Gaussian covariance matrix for each reconstruction, which permits efficient sampling and likelihood evaluation. We demonstrate that our model can accurately reconstruct ground truth correlated residual distributions for synthetic datasets and generate plausible high-frequency samples for real face images. We also illustrate the use of these predicted covariances for structure-preserving image denoising.
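
To make the sampling and likelihood evaluation mentioned in this abstract concrete, here is a minimal NumPy sketch for a Gaussian with a full covariance matrix, represented by a lower-triangular Cholesky factor. The abstract does not state how the covariance is parameterized, so the factorization Σ = L Lᵀ and the function names `gaussian_log_likelihood` and `sample_gaussian` are assumptions for illustration only.

```python
import numpy as np

def gaussian_log_likelihood(x, mu, chol):
    """Log-density of N(x; mu, chol @ chol.T), where chol is lower-triangular
    with a positive diagonal."""
    d = x.shape[0]
    # Whitened residual z = chol^{-1} (x - mu). A generic solver is used here;
    # a dedicated triangular solver would be cheaper.
    z = np.linalg.solve(chol, x - mu)
    # log det(Sigma) = 2 * sum(log(diag(chol)))
    log_det = 2.0 * np.sum(np.log(np.diag(chol)))
    return -0.5 * (d * np.log(2.0 * np.pi) + log_det + z @ z)

def sample_gaussian(mu, chol, rng):
    """Draw one correlated sample: mu + chol @ eps, with eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape[0])
    return mu + chol @ eps

# Example: a 3-dimensional residual with correlated components.
rng = np.random.default_rng(0)
sigma = np.array([[2.0, 0.8, 0.0],
                  [0.8, 1.0, 0.3],
                  [0.0, 0.3, 0.5]])
chol = np.linalg.cholesky(sigma)
mu = np.zeros(3)
x = sample_gaussian(mu, chol, rng)
print(gaussian_log_likelihood(x, mu, chol))
```

With a diagonal `chol`, this reduces to the factorized (per-pixel independent) case; the off-diagonal entries are what allow correlated residuals to be represented.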

* CVPR 2018 (final version) 