Abstract:We present the MLRG Deep Curvature suite, a PyTorch-based, open-source package for the analysis and visualisation of neural network curvature and loss landscapes. Despite providing rich information about the properties of neural networks and being useful for a variety of tasks, curvature information remains under-used for a number of reasons, and our package aims to bridge this gap. We give a primer on the \textit{Lanczos algorithm}, the theoretical backbone of our package, covering its main practical desiderata and common misconceptions, and present a series of examples ranging from synthetic toy problems to modern neural networks trained on CIFAR datasets, demonstrating the advantages of our package over existing approaches with similar purposes.
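As an illustration of the Lanczos algorithm referred to above, the following is a minimal sketch (not the suite's own API) of Lanczos tridiagonalisation driven by Hessian-vector products obtained from PyTorch double backward; the toy model, loss and iteration count are placeholders, and no reorthogonalisation is performed.
\begin{verbatim}
import torch
import numpy as np

torch.manual_seed(0)

# A toy model and loss so the example is self-contained.
model = torch.nn.Linear(10, 1)
x, y = torch.randn(32, 10), torch.randn(32, 1)
params = [p for p in model.parameters() if p.requires_grad]

def hvp(vec):
    """Hessian-vector product of the loss w.r.t. the flattened parameters."""
    loss = torch.nn.functional.mse_loss(model(x), y)
    grads = torch.autograd.grad(loss, params, create_graph=True)
    flat_grad = torch.cat([g.reshape(-1) for g in grads])
    hv = torch.autograd.grad(flat_grad @ vec, params)
    return torch.cat([h.reshape(-1) for h in hv]).detach()

n = sum(p.numel() for p in params)
k = 5                               # number of Lanczos iterations
q = torch.randn(n); q /= q.norm()   # random starting vector
alphas, betas = [], []
q_prev, beta = torch.zeros(n), 0.0
for _ in range(k):
    w = hvp(q) - beta * q_prev      # standard three-term Lanczos recurrence
    alpha = float(w @ q)
    w = w - alpha * q
    beta = float(w.norm())
    alphas.append(alpha); betas.append(beta)
    q_prev, q = q, w / beta

# Ritz values (eigenvalue estimates) from the small tridiagonal matrix.
T = np.diag(alphas) + np.diag(betas[:-1], 1) + np.diag(betas[:-1], -1)
print("Ritz values:", np.linalg.eigvalsh(T))
\end{verbatim}
The Ritz values of the small tridiagonal matrix approximate the extremal eigenvalues of the loss Hessian without ever forming the Hessian explicitly.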
Abstract:Graph spectral techniques for measuring graph similarity, or for learning the cluster number, require kernel smoothing. The kernel function and bandwidth are typically chosen in an ad hoc manner and heavily affect the resulting output. We prove that kernel smoothing biases the moments of the spectral density. We propose an information-theoretically optimal approach to learning a smooth graph spectral density which fully respects the moment information. Our method's computational cost is linear in the number of edges, and it can hence be applied to large networks with millions of nodes. We apply our method to the problems of graph similarity and cluster number learning, where we outperform comparable iterative spectral approaches on synthetic and real graphs.
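As a concrete illustration of why moment information is cheap to obtain, here is a hedged sketch (names, constants and the toy graph are illustrative, not the paper's implementation) of Hutchinson-style estimation of the spectral moments of a normalised graph Laplacian, where each additional moment costs one sparse matrix-vector product, i.e. work linear in the number of edges.
\begin{verbatim}
import numpy as np
import scipy.sparse as sp

def normalised_laplacian(A):
    d = np.asarray(A.sum(axis=1)).ravel()
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    D = sp.diags(d_inv_sqrt)
    return sp.identity(A.shape[0]) - D @ A @ D

def spectral_moments(L, num_moments=10, num_probes=20, seed=0):
    """Estimate tr(L^m)/n = (1/n) sum_i lambda_i^m for m = 1..num_moments."""
    rng = np.random.default_rng(seed)
    n = L.shape[0]
    moments = np.zeros(num_moments)
    for _ in range(num_probes):
        z = rng.choice([-1.0, 1.0], size=n)   # Rademacher probe vector
        v = z.copy()
        for m in range(num_moments):
            v = L @ v                          # one sparse mat-vec per moment
            moments[m] += z @ v / (num_probes * n)
    return moments

# Toy usage on a random sparse graph.
A = sp.random(500, 500, density=0.01, format="csr")
A = ((A + A.T) > 0).astype(float)
print(spectral_moments(normalised_laplacian(A).tocsr()))
\end{verbatim}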
Abstract:We place an Indian Buffet Process (IBP) prior over the neural structure of a Bayesian Neural Network (BNN), thus allowing the complexity of the BNN to increase and decrease automatically. We apply this methodology to the problem of resource allocation in continual learning, where new tasks occur and the network requires extra resources. Our BNN exploits online variational inference with relaxations of the Bernoulli and Beta distributions (which constitute the IBP prior), thereby allowing the use of the reparameterisation trick to learn variational posteriors via gradient-based methods. As we automatically learn the number of weights in the BNN, overfitting and underfitting problems are largely overcome. We show empirically that the method offers competitive results compared to Variational Continual Learning (VCL) in some settings.
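A minimal sketch of the reparameterised sampling referred to above, assuming a stick-breaking IBP construction with a Kumaraswamy surrogate for the Beta sticks and Concrete (relaxed Bernoulli) gates; the variable names, sizes and temperature are illustrative rather than the paper's implementation.
\begin{verbatim}
import torch

def sample_ibp_mask(a, b, temperature=0.5):
    """a, b: positive variational shape parameters, one pair per candidate unit."""
    # Kumaraswamy sample as a differentiable surrogate for Beta(a, b).
    u = torch.rand_like(a).clamp(1e-6, 1 - 1e-6)
    v = (1 - u.pow(1.0 / b)).pow(1.0 / a)
    pi = torch.cumprod(v, dim=0)   # IBP stick-breaking activation probabilities
    # Relaxed Bernoulli (Concrete) gates, reparameterised so gradients reach a, b.
    gates = torch.distributions.RelaxedBernoulli(torch.tensor(temperature), probs=pi)
    return gates.rsample()

raw_a = torch.nn.Parameter(torch.zeros(16))
raw_b = torch.nn.Parameter(torch.zeros(16))
mask = sample_ibp_mask(torch.nn.functional.softplus(raw_a) + 1e-3,
                       torch.nn.functional.softplus(raw_b) + 1e-3)
print(mask)   # soft gates in (0, 1), multiplied into a layer's activations
\end{verbatim}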
Abstract:We adopt Deep Reinforcement Learning algorithms to design trading strategies for continuous futures contracts. Both discrete and continuous action spaces are considered and volatility scaling is incorporated to create reward functions which scale trade positions based on market volatility. We test our algorithms on the 50 most liquid futures contracts from 2011 to 2019, and investigate how performance varies across different asset classes including commodities, equity indices, fixed income and FX markets. We compare our algorithms against classical time series momentum strategies, and show that our method outperforms such baseline models, delivering positive profits despite heavy transaction costs. The experiments show that the proposed algorithms can follow large market trends without changing positions and can also scale down, or hold, through consolidation periods.
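To make the reward construction concrete, here is an illustrative sketch (parameter names, target volatility and cost level are assumptions, not the paper's exact values) of a volatility-scaled reward with a transaction-cost penalty on position changes.
\begin{verbatim}
def volatility_scaled_reward(position, prev_position, next_return,
                             realised_vol, target_vol=0.15, cost_bp=2e-4):
    """Reward for one step: scaled P&L minus a cost on the change in position."""
    scaled_pos = position * target_vol / max(realised_vol, 1e-8)
    scaled_prev = prev_position * target_vol / max(realised_vol, 1e-8)
    pnl = scaled_pos * next_return
    cost = cost_bp * abs(scaled_pos - scaled_prev)
    return pnl - cost

# Toy usage: increasing a long position ahead of a small up-move in the contract.
print(volatility_scaled_reward(position=1.0, prev_position=0.5,
                               next_return=0.01, realised_vol=0.20))
\end{verbatim}
Scaling by the ratio of target to realised volatility shrinks positions in turbulent markets and enlarges them in quiet ones, which is the behaviour the abstract describes.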
Abstract:We develop a new method for regularising neural networks. We learn a probability distribution over the activations of all layers of the model and then insert imputed values into the network during training. We obtain a posterior for an arbitrary subset of activations conditioned on the remainder. This is a generalisation of data augmentation to the hidden layers of a network, and a form of data-aware dropout. We demonstrate that our training method leads to higher test accuracy and lower test-set cross-entropy for neural networks trained on CIFAR-10 and SVHN compared to standard regularisation baselines: our approach leads to networks with better calibrated uncertainty over the class posteriors all the while delivering greater test-set accuracy.
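The following is a hedged sketch of the idea only, not the paper's method verbatim: fit a Gaussian to a hidden layer's activations over a batch, then resample a random subset of units from the conditional Gaussian given the remaining units, acting as a data-aware dropout.
\begin{verbatim}
import torch

def impute_activations(h, drop_frac=0.25, eps=1e-4):
    """h: (batch, d) activations. Resample a random subset of units per batch."""
    d = h.shape[1]
    mu = h.mean(dim=0)
    cov = torch.cov(h.T) + eps * torch.eye(d)
    m = torch.randperm(d)[: int(drop_frac * d)]            # units to impute
    o = torch.tensor([i for i in range(d) if i not in set(m.tolist())])
    S_oo_inv = torch.linalg.inv(cov[o][:, o])
    S_mo = cov[m][:, o]
    # Conditional Gaussian of the masked units given the observed ones.
    cond_mu = mu[m] + (h[:, o] - mu[o]) @ (S_mo @ S_oo_inv).T
    cond_cov = cov[m][:, m] - S_mo @ S_oo_inv @ S_mo.T
    L = torch.linalg.cholesky(cond_cov + eps * torch.eye(len(m)))
    sample = cond_mu + torch.randn_like(cond_mu) @ L.T
    h = h.clone()
    h[:, m] = sample
    return h

h = torch.randn(64, 32)             # toy batch of activations
print(impute_activations(h).shape)  # torch.Size([64, 32])
\end{verbatim}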
Abstract:In clustering we normally output one cluster variable for each datapoint. However, it is not necessarily the case that there is only one way to partition a given dataset into cluster components. For example, one could cluster objects by their colour, or by their type. Different attributes form a hierarchy, and we may wish to cluster according to any of them. By disentangling the learnt latent representations of a dataset into different layers for different attributes, we can then cluster in those latent spaces. We call this "disentangled clustering". Extending Variational Ladder Autoencoders (Zhao et al., 2017), we propose a clustering algorithm, VLAC, that outperforms a Gaussian Mixture DGM in cluster accuracy over digit identity on the test set of SVHN. We also demonstrate learning clusters jointly over numerous layers of the hierarchy of latent variables for the data, and show component-wise generation from this hierarchical model.
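As a sketch of the clustering step only (the latent codes below are random stand-ins, not the VLAC architecture), one can fit a separate mixture model in each latent layer so that each layer induces its own partition of the data.
\begin{verbatim}
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Stand-ins for latent codes at two layers of the hierarchy (e.g. style vs. identity).
z_layers = {"layer_1": rng.normal(size=(1000, 2)),
            "layer_2": rng.normal(size=(1000, 8))}

cluster_assignments = {}
for name, z in z_layers.items():
    gmm = GaussianMixture(n_components=10, covariance_type="full",
                          random_state=0).fit(z)
    cluster_assignments[name] = gmm.predict(z)   # one clustering per latent layer
print({k: v[:5] for k, v in cluster_assignments.items()})
\end{verbatim}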
Abstract:A trade-off exists between reconstruction quality and the prior regularisation in the Evidence Lower Bound (ELBO) objective that Variational Autoencoder (VAE) models use for learning. There are few satisfactory approaches to balancing the prior and reconstruction objectives, and most methods deal with this problem through heuristics. In this paper, we show that the noise variance (often set as a fixed value) in the Gaussian likelihood p(x|z) for real-valued data can naturally act to provide such a balance. By learning this noise variance so as to maximise the ELBO, we automatically obtain an optimal trade-off between the reconstruction error and the prior constraint on the posteriors. This variance can be interpreted intuitively as the noise level necessary for the current model to be the best explanation of the observed dataset. Further, by allowing the variance inference to be more flexible, it can conveniently be used as an uncertainty estimator for reconstructed or generated samples. We demonstrate that optimising the noise variance is a crucial component of VAE learning, and showcase its performance on the MNIST, Fashion MNIST and CelebA datasets. We find our approach can significantly improve the quality of generated samples whilst maintaining a smooth latent-space manifold to represent the data. The method also offers an indication of uncertainty in the final generative model.
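To make the mechanism concrete, here is a minimal derivation sketch under the simplifying assumption of a single global, learnable variance $\sigma^2$ in $p(x|z) = \mathcal{N}\big(x;\, \hat{x}_\theta(z),\, \sigma^2 I\big)$ for $D$-dimensional data. The per-example ELBO reads
\[
\mathcal{L} \;=\; -\tfrac{D}{2}\log(2\pi\sigma^2)
\;-\; \frac{\lVert x - \hat{x}_\theta(z)\rVert^2}{2\sigma^2}
\;-\; \mathrm{KL}\!\left(q_\phi(z\mid x)\,\Vert\, p(z)\right),
\]
and setting $\partial\mathcal{L}/\partial\sigma^2 = 0$ gives $\sigma^{2\,*} = \lVert x - \hat{x}_\theta(z)\rVert^2 / D$, the mean squared reconstruction error, which in turn rescales the reconstruction term relative to the KL term and thereby sets the balance automatically.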
Abstract:Efficient approximation lies at the heart of large-scale machine learning problems. In this paper, we propose a novel, robust maximum entropy algorithm, which is capable of dealing with hundreds of moments and allows for computationally efficient approximations. We showcase the usefulness of the proposed method, its equivalence to constrained Bayesian variational inference and demonstrate its superiority over existing approaches in two applications, namely, fast log determinant estimation and information-theoretic Bayesian optimisation.
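For concreteness, the classical maximum-entropy problem with polynomial moment constraints assumed here (the paper's specific parameterisation may differ) is
\[
\max_{p}\; -\!\int p(\lambda)\log p(\lambda)\, d\lambda
\quad \text{s.t.} \quad
\int \lambda^i\, p(\lambda)\, d\lambda = \mu_i, \qquad i = 0, \dots, m,
\]
whose stationary solution takes the exponential-family form $p(\lambda) = \exp\!\big(-\sum_{i=0}^{m} \alpha_i \lambda^i\big)$, with Lagrange multipliers $\alpha_i$ chosen to match the observed moments; quantities such as log determinants then reduce to one-dimensional integrals against $p$.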
Abstract:This paper is concerned with the robustness of VAEs to adversarial attacks. We highlight that conventional VAEs are brittle under attack but that methods recently introduced for disentanglement such as $\beta$-TCVAE (Chen et al., 2018) improve robustness, as demonstrated through a variety of previously proposed adversarial attacks (Tabacof et al. (2016); Gondim-Ribeiro et al. (2018); Kos et al. (2018)). This motivated us to develop Seatbelt-VAE, a new hierarchical disentangled VAE that is designed to be significantly more robust to adversarial attacks than existing approaches, while retaining high quality reconstructions.
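For reference, the attacks cited above are of the latent-space kind sketched below (hedged: the encoder here is a toy stand-in and the penalty weight is arbitrary), in which a perturbation of a source input is optimised so that its encoding approaches the encoding of a chosen target.
\begin{verbatim}
import torch

torch.manual_seed(0)
encoder = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(28 * 28, 16))
x_src, x_tgt = torch.rand(1, 1, 28, 28), torch.rand(1, 1, 28, 28)
z_tgt = encoder(x_tgt).detach()

delta = torch.zeros_like(x_src, requires_grad=True)
opt = torch.optim.Adam([delta], lr=1e-2)
for _ in range(200):
    opt.zero_grad()
    z_adv = encoder(x_src + delta)
    # Pull the encoding of the perturbed source towards the target encoding,
    # while penalising the size of the perturbation.
    loss = ((z_adv - z_tgt) ** 2).sum() + 1.0 * (delta ** 2).sum()
    loss.backward()
    opt.step()
print("final attack loss:", float(loss))
\end{verbatim}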
Abstract:We consider Bayesian classification with Gaussian processes (GPs) and define the robustness of a classifier in terms of the worst-case difference in the classification probabilities with respect to input perturbations. For a subset of the input space $T\subseteq \mathbb{R}^m$, such properties reduce to computing the infimum and supremum of the classification probabilities over all points in $T$. Unfortunately, computing these values is very challenging, as the classification probabilities cannot be expressed analytically. Nevertheless, using the theory of Gaussian processes, we develop a framework that, for a given dataset $\mathcal{D}$, a compact set of input points $T\subseteq \mathbb{R}^m$ and an error threshold $\epsilon>0$, computes lower and upper bounds on the classification probabilities by over-approximating the exact range with an error bounded by $\epsilon$. We provide an experimental comparison of several approximate inference methods for classification on tasks associated with the MNIST and SPAM datasets, showing that our results enable the quantification of uncertainty in adversarial classification settings.
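Spelling out the robustness notion above: writing $\pi_c(x) = p(y = c \mid x, \mathcal{D})$ for the predictive probability of class $c$, the quantities of interest over a compact set $T \subseteq \mathbb{R}^m$ are
\[
\underline{\pi}_c(T) = \inf_{x \in T} \pi_c(x), \qquad \overline{\pi}_c(T) = \sup_{x \in T} \pi_c(x),
\]
and the framework returns computable bounds $\pi^{L}_c \le \underline{\pi}_c(T)$ and $\pi^{U}_c \ge \overline{\pi}_c(T)$ whose over-approximation error is at most $\epsilon$ (the symbols $\pi^{L}_c$, $\pi^{U}_c$ are notation introduced here for illustration only).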