Philip H. S. Torr

University of Oxford

Relating by Contrasting: A Data-efficient Framework for Multimodal Generative Models

Jul 02, 2020

Yuge Shi, Brooks Paige, Philip H. S. Torr, N. Siddharth

Abstract: Multimodal learning for generative models often refers to the learning of abstract concepts from the commonality of information in multiple modalities, such as vision and language. While it has proven effective for learning generalisable representations, the training of such models often requires a large amount of "related" multimodal data that shares commonality, which can be expensive to come by. To mitigate this, we develop a novel contrastive framework for generative model learning, allowing us to train the model not just by the commonality between modalities, but by the distinction between "related" and "unrelated" multimodal data. We show in experiments that our method enables data-efficient multimodal learning on challenging datasets for various multimodal VAE models. We also show that under our proposed framework, the generative model can accurately identify related samples from unrelated ones, making it possible to make use of the plentiful unlabeled, unpaired multimodal data.

STEER: Simple Temporal Regularization For Neural ODEs

Jul 01, 2020

Arnab Ghosh, Harkirat Singh Behl, Philip H. S. Torr, Vinay Namboodiri

Abstract: Training Neural Ordinary Differential Equations (ODEs) is often computationally expensive. Indeed, computing the forward pass of such models involves solving an ODE which can become arbitrarily complex during training. Recent works have shown that regularizing the dynamics of the ODE can partially alleviate this. In this paper we propose a new regularization technique: randomly sampling the end time of the ODE during training. The proposed regularization is simple to implement, has negligible overhead and is effective across a wide variety of tasks. Further, the technique is orthogonal to several other methods proposed to regularize the dynamics of ODEs and as such can be used in conjunction with them. We show through experiments on normalizing flows, time series models and image recognition that the proposed regularization can significantly decrease training time and even improve performance over baseline models.
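
The idea of sampling the end time can be illustrated with a minimal sketch. It is not the authors' implementation: the fixed-step Euler solver, the uniform interval [T - b, T + b], and the names `euler_solve`, `sample_end_time`, `forward`, `T`, and `b` are all assumptions made for illustration, since the abstract only says the end time is sampled randomly during training.

```python
import random


def euler_solve(f, z0, t0, t1, steps=100):
    """Integrate dz/dt = f(t, z) from t0 to t1 with fixed-step Euler."""
    z, t = z0, t0
    h = (t1 - t0) / steps
    for _ in range(steps):
        z = z + h * f(t, z)
        t += h
    return z


def sample_end_time(T, b, rng=random):
    """Draw an end time uniformly from [T - b, T + b].

    The uniform distribution and the half-width b are assumptions of this
    sketch; b must satisfy 0 <= b < T so the end time stays positive.
    """
    if not 0 <= b < T:
        raise ValueError("b must satisfy 0 <= b < T")
    return rng.uniform(T - b, T + b)


def forward(f, z0, T=1.0, b=0.5, training=True, rng=random):
    """Solve the ODE to a random end time in training, to T otherwise."""
    t_end = sample_end_time(T, b, rng) if training else T
    return euler_solve(f, z0, 0.0, t_end)
```

At evaluation time the sketch integrates to the nominal end time `T`, so the randomness only affects training; for example, `forward(lambda t, z: -z, 1.0, training=False)` integrates a simple decay from 0 to 1.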

Rethinking Semi-Supervised Learning in VAEs

Jun 17, 2020

Tom Joy, Sebastian M. Schmon, Philip H. S. Torr, N. Siddharth, Tom Rainforth

Abstract: We present an alternative approach to semi-supervision in variational autoencoders (VAEs) that incorporates labels through auxiliary variables rather than directly through the latent variables. Prior work has generally conflated the meaning of labels, i.e. the associated characteristics of interest, with the actual label values themselves, learning latent variables that directly correspond to the label values. We argue that to learn meaningful representations, semi-supervision should instead try to capture these richer characteristics and that the construction of latent variables as label values is not just unnecessary, but actively harmful. To this end, we develop a novel VAE model, the reparameterized VAE (ReVAE), which "reparameterizes" supervision through auxiliary variables and a concomitant variational objective. Through judicious structuring of mappings between latent and auxiliary variables, we show that the ReVAE can effectively learn meaningful representations of data. In particular, we demonstrate that the ReVAE is able to match, and even improve on, the classification accuracy of previous approaches, but more importantly, it also allows for more effective and more general interventions to be performed. We include a demo of ReVAE at https://github.com/thwjoy/revae-demo.

A Revised Generative Evaluation of Visual Dialogue

Apr 24, 2020

Daniela Massiceti, Viveka Kulharia, Puneet K. Dokania, N. Siddharth, Philip H. S. Torr

Abstract: Evaluating Visual Dialogue, the task of answering a sequence of questions relating to a visual input, remains an open research challenge. The current evaluation scheme of the VisDial dataset computes the ranks of ground-truth answers in predefined candidate sets, which Massiceti et al. (2018) show can be susceptible to the exploitation of dataset biases. This scheme also does little to account for the different ways of expressing the same answer, an aspect of language that has been well studied in NLP. We propose a revised evaluation scheme for the VisDial dataset leveraging metrics from the NLP literature to measure consensus between answers generated by the model and a set of relevant answers. We construct these relevant answer sets using a simple and effective semi-supervised method based on correlation, which allows us to automatically extend and scale sparse relevance annotations from humans to the entire dataset. We release these sets and code for the revised evaluation scheme as DenseVisDial, and intend them to be an improvement to the dataset in the face of its existing constraints and design choices.

* 16 pages, 5 figures
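
The rank-based scheme that the abstract describes as the current one can be sketched as follows. This is an illustrative sketch, not the VisDial evaluation code: the function names are hypothetical, and mean reciprocal rank and recall@k are shown as summaries commonly derived from such ranks rather than as anything the abstract specifies.

```python
def gt_rank(scores, gt_index):
    """Rank (1 = best) of the ground-truth answer among scored candidates.

    The rank is one plus the number of candidates scored strictly higher,
    so ties are resolved in favour of the ground truth.
    """
    gt_score = scores[gt_index]
    return 1 + sum(1 for s in scores if s > gt_score)


def mean_reciprocal_rank(ranks):
    """Average of 1 / rank over all questions."""
    return sum(1.0 / r for r in ranks) / len(ranks)


def recall_at_k(ranks, k):
    """Fraction of questions whose ground truth is ranked in the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)
```

Because each question contributes only the position of a single ground-truth candidate, an equally valid answer phrased differently earns no credit under these summaries, which is the limitation the abstract points to.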

Edge Guided GANs with Semantic Preserving for Semantic Image Synthesis

Mar 31, 2020

Hao Tang, Xiaojuan Qi, Dan Xu, Philip H. S. Torr, Nicu Sebe

Abstract: We propose a novel Edge guided Generative Adversarial Network (EdgeGAN) for photo-realistic image synthesis from semantic layouts. Although considerable improvement has been achieved, the quality of synthesized images is far from satisfactory due to two largely unresolved challenges. First, the semantic labels do not provide detailed structural information, making it difficult to synthesize local details and structures. Second, the widely adopted CNN operations such as convolution, down-sampling and normalization usually cause spatial resolution loss and thus are unable to fully preserve the original semantic information, leading to semantically inconsistent results (e.g., missing small objects). To tackle the first challenge, we propose to use the edge as an intermediate representation which is further adopted to guide image generation via a proposed attention guided edge transfer module. Edge information is produced by a convolutional generator and introduces detailed structure information. Further, to preserve the semantic information, we design an effective module to selectively highlight class-dependent feature maps according to the original semantic layout. Extensive experiments on two challenging datasets show that the proposed EdgeGAN can generate significantly better results than state-of-the-art methods. The source code and trained models are available at https://github.com/Ha0Tang/EdgeGAN.

* 40 pages, 29 figures

Cross-modal Deep Face Normals with Deactivable Skip Connections

Mar 30, 2020

Victoria Fernandez Abrevaya, Adnane Boukhayma, Philip H. S. Torr, Edmond Boyer

Abstract: We present an approach for estimating surface normals from in-the-wild color images of faces. While data-driven strategies have been proposed for single face images, limited available ground truth data makes this problem difficult. To alleviate this issue, we propose a method that can leverage all available image and normal data, whether paired or not, thanks to a novel cross-modal learning architecture. In particular, we enable additional training with single modality data, either color or normal, by using two encoder-decoder networks with a shared latent space. The proposed architecture also enables face details to be transferred between the image and normal domains, given paired data, through skip connections between the image encoder and normal decoder. Core to our approach is a novel module that we call deactivable skip connections, which allows integrating both the auto-encoded and image-to-normal branches within the same architecture that can be trained end-to-end. This allows learning of a rich latent space that can accurately capture the normal information. We compare against state-of-the-art methods and show that our approach can achieve significant improvements, both quantitative and qualitative, with natural face images.

* CVPR 2020

Data Parallelism in Training Sparse Neural Networks

Mar 25, 2020

Namhoon Lee, Philip H. S. Torr, Martin Jaggi

Abstract: Network pruning is an effective methodology to compress large neural networks, and sparse neural networks obtained by pruning can benefit from their reduced memory and computational costs at use. Notably, recent advances have found that it is possible to find a trainable sparse neural network even at random initialization prior to training; hence the obtained sparse network only needs to be trained. While this approach of pruning at initialization turned out to be highly effective, little has been studied about the training aspects of these sparse neural networks. In this work, we focus on measuring the effects of data parallelism on training sparse neural networks. As a result, we find that the data parallelism in training sparse neural networks is no worse than that in training densely parameterized neural networks, despite the general difficulty of training sparse neural networks. When training sparse networks using SGD with momentum, the breakdown of the perfect scaling regime occurs even later, at larger batch sizes, than for dense networks.

* ICLR 2020 workshop on PML4DC: Learning under limited/low resource scenarios

Holistically-Attracted Wireframe Parsing

Mar 03, 2020

Nan Xue, Tianfu Wu, Song Bai, Fu-Dong Wang, Gui-Song Xia, Liangpei Zhang, Philip H. S. Torr

Abstract: This paper presents a fast and parsimonious parsing method to accurately and robustly detect a vectorized wireframe in an input image with a single forward pass. The proposed method is end-to-end trainable, consisting of three components: (i) line segment and junction proposal generation, (ii) line segment and junction matching, and (iii) line segment and junction verification. For computing line segment proposals, a novel exact dual representation is proposed which exploits a parsimonious geometric reparameterization for line segments and forms a holistic 4-dimensional attraction field map for an input image. Junctions can be treated as the "basins" in the attraction field. The proposed method is thus called Holistically-Attracted Wireframe Parser (HAWP). In experiments, the proposed method is tested on two benchmarks, the Wireframe dataset and the YorkUrban dataset. On both benchmarks, it obtains state-of-the-art performance in terms of accuracy and efficiency. For example, on the Wireframe dataset, compared to the previous state-of-the-art method L-CNN, it improves the challenging mean structural average precision (msAP) by a large margin (2.8% absolute improvement) and achieves 29.5 FPS on a single GPU (89% relative improvement). A systematic ablation study is performed to further justify the proposed method.

* Accepted by CVPR 2020
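
The abstract mixes an absolute improvement (msAP) with a relative one (FPS), and the difference can be made concrete with a small calculation. The helper names below are hypothetical, and the baseline speed it prints is only what the two quoted numbers imply, not a figure taken from the abstract.

```python
def baseline_from_relative(new_value, relative_gain):
    """Recover the baseline from new = baseline * (1 + relative_gain)."""
    return new_value / (1.0 + relative_gain)


def baseline_from_absolute(new_value, absolute_gain):
    """Recover the baseline from new = baseline + absolute_gain."""
    return new_value - absolute_gain


# 29.5 FPS at an 89% relative improvement implies a baseline of
# 29.5 / 1.89 FPS for the compared method.
implied_baseline_fps = baseline_from_relative(29.5, 0.89)
print(round(implied_baseline_fps, 1))
```

An absolute improvement of 2.8% in msAP, by contrast, means the score itself rose by 2.8 percentage points, whatever the baseline score was.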

Lagrangian Decomposition for Neural Network Verification

Feb 24, 2020

Rudy Bunel, Alessandro De Palma, Alban Desmaison, Krishnamurthy Dvijotham, Pushmeet Kohli, Philip H. S. Torr, M. Pawan Kumar

Abstract: A fundamental component of neural network verification is the computation of bounds on the values their outputs can take. Previous methods have either used off-the-shelf solvers, discarding the problem structure, or relaxed the problem even further, making the bounds unnecessarily loose. We propose a novel approach based on Lagrangian Decomposition. Our formulation admits an efficient supergradient ascent algorithm, as well as an improved proximal algorithm. Both algorithms offer three advantages: (i) they yield bounds that are provably at least as tight as previous dual algorithms relying on Lagrangian relaxations; (ii) they are based on operations analogous to the forward/backward pass of neural network layers and are therefore easily parallelizable, amenable to GPU implementation and able to take advantage of the convolutional structure of problems; and (iii) they allow for anytime stopping while still providing valid bounds. Empirically, we show that we obtain bounds comparable with off-the-shelf solvers in a fraction of their running time, and obtain tighter bounds in the same time as previous dual algorithms. This results in an overall speed-up when employing the bounds for formal verification.

Calibrating Deep Neural Networks using Focal Loss

Feb 21, 2020

Jishnu Mukhoti, Viveka Kulharia, Amartya Sanyal, Stuart Golodetz, Philip H. S. Torr, Puneet K. Dokania

Abstract: Miscalibration -- a mismatch between a model's confidence and its correctness -- of Deep Neural Networks (DNNs) makes their predictions hard to rely on. Ideally, we want networks to be accurate, calibrated and confident. We show that, as opposed to the standard cross-entropy loss, focal loss (Lin et al., 2017) allows us to learn models that are already very well calibrated. When combined with temperature scaling, whilst preserving accuracy, it yields state-of-the-art calibrated models. We provide a thorough analysis of the factors causing miscalibration, and use the insights we glean from this to justify the empirically excellent performance of focal loss. To facilitate the use of focal loss in practice, we also provide a principled approach to automatically select the hyperparameter involved in the loss function. We perform extensive experiments on a variety of computer vision and NLP datasets, and with a wide variety of network architectures, and show that our approach achieves state-of-the-art accuracy and calibration in almost all cases.
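
The two ingredients named in the abstract can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: it uses the focal loss form from Lin et al. (2017), -(1 - p_t)^gamma * log(p_t), with a fixed `gamma` rather than the automatic selection the abstract mentions, and it treats temperature scaling as dividing the logits by a scalar before the softmax. The function names and the small clamp on `p_t` are choices made for this sketch.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax of logits / temperature, computed stably."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def focal_loss(logits, target, gamma=2.0):
    """Focal loss for one example: -(1 - p_t)**gamma * log(p_t)."""
    # Clamp to avoid log(0) if the target probability underflows.
    p_t = max(softmax(logits)[target], 1e-12)
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


def cross_entropy(logits, target):
    """Cross-entropy is the gamma = 0 special case of focal loss."""
    return focal_loss(logits, target, gamma=0.0)
```

With `gamma > 0` the factor `(1 - p_t)**gamma` shrinks the loss on examples the model already classifies confidently, so `focal_loss([2.0, 0.0, 0.0], 0)` is smaller than `cross_entropy([2.0, 0.0, 0.0], 0)`. A temperature above 1, as in `softmax(logits, temperature=2.0)`, flattens the predicted distribution without changing which class has the highest probability.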
