Andrea Vedaldi

Interpretable Explanations of Black Boxes by Meaningful Perturbation

Jan 10, 2018

Ruth Fong, Andrea Vedaldi

Abstract:As machine learning algorithms are increasingly applied to high impact yet high risk tasks, such as medical diagnosis or autonomous driving, it is critical that researchers can explain how such algorithms arrived at their predictions. In recent years, a number of image saliency methods have been developed to summarize where highly complex neural networks "look" in an image for evidence for their predictions. However, these techniques are limited by their heuristic nature and architectural constraints. In this paper, we make two main contributions: First, we propose a general framework for learning different kinds of explanations for any black box algorithm. Second, we specialise the framework to find the part of an image most responsible for a classifier decision. Unlike previous works, our method is model-agnostic and testable because it is grounded in explicit and interpretable image perturbations.

* Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV)
* Final camera-ready paper published at ICCV 2017 (Supplementary materials: http://openaccess.thecvf.com/content_ICCV_2017/supplemental/Fong_Interpretable_Explanations_of_ICCV_2017_supplemental.pdf)
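
The idea of explaining a black box through explicit image perturbations can be illustrated with a much simpler stand-in than the method in the abstract. The sketch below does not learn a mask by optimization; it swaps in exhaustive patch occlusion, replacing one patch at a time with a constant reference and recording how much the score drops. The classifier `toy_black_box`, the zero reference image, and the patch size are all hypothetical choices made only for illustration.

```python
import numpy as np

def perturb(image, mask, reference):
    """Blend image and reference: mask=1 keeps a pixel, mask=0 replaces it."""
    return mask * image + (1.0 - mask) * reference

def toy_black_box(image):
    """Hypothetical classifier: the score is the mean of the top-left 4x4 patch."""
    return float(image[:4, :4].mean())

def occlusion_saliency(image, score_fn, patch=4):
    """Score drop observed when each patch is replaced by a constant reference."""
    h, w = image.shape
    reference = np.zeros_like(image)  # assumed reference: an all-zero image
    base = score_fn(image)
    saliency = np.zeros((h // patch, w // patch))
    for i in range(h // patch):
        for j in range(w // patch):
            mask = np.ones_like(image)
            mask[i * patch:(i + 1) * patch, j * patch:(j + 1) * patch] = 0.0
            saliency[i, j] = base - score_fn(perturb(image, mask, reference))
    return saliency

image = np.ones((8, 8))
saliency = occlusion_saliency(image, toy_black_box)
print(saliency)
```

Because the perturbation is explicit, the resulting map is testable: deleting the highest-scoring patch should visibly lower the black-box score, and that can be checked directly.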

Taking Visual Motion Prediction To New Heightfields

Dec 22, 2017

Sebastien Ehrhardt, Aron Monszpart, Niloy Mitra, Andrea Vedaldi

Abstract:While the basic laws of Newtonian mechanics are well understood, explaining a physical scenario still requires manually modeling the problem with suitable equations and estimating the associated parameters. In order to be able to leverage the approximation capabilities of artificial intelligence techniques in such physics related contexts, researchers have handcrafted the relevant states, and then used neural networks to learn the state transitions using simulation runs as training data. Unfortunately, such approaches are unsuited for modeling complex real-world scenarios, where manually authoring relevant state spaces tends to be tedious and challenging. In this work, we investigate if neural networks can implicitly learn physical states of real-world mechanical processes only based on visual data while internally modeling non-homogeneous environments and in the process enable long-term physical extrapolation. We develop a recurrent neural network architecture for this task and also characterize resultant uncertainties in the form of evolving variance estimates. We evaluate our setup to extrapolate motion of rolling ball(s) on bowls of varying shape and orientation, and on arbitrary heightfields using only images as input. We report significant improvements over existing image-based methods both in terms of accuracy of predictions and complexity of scenarios; and report competitive performance with approaches that, unlike us, assume access to internal physical states.

* arXiv admin note: text overlap with arXiv:1706.02179

DeepRadiologyNet: Radiologist Level Pathology Detection in CT Head Images

Dec 02, 2017

Jameson Merkow, Robert Lufkin, Kim Nguyen, Stefano Soatto, Zhuowen Tu, Andrea Vedaldi

Abstract:We describe a system to automatically filter clinically significant findings from computerized tomography (CT) head scans, operating at performance levels exceeding those of practicing radiologists. Our system, named DeepRadiologyNet, builds on top of deep convolutional neural networks (CNNs) trained using approximately 3.5 million CT head images gathered from over 24,000 studies taken from January 1, 2015 to August 31, 2015 and January 1, 2016 to April 30, 2016 in over 80 clinical sites. For our initial system, we identified 30 phenomenological traits to be recognized in the CT scans. To test the system, we designed a clinical trial using over 4.8 million CT head images (29,925 studies), completely disjoint from the training and validation set, interpreted by 35 US Board Certified radiologists with specialized CT head experience. We measured clinically significant error rates to ascertain whether the performance of DeepRadiologyNet was comparable to or better than that of US Board Certified radiologists. DeepRadiologyNet achieved a clinically significant miss rate of 0.0367% on automatically selected high-confidence studies. Thus, DeepRadiologyNet enables significant reduction in the workload of human radiologists by automatically filtering studies and reporting on the high-confidence ones at an operating point well below the literal error rate for US Board Certified radiologists, estimated at 0.82%.

* 22 pages with references, 6 figures, 2 tables

Learning multiple visual domains with residual adapters

Nov 27, 2017

Sylvestre-Alvise Rebuffi, Hakan Bilen, Andrea Vedaldi

Abstract:There is a growing interest in learning data representations that work well for many different types of problems and data. In this paper, we look in particular at the task of learning a single visual representation that can be successfully utilized in the analysis of very different types of images, from dog breeds to stop signs and digits. Inspired by recent work on learning networks that predict the parameters of another, we develop a tunable deep network architecture that, by means of adapter residual modules, can be steered on the fly to diverse visual domains. Our method achieves a high degree of parameter sharing while maintaining or even improving the accuracy of domain-specific representations. We also introduce the Visual Decathlon Challenge, a benchmark that evaluates the ability of representations to capture simultaneously ten very different visual domains and measures their ability to recognize well uniformly.

Unsupervised learning of object frames by dense equivariant image labelling

Nov 18, 2017

James Thewlis, Hakan Bilen, Andrea Vedaldi

Abstract:One of the key challenges of visual perception is to extract abstract models of 3D objects and object categories from visual measurements, which are affected by complex nuisance factors such as viewpoint, occlusion, motion, and deformations. Starting from the recent idea of viewpoint factorization, we propose a new approach that, given a large number of images of an object and no other supervision, can extract a dense object-centric coordinate frame. This coordinate frame is invariant to deformations of the images and comes with a dense equivariant labelling neural network that can map image pixels to their corresponding object coordinates. We demonstrate the applicability of this method to simple articulated objects and deformable objects such as human faces, learning embeddings from random synthetic transformations or optical flow correspondences, all without any manual supervision.

* NIPS 2017

It Takes (Only) Two: Adversarial Generator-Encoder Networks

Nov 06, 2017

Dmitry Ulyanov, Andrea Vedaldi, Victor Lempitsky

Abstract:We present a new autoencoder-type architecture that is trainable in an unsupervised mode, sustains both generation and inference, and has the quality of conditional and unconditional samples boosted by adversarial learning. Unlike previous hybrids of autoencoders and adversarial networks, the adversarial game in our approach is set up directly between the encoder and the generator, and no external mappings are trained in the process of learning. The game objective compares the divergences of each of the real and the generated data distributions with the prior distribution in the latent space. We show that direct generator-vs-encoder game leads to a tight coupling of the two components, resulting in samples and reconstructions of a comparable quality to some recently-proposed more complex architectures.

Improved Texture Networks: Maximizing Quality and Diversity in Feed-forward Stylization and Texture Synthesis

Nov 06, 2017

Dmitry Ulyanov, Andrea Vedaldi, Victor Lempitsky

Abstract:The recent work of Gatys et al., who characterized the style of an image by the statistics of convolutional neural network filters, ignited a renewed interest in the texture generation and image stylization problems. While their image generation technique uses a slow optimization process, recently several authors have proposed to learn generator neural networks that can produce similar outputs in one quick forward pass. While generator networks are promising, they are still inferior in visual quality and diversity compared to generation-by-optimization. In this work, we advance them in two significant ways. First, we introduce an instance normalization module to replace batch normalization with significant improvements to the quality of image stylization. Second, we improve diversity by introducing a new learning formulation that encourages generators to sample unbiasedly from the Julesz texture ensemble, which is the equivalence class of all images characterized by certain filter responses. Together, these two improvements take feed forward texture synthesis and image stylization much closer to the quality of generation-via-optimization, while retaining the speed advantage.

Instance Normalization: The Missing Ingredient for Fast Stylization

Nov 06, 2017

Dmitry Ulyanov, Andrea Vedaldi, Victor Lempitsky

Abstract:In this paper we revisit the fast stylization method introduced in Ulyanov et al. (2016). We show how a small change in the stylization architecture results in a significant qualitative improvement in the generated images. The change is limited to swapping batch normalization with instance normalization, and to applying the latter at both training and testing times. The resulting method can be used to train high-performance architectures for real-time image generation. The code is made available on GitHub at https://github.com/DmitryUlyanov/texture_nets. The full paper can be found at arXiv:1701.02096.
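
The swap described in this abstract can be sketched in a few lines of NumPy. The sketch assumes an `(N, C, H, W)` tensor layout and omits the learnable scale and shift that normalization layers usually carry; the function names and the `eps` value are illustrative choices, not taken from the authors' code. Batch normalization pools statistics over the batch and spatial axes, whereas instance normalization computes them per image and per channel, so it behaves identically at training and testing time.

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    """Normalize each channel with statistics pooled over batch and space."""
    mean = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def instance_norm(x, eps=1e-5):
    """Normalize each channel of each image with its own spatial statistics."""
    mean = x.mean(axis=(2, 3), keepdims=True)
    var = x.var(axis=(2, 3), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 3, 4, 4))   # assumed layout: (N, C, H, W)
x[1] = 5.0 * x[1] + 10.0            # second image: different contrast and brightness

y_in = instance_norm(x)
y_bn = batch_norm(x)

# After instance normalization every (image, channel) slice has zero mean,
# regardless of the contrast of the input image.
print(np.abs(y_in.mean(axis=(2, 3))).max())
# After batch normalization the per-image means still differ.
print(np.abs(y_bn.mean(axis=(2, 3))).max())
```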

Learning 3D Object Categories by Looking Around Them

Aug 24, 2017

David Novotny, Diane Larlus, Andrea Vedaldi

Abstract:Traditional approaches for learning 3D object categories use either synthetic data or manual supervision. In this paper, we propose a method which does not require manual annotations and is instead cued by observing objects from a moving vantage point. Our system builds on two innovations: a Siamese viewpoint factorization network that robustly aligns different videos together without explicitly comparing 3D shapes; and a 3D shape completion network that can extract the full shape of an object from partial observations. We also demonstrate the benefits of configuring networks to perform probabilistic predictions as well as of geometry-aware data augmentation schemes. We obtain state-of-the-art results on publicly-available benchmarks.

* Proceedings of the International Conference on Computer Vision, 2017

Action Recognition with Dynamic Image Networks

Aug 19, 2017

Hakan Bilen, Basura Fernando, Efstratios Gavves, Andrea Vedaldi

Abstract:We introduce the concept of the "dynamic image", a novel compact representation of videos useful for video analysis, particularly in combination with convolutional neural networks (CNNs). A dynamic image encodes temporal data such as RGB or optical flow videos by using the concept of "rank pooling". The idea is to learn a ranking machine that captures the temporal evolution of the data and to use the parameters of the latter as a representation. When a linear ranking machine is used, the resulting representation is in the form of an image, which we call dynamic because it summarizes the video dynamics in addition to appearance. This is a powerful idea because it allows any video to be converted to an image so that existing CNN models pre-trained for the analysis of still images can be immediately extended to videos. We also present an efficient and effective approximate rank pooling operator, accelerating standard rank pooling algorithms by orders of magnitude, and formulate that as a CNN layer. This new layer allows generalizing dynamic images to dynamic feature maps. We demonstrate the power of the new representations on standard benchmarks in action recognition achieving state-of-the-art performance.

* 14 pages, 9 figures, 9 tables
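
A rough sketch of how a video might be collapsed into a single image by a fixed temporal weighting is given below. It assumes the simplified linear coefficients `alpha_t = 2t - T - 1`, which weight late frames positively and early frames negatively; the coefficients of the approximate rank pooling operator in the paper may differ, and the function name `dynamic_image` is only illustrative.

```python
import numpy as np

def dynamic_image(frames):
    """Weighted sum of T frames with assumed coefficients alpha_t = 2t - T - 1."""
    frames = np.asarray(frames, dtype=float)
    T = frames.shape[0]
    t = np.arange(1, T + 1)
    alpha = 2 * t - T - 1          # symmetric around zero, so the weights sum to 0
    return np.tensordot(alpha, frames, axes=(0, 0))

# A static video carries no temporal evolution: the weights cancel out.
static = np.ones((5, 4, 4))
d_static = dynamic_image(static)

# A video whose brightness grows linearly over time gives a positive response.
ramp = np.stack([np.full((4, 4), float(k)) for k in range(5)])
d_ramp = dynamic_image(ramp)

print(d_static[0, 0], d_ramp[0, 0])
```

Since the weights sum to zero, anything constant across frames is suppressed and only temporal change survives, which is one way to read the claim that the representation summarizes dynamics. The output has the same spatial shape as a single frame, so it can be passed to a network built for still images.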
