Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

S. M. Ali Eslami

PolyGen: An Autoregressive Generative Model of 3D Meshes

Feb 23, 2020

Charlie Nash, Yaroslav Ganin, S. M. Ali Eslami, Peter W. Battaglia

Figure 1 for PolyGen: An Autoregressive Generative Model of 3D Meshes

Figure 2 for PolyGen: An Autoregressive Generative Model of 3D Meshes

Figure 3 for PolyGen: An Autoregressive Generative Model of 3D Meshes

Figure 4 for PolyGen: An Autoregressive Generative Model of 3D Meshes

Abstract:Polygon meshes are an efficient representation of 3D geometry, and are of central importance in computer graphics, robotics and games development. Existing learning-based approaches have avoided the challenges of working with 3D meshes, instead using alternative object representations that are more compatible with neural architectures and training approaches. We present an approach which models the mesh directly, predicting mesh vertices and faces sequentially using a Transformer-based architecture. Our model can condition on a range of inputs, including object classes, voxels, and images, and because the model is probabilistic it can produce samples that capture uncertainty in ambiguous scenarios. We show that the model is capable of producing high-quality, usable meshes, and establish log-likelihood benchmarks for the mesh-modelling task. We also evaluate the conditional models on surface reconstruction metrics against alternative methods, and demonstrate competitive performance despite not training directly on this task.

Via

Access Paper or Ask Questions

Unsupervised Doodling and Painting with Improved SPIRAL

Oct 02, 2019

John F. J. Mellor, Eunbyung Park, Yaroslav Ganin, Igor Babuschkin, Tejas Kulkarni, Dan Rosenbaum, Andy Ballard, Theophane Weber, Oriol Vinyals, S. M. Ali Eslami

Figure 1 for Unsupervised Doodling and Painting with Improved SPIRAL

Figure 2 for Unsupervised Doodling and Painting with Improved SPIRAL

Figure 3 for Unsupervised Doodling and Painting with Improved SPIRAL

Figure 4 for Unsupervised Doodling and Painting with Improved SPIRAL

Abstract:We investigate using reinforcement learning agents as generative models of images (extending arXiv:1804.01118). A generative agent controls a simulated painting environment, and is trained with rewards provided by a discriminator network simultaneously trained to assess the realism of the agent's samples, either unconditional or reconstructions. Compared to prior work, we make a number of improvements to the architectures of the agents and discriminators that lead to intriguing and at times surprising results. We find that when sufficiently constrained, generative agents can learn to produce images with a degree of visual abstraction, despite having only ever seen real photographs (no human brush strokes). And given enough time with the painting environment, they can produce images with considerable realism. These results show that, under the right circumstances, some aspects of human drawing can emerge from simulated embodiment, without the need for external supervision, imitation or social cues. Finally, we note the framework's potential for use in creative applications.

* See https://learning-to-paint.github.io for an interactive version of this paper, with videos

Via

Access Paper or Ask Questions

A Hierarchical Probabilistic U-Net for Modeling Multi-Scale Ambiguities

May 30, 2019

Simon A. A. Kohl, Bernardino Romera-Paredes, Klaus H. Maier-Hein, Danilo Jimenez Rezende, S. M. Ali Eslami, Pushmeet Kohli, Andrew Zisserman, Olaf Ronneberger

Figure 1 for A Hierarchical Probabilistic U-Net for Modeling Multi-Scale Ambiguities

Figure 2 for A Hierarchical Probabilistic U-Net for Modeling Multi-Scale Ambiguities

Figure 3 for A Hierarchical Probabilistic U-Net for Modeling Multi-Scale Ambiguities

Figure 4 for A Hierarchical Probabilistic U-Net for Modeling Multi-Scale Ambiguities

Abstract:Medical imaging only indirectly measures the molecular identity of the tissue within each voxel, which often produces only ambiguous image evidence for target measures of interest, like semantic segmentation. This diversity and the variations of plausible interpretations are often specific to given image regions and may thus manifest on various scales, spanning all the way from the pixel to the image level. In order to learn a flexible distribution that can account for multiple scales of variations, we propose the Hierarchical Probabilistic U-Net, a segmentation network with a conditional variational auto-encoder (cVAE) that uses a hierarchical latent space decomposition. We show that this model formulation enables sampling and reconstruction of segmenations with high fidelity, i.e. with finely resolved detail, while providing the flexibility to learn complex structured distributions across scales. We demonstrate these abilities on the task of segmenting ambiguous medical scans as well as on instance segmentation of neurobiological and natural images. Our model automatically separates independent factors across scales, an inductive bias that we deem beneficial in structured output prediction tasks beyond segmentation.

* 25 pages, 15 figures

Via

Access Paper or Ask Questions

Data-Efficient Image Recognition with Contrastive Predictive Coding

May 22, 2019

Olivier J. Hénaff, Ali Razavi, Carl Doersch, S. M. Ali Eslami, Aaron van den Oord

Figure 1 for Data-Efficient Image Recognition with Contrastive Predictive Coding

Figure 2 for Data-Efficient Image Recognition with Contrastive Predictive Coding

Figure 3 for Data-Efficient Image Recognition with Contrastive Predictive Coding

Figure 4 for Data-Efficient Image Recognition with Contrastive Predictive Coding

Abstract:Large scale deep learning excels when labeled images are abundant, yet data-efficient learning remains a longstanding challenge. While biological vision is thought to leverage vast amounts of unlabeled data to solve classification problems with limited supervision, computer vision has so far not succeeded in this `semi-supervised' regime. Our work tackles this challenge with Contrastive Predictive Coding, an unsupervised objective which extracts stable structure from still images. The result is a representation which, equipped with a simple linear classifier, separates ImageNet categories better than all competing methods, and surpasses the performance of a fully-supervised AlexNet model. When given a small number of labeled images (as few as 13 per class), this representation retains a strong classification performance, outperforming state-of-the-art semi-supervised methods by 10% Top-5 accuracy and supervised methods by 20%. Finally, we find our unsupervised representation to serve as a useful substrate for image detection on the PASCAL-VOC 2007 dataset, approaching the performance of representations trained with a fully annotated ImageNet dataset. We expect these results to open the door to pipelines that use scalable unsupervised representations as a drop-in replacement for supervised ones for real-world vision tasks where labels are scarce.

Via

Access Paper or Ask Questions

Meta-Learning surrogate models for sequential decision making

Mar 28, 2019

Alexandre Galashov, Jonathan Schwarz, Hyunjik Kim, Marta Garnelo, David Saxton, Pushmeet Kohli, S. M. Ali Eslami, Yee Whye Teh

Figure 1 for Meta-Learning surrogate models for sequential decision making

Figure 2 for Meta-Learning surrogate models for sequential decision making

Figure 3 for Meta-Learning surrogate models for sequential decision making

Figure 4 for Meta-Learning surrogate models for sequential decision making

Abstract:Meta-learning methods leverage past experience to learn data-driven inductive biases from related problems, increasing learning efficiency on new tasks. This ability renders them particularly suitable for sequential decision making with limited experience. Within this problem family, we argue for the use of such approaches in the study of model-based approaches to Bayesian Optimisation, contextual bandits and Reinforcement Learning. We approach the problem by learning distributions over functions using Neural Processes (NPs), a recently introduced probabilistic meta-learning method. This allows the treatment of model uncertainty to tackle the exploration/exploitation dilemma. We show that NPs are suitable for sequential decision making on a diverse set of domains, including adversarial task search, recommender systems and model-based reinforcement learning.

Via

Access Paper or Ask Questions

A Probabilistic U-Net for Segmentation of Ambiguous Images

Oct 29, 2018

Simon A. A. Kohl, Bernardino Romera-Paredes, Clemens Meyer, Jeffrey De Fauw, Joseph R. Ledsam, Klaus H. Maier-Hein, S. M. Ali Eslami, Danilo Jimenez Rezende, Olaf Ronneberger

Figure 1 for A Probabilistic U-Net for Segmentation of Ambiguous Images

Figure 2 for A Probabilistic U-Net for Segmentation of Ambiguous Images

Figure 3 for A Probabilistic U-Net for Segmentation of Ambiguous Images

Figure 4 for A Probabilistic U-Net for Segmentation of Ambiguous Images

Abstract:Many real-world vision problems suffer from inherent ambiguities. In clinical applications for example, it might not be clear from a CT scan alone which particular region is cancer tissue. Therefore a group of graders typically produces a set of diverse but plausible segmentations. We consider the task of learning a distribution over segmentations given an input. To this end we propose a generative segmentation model based on a combination of a U-Net with a conditional variational autoencoder that is capable of efficiently producing an unlimited number of plausible hypotheses. We show on a lung abnormalities segmentation task and on a Cityscapes segmentation task that our model reproduces the possible segmentation variants as well as the frequencies with which they occur, doing so significantly better than published approaches. These models could have a high impact in real-world applications, such as being used as clinical decision-making algorithms accounting for multiple plausible semantic segmentation hypotheses to provide possible diagnoses and recommend further actions to resolve the present ambiguities.

* 11 pages for the main paper, 28 pages including appendix. 5 figures in the main paper, 18 figures in total, Advances in Neural Information Processing Systems (NIPS), 2018

Via

Access Paper or Ask Questions

Consistent Jumpy Predictions for Videos and Scenes

Oct 02, 2018

Ananya Kumar, S. M. Ali Eslami, Danilo J. Rezende, Marta Garnelo, Fabio Viola, Edward Lockhart, Murray Shanahan

Figure 1 for Consistent Jumpy Predictions for Videos and Scenes

Figure 2 for Consistent Jumpy Predictions for Videos and Scenes

Figure 3 for Consistent Jumpy Predictions for Videos and Scenes

Figure 4 for Consistent Jumpy Predictions for Videos and Scenes

Abstract:Stochastic video prediction models take in a sequence of image frames, and generate a sequence of consecutive future image frames. These models typically generate future frames in an autoregressive fashion, which is slow and requires the input and output frames to be consecutive. We introduce a model that overcomes these drawbacks by generating a latent representation from an arbitrary set of frames that can then be used to simultaneously and efficiently sample temporally consistent frames at arbitrary time-points. For example, our model can "jump" and directly sample frames at the end of the video, without sampling intermediate frames. Synthetic video evaluations confirm substantial gains in speed and functionality without loss in fidelity. We also apply our framework to a 3D scene reconstruction dataset. Here, our model is conditioned on camera location and can sample consistent sets of images for what an occluded region of a 3D scene might look like, even if there are multiple possibilities for what that region might contain. Reconstructions and videos are available at https://bit.ly/2O4Pc4R.

Via

Access Paper or Ask Questions

Generative Temporal Models with Spatial Memory for Partially Observed Environments

Jul 19, 2018

Marco Fraccaro, Danilo Jimenez Rezende, Yori Zwols, Alexander Pritzel, S. M. Ali Eslami, Fabio Viola

Figure 1 for Generative Temporal Models with Spatial Memory for Partially Observed Environments

Figure 2 for Generative Temporal Models with Spatial Memory for Partially Observed Environments

Figure 3 for Generative Temporal Models with Spatial Memory for Partially Observed Environments

Figure 4 for Generative Temporal Models with Spatial Memory for Partially Observed Environments

Abstract:In model-based reinforcement learning, generative and temporal models of environments can be leveraged to boost agent performance, either by tuning the agent's representations during training or via use as part of an explicit planning mechanism. However, their application in practice has been limited to simplistic environments, due to the difficulty of training such models in larger, potentially partially-observed and 3D environments. In this work we introduce a novel action-conditioned generative model of such challenging environments. The model features a non-parametric spatial memory system in which we store learned, disentangled representations of the environment. Low-dimensional spatial updates are computed using a state-space model that makes use of knowledge on the prior dynamics of the moving agent, and high-dimensional visual observations are modelled with a Variational Auto-Encoder. The result is a scalable architecture capable of performing coherent predictions over hundreds of time steps across a range of partially observed 2D and 3D environments.

* ICML 2018

Via

Access Paper or Ask Questions

Encoding Spatial Relations from Natural Language

Jul 05, 2018

Tiago Ramalho, Tomáš Kočiský, Frederic Besse, S. M. Ali Eslami, Gábor Melis, Fabio Viola, Phil Blunsom, Karl Moritz Hermann

Figure 1 for Encoding Spatial Relations from Natural Language

Figure 2 for Encoding Spatial Relations from Natural Language

Figure 3 for Encoding Spatial Relations from Natural Language

Figure 4 for Encoding Spatial Relations from Natural Language

Abstract:Natural language processing has made significant inroads into learning the semantics of words through distributional approaches, however representations learnt via these methods fail to capture certain kinds of information implicit in the real world. In particular, spatial relations are encoded in a way that is inconsistent with human spatial reasoning and lacking invariance to viewpoint changes. We present a system capable of capturing the semantics of spatial relations such as behind, left of, etc from natural language. Our key contributions are a novel multi-modal objective based on generating images of scenes from their textual descriptions, and a new dataset on which to train it. We demonstrate that internal representations are robust to meaning preserving transformations of descriptions (paraphrase invariance), while viewpoint invariance is an emergent property of the system.

Via

Access Paper or Ask Questions

Learning models for visual 3D localization with implicit mapping

Jul 04, 2018

Dan Rosenbaum, Frederic Besse, Fabio Viola, Danilo J. Rezende, S. M. Ali Eslami

Figure 1 for Learning models for visual 3D localization with implicit mapping

Figure 2 for Learning models for visual 3D localization with implicit mapping

Figure 3 for Learning models for visual 3D localization with implicit mapping

Figure 4 for Learning models for visual 3D localization with implicit mapping

Abstract:We propose a formulation of visual localization that does not require construction of explicit maps in the form of point clouds or voxels. The goal is to learn an implicit representation of the environment at a higher, more abstract level, for instance that of objects. To study this approach we consider procedurally generated Minecraft worlds, for which we can generate visually rich images along with camera pose coordinates. We first show that Generative Query Networks (GQNs) enhanced with a novel attention mechanism can capture the visual structure of 3D scenes in Minecraft, as evidenced by their samples. We then apply the models to the localization problem, investigating both generative and discriminative approaches, and compare the different ways in which they each capture task uncertainty. Our results show that models with implicit mapping are able to capture the underlying 3D structure of visually complex scenes, and use this to accurately localize new observations, paving the way towards future applications in sequential localization. Supplementary video available at https://youtu.be/iHEXX5wXbCI.

Via

Access Paper or Ask Questions