Yoshua Bengio

Generalization in Deep Learning

Feb 22, 2018
Kenji Kawaguchi, Leslie Pack Kaelbling, Yoshua Bengio

With a direct analysis of neural networks, this paper presents a mathematically tight generalization theory to partially address an open problem regarding the generalization of deep learning. Unlike previous bound-based theory, our main theory is quantitatively as tight as possible for every dataset individually, while producing competitive qualitative insights. Our results give insight into why and how deep learning can generalize well, despite its large capacity, complexity, possible algorithmic instability, nonrobustness, and sharp minima, responding to an open question in the literature. We also discuss limitations of our results and propose additional open problems.

* Extended version: all previous results remain unchanged and new theoretical results were added with improved presentation

FigureQA: An Annotated Figure Dataset for Visual Reasoning

Feb 22, 2018
Samira Ebrahimi Kahou, Vincent Michalski, Adam Atkinson, Akos Kadar, Adam Trischler, Yoshua Bengio

We introduce FigureQA, a visual reasoning corpus of over one million question-answer pairs grounded in over 100,000 images. The images are synthetic, scientific-style figures from five classes: line plots, dot-line plots, vertical and horizontal bar graphs, and pie charts. We formulate our reasoning task by generating questions from 15 templates; questions concern various relationships between plot elements and examine characteristics like the maximum, the minimum, area-under-the-curve, smoothness, and intersection. Resolving such questions often requires reference to multiple plot elements and synthesis of information distributed spatially throughout a figure. To facilitate the training of machine learning systems, the corpus also includes side data that can be used to formulate auxiliary objectives. In particular, we provide the numerical data used to generate each figure as well as bounding-box annotations for all plot elements. We study the proposed visual reasoning task by training several models, including the recently proposed Relation Network as a strong baseline. Preliminary results indicate that the task poses a significant machine learning challenge. We envision FigureQA as a first step towards developing models that can intuitively recognize patterns from visual representations of data.

* workshop paper at ICLR 2018

ChatPainter: Improving Text to Image Generation using Dialogue

Feb 22, 2018
Shikhar Sharma, Dendi Suhubdy, Vincent Michalski, Samira Ebrahimi Kahou, Yoshua Bengio

Synthesizing realistic images from text descriptions on a dataset like Microsoft Common Objects in Context (MS COCO), where each image can contain several objects, is a challenging task. Prior work has used text captions to generate images. However, captions might not be informative enough to capture the entire image, and they may be insufficient for the model to understand which objects in the images correspond to which words in the captions. We show that adding a dialogue that further describes the scene leads to significant improvement in the inception score and in the quality of generated images on the MS COCO dataset.

Boundary-Seeking Generative Adversarial Networks

Feb 21, 2018
R Devon Hjelm, Athul Paul Jacob, Tong Che, Adam Trischler, Kyunghyun Cho, Yoshua Bengio

Generative adversarial networks (GANs) are a learning framework that relies on training a discriminator to estimate a measure of difference between the target and generated distributions. GANs, as normally formulated, rely on the generated samples being completely differentiable w.r.t. the generative parameters, and thus do not work for discrete data. We introduce a method for training GANs with discrete data that uses the estimated difference measure from the discriminator to compute importance weights for generated samples, thus providing a policy gradient for training the generator. The importance weights have a strong connection to the decision boundary of the discriminator, and we call our method boundary-seeking GANs (BGANs). We demonstrate the effectiveness of the proposed algorithm with discrete image and character-based natural language generation. In addition, the boundary-seeking objective extends to continuous data, which can be used to improve stability of training, and we demonstrate this on CelebA, Large-scale Scene Understanding (LSUN) bedrooms, and ImageNet without conditioning.
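
The importance-weighted policy gradient described above can be illustrated with a small sketch for a categorical generator. The abstract does not give the exact weight formula, so the form used here, D(x) / (1 - D(x)) normalized over the sampled batch, is an assumption of this sketch, as are the function names `normalized_weights` and `weighted_policy_gradient`. It is meant to show the general idea, not the authors' implementation.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def normalized_weights(d_scores):
    """Self-normalized importance weights from discriminator outputs in (0, 1).

    Assumption: the raw weight is D(x) / (1 - D(x)); the abstract only says
    the weights come from the discriminator's estimated difference measure.
    """
    raw = [d / (1.0 - d) for d in d_scores]
    total = sum(raw)
    return [r / total for r in raw]


def weighted_policy_gradient(logits, samples, d_scores):
    """Ascent direction for the logits of a categorical generator.

    samples:  category indices drawn from the generator
    d_scores: discriminator output for each sample
    """
    probs = softmax(logits)
    weights = normalized_weights(d_scores)
    grad = [0.0] * len(logits)
    for x, wt in zip(samples, weights):
        for k in range(len(logits)):
            # d/d logit_k of log p(x) equals 1[k == x] - p_k
            grad[k] += wt * ((1.0 if k == x else 0.0) - probs[k])
    return grad


if __name__ == "__main__":
    # Two categories, one sample of each; the discriminator prefers category 0.
    print(weighted_policy_gradient([0.0, 0.0], [0, 1], [0.75, 0.25]))
```

Samples that the discriminator scores as more realistic receive larger weights, so the update shifts probability mass towards them without differentiating through the discrete sampling step.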

Graph Attention Networks

Feb 04, 2018
Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, Yoshua Bengio

We present graph attention networks (GATs), novel neural network architectures that operate on graph-structured data, leveraging masked self-attentional layers to address the shortcomings of prior methods based on graph convolutions or their approximations. By stacking layers in which nodes are able to attend over their neighborhoods' features, we enable (implicitly) specifying different weights to different nodes in a neighborhood, without requiring any kind of costly matrix operation (such as inversion) or depending on knowing the graph structure upfront. In this way, we address several key challenges of spectral-based graph neural networks simultaneously, and make our model readily applicable to inductive as well as transductive problems. Our GAT models have achieved or matched state-of-the-art results across four established transductive and inductive graph benchmarks: the Cora, Citeseer and Pubmed citation network datasets, as well as a protein-protein interaction dataset (wherein test graphs remain unseen during training).

* To appear at ICLR 2018. 12 pages, 2 figures
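
The masked self-attention over node neighborhoods described in the abstract can be sketched as a single-head layer in plain Python. The scoring function used here (a LeakyReLU applied to a learned vector `a` dotted with the concatenated projected features) and the names `attention_layer`, `w`, and `a` are assumptions of this sketch, since the abstract does not specify them. The masking is expressed by computing the softmax only over each node's listed neighbors.

```python
import math


def matvec(w, h):
    """Multiply matrix w (list of rows) by vector h."""
    return [sum(wij * hj for wij, hj in zip(row, h)) for row in w]


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def attention_layer(features, neighbors, w, a):
    """One single-head masked self-attention layer over a graph.

    features:  list of node feature vectors
    neighbors: dict mapping node index -> list of neighbor indices
               (include the node itself to add a self-loop)
    w:         projection matrix, out_dim rows by in_dim columns
    a:         scoring vector of length 2 * out_dim (hypothetical parameter)
    """
    projected = [matvec(w, h) for h in features]
    out = []
    for i, zi in enumerate(projected):
        nbrs = neighbors[i]
        # Unnormalized score for each neighbor j of node i.
        scores = [
            leaky_relu(sum(ak * v for ak, v in zip(a, zi + projected[j])))
            for j in nbrs
        ]
        # Softmax restricted to the neighborhood: this is the "mask".
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        alphas = [e / total for e in exps]
        # Weighted sum of the neighbors' projected features.
        out.append([
            sum(al * projected[j][d] for al, j in zip(alphas, nbrs))
            for d in range(len(zi))
        ])
    return out


if __name__ == "__main__":
    feats = [[1.0], [3.0], [5.0]]
    nbrs = {0: [0, 1], 1: [0, 1, 2], 2: [2]}
    print(attention_layer(feats, nbrs, w=[[1.0]], a=[0.0, 0.0]))
```

Because each node only needs its own neighbor list, the layer requires no global matrix inversion and can be applied to graphs not seen during training, which matches the inductive setting the abstract mentions.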

A Deep Reinforcement Learning Chatbot (Short Version)

Jan 20, 2018
Iulian V. Serban, Chinnadhurai Sankar, Mathieu Germain, Saizheng Zhang, Zhouhan Lin, Sandeep Subramanian, Taesup Kim, Michael Pieper, Sarath Chandar, Nan Rosemary Ke, Sai Rajeswar, Alexandre de Brebisson, Jose M. R. Sotelo, Dendi Suhubdy, Vincent Michalski, Alexandre Nguyen, Joelle Pineau, Yoshua Bengio

We present MILABOT: a deep reinforcement learning chatbot developed by the Montreal Institute for Learning Algorithms (MILA) for the Amazon Alexa Prize competition. MILABOT is capable of conversing with humans on popular small talk topics through both speech and text. The system consists of an ensemble of natural language generation and retrieval models, including neural network and template-based models. By applying reinforcement learning to crowdsourced data and real-world user interactions, the system has been trained to select an appropriate response from the models in its ensemble. The system has been evaluated through A/B testing with real-world users, where it performed significantly better than other systems. The results highlight the potential of coupling ensemble systems with deep reinforcement learning as a fruitful path for developing real-world, open-domain conversational agents.

* 9 pages, 1 figure, 2 tables; presented at NIPS 2017, Conversational AI: "Today's Practice and Tomorrow's Potential" Workshop

Towards an Automatic Turing Test: Learning to Evaluate Dialogue Responses

Jan 16, 2018
Ryan Lowe, Michael Noseworthy, Iulian V. Serban, Nicolas Angelard-Gontier, Yoshua Bengio, Joelle Pineau

Automatically evaluating the quality of dialogue responses for unstructured domains is a challenging problem. Unfortunately, existing automatic evaluation metrics are biased and correlate very poorly with human judgements of response quality. Yet having an accurate automatic evaluation procedure is crucial for dialogue research, as it allows rapid prototyping and testing of new models with fewer expensive human evaluations. In response to this challenge, we formulate automatic dialogue evaluation as a learning problem. We present an evaluation model (ADEM) that learns to predict human-like scores for input responses, using a new dataset of human response scores. We show that the ADEM model's predictions correlate significantly, and at a level much higher than word-overlap metrics such as BLEU, with human judgements at both the utterance and system levels. We also show that ADEM can generalize to evaluating dialogue models unseen during training, an important step for automatic dialogue evaluation.

* Proceedings of the 55th annual meeting on Association for Computational Linguistics (2017), pp. 1116-1126
* ACL 2017

A3T: Adversarially Augmented Adversarial Training

Jan 12, 2018
Akram Erraqabi, Aristide Baratin, Yoshua Bengio, Simon Lacoste-Julien

Recent research showed that deep neural networks are highly sensitive to so-called adversarial perturbations, which are tiny perturbations of the input data purposely designed to fool a machine learning classifier. Most classification models, including deep learning models, are highly vulnerable to adversarial attacks. In this work, we investigate a procedure to improve adversarial robustness of deep neural networks through enforcing representation invariance. The idea is to train the classifier jointly with a discriminator attached to one of its hidden layers and trained to filter the adversarial noise. We perform preliminary experiments to test the viability of the approach and to compare it to other standard adversarial training methods.

* accepted for an oral presentation in Machine Deception Workshop, NIPS 2017

Efficient EM Training of Gaussian Mixtures with Missing Data

Jan 08, 2018
Olivier Delalleau, Aaron Courville, Yoshua Bengio

In data-mining applications, we are frequently faced with a large fraction of missing entries in the data matrix, which is problematic for most discriminant machine learning algorithms. A solution that we explore in this paper is the use of a generative model (a mixture of Gaussians) to compute the conditional expectation of the missing variables given the observed variables. Since training a Gaussian mixture with many different patterns of missing values can be computationally very expensive, we introduce a spanning-tree-based algorithm that significantly speeds up training in these conditions. We also observe that good results can be obtained by using the generative model to fill in the missing values for a separate discriminant learning algorithm.
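
The conditional expectation of missing variables given observed ones has a closed form for a single Gaussian: E[x_m | x_o] = mu_m + Sigma_mo Sigma_oo^{-1} (x_o - mu_o). The sketch below implements only that building block for one component, using `None` to mark missing entries. It does not implement the spanning-tree speed-up from the paper, and a full mixture would additionally weight each component's estimate by its posterior responsibility given the observed entries. The helper names `solve` and `conditional_mean` are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are not modified.
    m = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def conditional_mean(x, mu, sigma):
    """Fill None entries of x with E[x_missing | x_observed] under N(mu, sigma)."""
    obs = [i for i, v in enumerate(x) if v is not None]
    mis = [i for i, v in enumerate(x) if v is None]
    if not obs:
        # Nothing observed: the best guess is the unconditional mean.
        return list(mu)
    sigma_oo = [[sigma[i][j] for j in obs] for i in obs]
    resid = [x[i] - mu[i] for i in obs]
    z = solve(sigma_oo, resid)  # Sigma_oo^{-1} (x_o - mu_o)
    filled = list(x)
    for i in mis:
        filled[i] = mu[i] + sum(sigma[i][j] * zj for j, zj in zip(obs, z))
    return filled


if __name__ == "__main__":
    mu = [0.0, 0.0]
    sigma = [[1.0, 0.5], [0.5, 1.0]]
    print(conditional_mean([2.0, None], mu, sigma))
```

Each distinct pattern of missing entries needs its own `sigma_oo` solve, which is the cost that grows when many patterns occur in the data.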

Dendritic error backpropagation in deep cortical microcircuits

Dec 30, 2017
João Sacramento, Rui Ponte Costa, Yoshua Bengio, Walter Senn

Animal behaviour depends on learning to associate sensory stimuli with the desired motor command. Understanding how the brain orchestrates the necessary synaptic modifications across different brain areas has remained a longstanding puzzle. Here, we introduce a multi-area neuronal network model in which synaptic plasticity continuously adapts the network towards a global desired output. In this model, synaptic learning is driven by a local dendritic prediction error that arises from a failure to predict the top-down input given the bottom-up activities. Such errors occur at apical dendrites of pyramidal neurons, where both long-range excitatory feedback and local inhibitory predictions are integrated. When local inhibition fails to match excitatory feedback, an error occurs, which triggers plasticity at bottom-up synapses at basal dendrites of the same pyramidal neurons. We demonstrate the learning capabilities of the model in a number of tasks and show that it approximates the classical error backpropagation algorithm. Finally, complementing this cortical circuit with a disinhibitory mechanism enables attention-like stimulus denoising and generation. Our framework makes several experimental predictions on the function of dendritic integration and cortical microcircuits, is consistent with recent observations of cross-area learning, and suggests a biological implementation of deep learning.

* 27 pages, 5 figures, 10 pages supplementary information