Arthur Szlam

Recurrent Orthogonal Networks and Long-Memory Tasks

Mar 15, 2017

Mikael Henaff, Arthur Szlam, Yann LeCun

Abstract: Although RNNs have been shown to be powerful tools for processing sequential data, finding architectures or optimization strategies that allow them to model very long term dependencies is still an active area of research. In this work, we carefully analyze two synthetic datasets originally outlined in (Hochreiter and Schmidhuber, 1997) which are used to evaluate the ability of RNNs to store information over many time steps. We explicitly construct RNN solutions to these problems, and using these constructions, illuminate both the problems themselves and the way in which RNNs store different types of information in their hidden states. These constructions furthermore explain the success of recent methods that specify unitary initializations or constraints on the transition matrices.
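
As a rough illustration of why orthogonal transition matrices help with long-memory tasks, the sketch below applies a purely linear recurrence h <- W h many times and compares an orthogonal W with a slightly shrunk copy. It is not the construction from the paper; the helper names (`random_orthogonal`, `norm_after_steps`), the hidden size, and the 0.9 scaling are assumptions chosen for the example.

```python
import numpy as np

def random_orthogonal(n, seed=0):
    """Draw a random n x n orthogonal matrix via QR of a Gaussian matrix."""
    rng = np.random.default_rng(seed)
    q, r = np.linalg.qr(rng.standard_normal((n, n)))
    # Flip column signs so the result does not depend on the QR sign convention.
    return q * np.sign(np.diag(r))

def norm_after_steps(w, h0, steps):
    """Apply the linear recurrence h <- W h for `steps` steps; return ||h||."""
    h = h0.copy()
    for _ in range(steps):
        h = w @ h
    return float(np.linalg.norm(h))

n = 32                                  # hypothetical hidden size
h0 = np.ones(n) / np.sqrt(n)            # unit-norm initial hidden state
w_orth = random_orthogonal(n)
w_shrunk = 0.9 * w_orth                 # same directions, all singular values 0.9

norm_orth = norm_after_steps(w_orth, h0, 500)      # stays at about 1
norm_shrunk = norm_after_steps(w_shrunk, h0, 500)  # decays toward 0
```

An orthogonal matrix preserves the Euclidean norm at every step, so the hidden state neither vanishes nor explodes, whereas the shrunk matrix loses the stored signal geometrically.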

Automatic Rule Extraction from Long Short Term Memory Networks

Feb 24, 2017

W. James Murdoch, Arthur Szlam

Abstract: Although deep learning models have proven effective at solving problems in natural language processing, the mechanism by which they come to their conclusions is often unclear. As a result, these models are generally treated as black boxes, yielding no insight into the underlying learned patterns. In this paper we consider Long Short Term Memory networks (LSTMs) and demonstrate a new approach for tracking the importance of a given input to the LSTM for a given output. By identifying consistently important patterns of words, we are able to distill state of the art LSTMs on sentiment analysis and question answering into a set of representative phrases. This representation is then quantitatively validated by using the extracted phrases to construct a simple, rule-based classifier which approximates the output of the LSTM.

* ICLR 2017 accepted paper

Training Language Models Using Target-Propagation

Feb 15, 2017

Sam Wiseman, Sumit Chopra, Marc'Aurelio Ranzato, Arthur Szlam, Ruoyu Sun, Soumith Chintala, Nicolas Vasilache

Abstract: While Truncated Back-Propagation through Time (BPTT) is the most popular approach to training Recurrent Neural Networks (RNNs), it suffers from being inherently sequential (making parallelization difficult) and from truncating gradient flow between distant time-steps. We investigate whether Target Propagation (TPROP) style approaches can address these shortcomings. Unfortunately, extensive experiments suggest that TPROP generally underperforms BPTT, and we end with an analysis of this phenomenon, and suggestions for future work.

Learning Multiagent Communication with Backpropagation

Oct 31, 2016

Sainbayar Sukhbaatar, Arthur Szlam, Rob Fergus

Abstract: Many tasks in AI require the collaboration of multiple agents. Typically, the communication protocol between agents is manually specified and not altered during training. In this paper we explore a simple neural model, called CommNet, that uses continuous communication for fully cooperative tasks. The model consists of multiple agents and the communication between them is learned alongside their policy. We apply this model to a diverse set of tasks, demonstrating the ability of the agents to learn to communicate amongst themselves, yielding improved performance over non-communicative agents and baselines. In some cases, it is possible to interpret the language devised by the agents, revealing simple but effective strategies for solving the task at hand.

* Accepted to NIPS 2016

Video (language) modeling: a baseline for generative models of natural videos

May 04, 2016

Marc'Aurelio Ranzato, Arthur Szlam, Joan Bruna, Michael Mathieu, Ronan Collobert, Sumit Chopra

Abstract: We propose a strong baseline model for unsupervised feature learning using video data. By learning to predict missing frames or extrapolate future frames from an input video sequence, the model discovers both spatial and temporal correlations which are useful to represent complex deformations and motion patterns. The models we propose are largely borrowed from the language modeling literature, and adapted to the vision domain by quantizing the space of image patches into a large dictionary. We demonstrate the approach on both a filling and a generation task. For the first time, we show that, after training on natural videos, such a model can predict non-trivial motions over short video sequences.

Evaluating Prerequisite Qualities for Learning End-to-End Dialog Systems

Apr 19, 2016

Jesse Dodge, Andreea Gane, Xiang Zhang, Antoine Bordes, Sumit Chopra, Alexander Miller, Arthur Szlam, Jason Weston

Abstract: A long-term goal of machine learning is to build intelligent conversational agents. One recent popular approach is to train end-to-end models on a large amount of real dialog transcripts between humans (Sordoni et al., 2015; Vinyals & Le, 2015; Shang et al., 2015). However, this approach leaves many questions unanswered, as the precise successes and shortcomings of each model are hard to assess. A contrasting recent proposal is the bAbI tasks (Weston et al., 2015b), synthetic data that measure the ability of learning machines at various reasoning tasks over toy language. Unfortunately, those tests are very small and hence may encourage methods that do not scale. In this work, we propose a suite of new tasks of a much larger scale that attempt to bridge the gap between the two regimes. Choosing the domain of movies, we provide tasks that test the ability of models to answer factual questions (utilizing OMDB), provide personalization (utilizing MovieLens), carry short conversations about the two, and finally to perform on natural dialogs from Reddit. We provide a dataset covering 75k movie entities and with 3.5M training examples. We present results of various models on these tasks, and evaluate their performance.

Convolutional networks and learning invariant to homogeneous multiplicative scalings

Feb 16, 2016

Mark Tygert, Arthur Szlam, Soumith Chintala, Marc'Aurelio Ranzato, Yuandong Tian, Wojciech Zaremba

Abstract: The conventional classification schemes -- notably multinomial logistic regression -- used in conjunction with convolutional networks (convnets) are classical in statistics, designed without consideration for the usual coupling with convnets, stochastic gradient descent, and backpropagation. In the specific application to supervised learning for convnets, a simple scale-invariant classification stage turns out to be more robust than multinomial logistic regression, appears to result in slightly lower errors on several standard test sets, has similar computational costs, and features precise control over the actual rate of learning. "Scale-invariant" means that multiplying the input values by any nonzero scalar leaves the output unchanged.

* Appl. Comput. Harmon. Anal., 42 (1): 154-166, 2017
* 12 pages, 6 figures, 4 tables
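
To make the definition of "scale-invariant" in the abstract concrete, here is a minimal sketch of one function with that property. It is an illustrative assumption, not the classification stage proposed in the paper: the linear scores z = W x are squared and normalized, so p_i = z_i^2 / sum_j z_j^2. The name `scale_invariant_probs` and the dimensions are hypothetical.

```python
import numpy as np

def scale_invariant_probs(w, x):
    """Return p_i = z_i^2 / sum_j z_j^2 with z = W x.

    Scaling x by any nonzero c scales every z_i by c, and the factor c^2
    cancels between numerator and denominator.
    """
    z = w @ x
    sq = z ** 2
    return sq / sq.sum()

rng = np.random.default_rng(0)
w = rng.standard_normal((5, 8))   # hypothetical: 5 classes, 8 input features
x = rng.standard_normal(8)

p = scale_invariant_probs(w, x)
p_scaled = scale_invariant_probs(w, -3.7 * x)   # negative scalar also cancels
```

Here `p` and `p_scaled` agree up to floating-point error, whereas a softmax over the same scores would change with the scaling.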

MazeBase: A Sandbox for Learning from Games

Jan 07, 2016

Sainbayar Sukhbaatar, Arthur Szlam, Gabriel Synnaeve, Soumith Chintala, Rob Fergus

Abstract: This paper introduces MazeBase: an environment for simple 2D games, designed as a sandbox for machine learning approaches to reasoning and planning. Within it, we create 10 simple games embodying a range of algorithmic tasks (e.g. if-then statements or set negation). A variety of neural models (fully connected, convolutional network, memory network) are deployed via reinforcement learning on these games, with and without a procedurally generated curriculum. Despite the tasks' simplicity, the performance of the models is far from optimal, suggesting directions for future development. We also demonstrate the versatility of MazeBase by using it to emulate small combat scenarios from StarCraft. Models trained on the MazeBase version can be directly applied to StarCraft, where they consistently beat the in-game AI.

Simple Baseline for Visual Question Answering

Dec 15, 2015

Bolei Zhou, Yuandong Tian, Sainbayar Sukhbaatar, Arthur Szlam, Rob Fergus

Abstract: We describe a very simple bag-of-words baseline for visual question answering. This baseline concatenates the word features from the question and CNN features from the image to predict the answer. When evaluated on the challenging VQA dataset [2], it shows comparable performance to many recent approaches using recurrent neural networks. To explore the strengths and weaknesses of the trained model, we also provide an interactive web demo and open-source code.

* One comparison method's scores are put into the correct column, and a new experiment generating attention maps is added
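
The pipeline described in the abstract (bag-of-words question features concatenated with CNN image features, then a classifier over answers) can be sketched roughly as follows. This is not the released code: the tiny vocabulary, the answer list, the 16-dimensional stand-in for CNN features, and the random untrained weights are all assumptions made so the example runs on its own.

```python
import numpy as np

def bow_vector(question, vocab):
    """Count how often each vocabulary word occurs in the question."""
    v = np.zeros(len(vocab))
    for tok in question.lower().split():
        if tok in vocab:
            v[vocab[tok]] += 1.0
    return v

def predict_answer(question, image_features, vocab, w, b, answers):
    """Concatenate word and image features, then apply a softmax classifier."""
    x = np.concatenate([bow_vector(question, vocab), image_features])
    logits = w @ x + b
    probs = np.exp(logits - logits.max())   # shift for numerical stability
    probs /= probs.sum()
    return answers[int(np.argmax(probs))], probs

# Hypothetical toy setup; a real system would use learned weights and
# features taken from a pretrained CNN.
words = ["what", "color", "is", "the", "cat", "dog"]
vocab = {word: i for i, word in enumerate(words)}
answers = ["black", "white", "yes", "no"]
rng = np.random.default_rng(0)
image_features = rng.standard_normal(16)
w = rng.standard_normal((len(answers), len(vocab) + 16))
b = np.zeros(len(answers))

answer, probs = predict_answer("What color is the cat", image_features,
                               vocab, w, b, answers)
```

With untrained weights the predicted answer is arbitrary; the point is only the data flow from question and image to a distribution over answers.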

A mathematical motivation for complex-valued convolutional networks

Dec 12, 2015

Joan Bruna, Soumith Chintala, Yann LeCun, Serkan Piantino, Arthur Szlam, Mark Tygert

Abstract: A complex-valued convolutional network (convnet) implements the repeated application of the following composition of three operations, recursively applying the composition to an input vector of nonnegative real numbers: (1) convolution with complex-valued vectors followed by (2) taking the absolute value of every entry of the resulting vectors followed by (3) local averaging. For processing real-valued random vectors, complex-valued convnets can be viewed as "data-driven multiscale windowed power spectra," "data-driven multiscale windowed absolute spectra," "data-driven multiwavelet absolute values," or (in their most general configuration) "data-driven nonlinear multiwavelet packets." Indeed, complex-valued convnets can calculate multiscale windowed spectra when the convnet filters are windowed complex-valued exponentials. Standard real-valued convnets, using rectified linear units (ReLUs), sigmoidal (for example, logistic or tanh) nonlinearities, max. pooling, etc., do not obviously exhibit the same exact correspondence with data-driven wavelets (whereas for complex-valued convnets, the correspondence is much more than just a vague analogy). Courtesy of the exact correspondence, the remarkably rich and rigorous body of mathematical analysis for wavelets applies directly to (complex-valued) convnets.

* Neural Computation, 28 (5): 815-825, May 2016
* 11 pages, 3 figures; this is the retitled version submitted to the journal, "Neural Computation"
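
The three-step composition in the abstract (complex convolution, entrywise absolute value, local averaging) can be sketched for a 1D signal as below. This is a minimal single-layer illustration, not the authors' implementation; the function names, the Hann window, the filter length of 32, and the pooling width of 8 are assumptions for the example.

```python
import numpy as np

def complex_layer(x, filters, pool):
    """One layer: complex convolution -> absolute value -> local averaging."""
    window = np.ones(pool) / pool
    out = []
    for f in filters:
        conv = np.convolve(x, f, mode="valid")               # (1) complex convolution
        mag = np.abs(conv)                                   # (2) absolute value
        out.append(np.convolve(mag, window, mode="valid"))   # (3) local averaging
    return np.stack(out)

def windowed_exponentials(length, freqs):
    """Hann-windowed complex exponentials with `k` cycles per filter length."""
    n = np.arange(length)
    win = np.hanning(length)
    return [win * np.exp(2j * np.pi * k * n / length) for k in freqs]

# Nonnegative real input oscillating at 4 cycles per 32 samples.
t = np.arange(256)
x = 1.0 + np.cos(2 * np.pi * 4 * t / 32)

filters = windowed_exponentials(32, freqs=[4, 8, 12])
y = complex_layer(x, filters, pool=8)
strongest = int(np.argmax(y.mean(axis=1)))   # index of the most responsive filter
```

With these filters the layer behaves like a windowed spectrum: the channel tuned to the signal's frequency (the first one) gives the largest average response, and the output is again nonnegative and real, so the layer can be applied recursively.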
