Arthur Szlam

Learning Goal Embeddings via Self-Play for Hierarchical Reinforcement Learning

Nov 22, 2018
Sainbayar Sukhbaatar, Emily Denton, Arthur Szlam, Rob Fergus

In hierarchical reinforcement learning, a major challenge is determining appropriate low-level policies. We propose an unsupervised learning scheme, based on asymmetric self-play from Sukhbaatar et al. (2018), that automatically learns a good representation of sub-goals in the environment and a low-level policy that can execute them. A high-level policy can then direct the lower one by generating a sequence of continuous sub-goal vectors. We evaluate our model using MazeBase and MuJoCo environments, including the challenging AntGather task. Visualizations of the sub-goal embeddings reveal a logical decomposition of tasks within the environment. Quantitatively, our approach obtains compelling performance gains over non-hierarchical approaches.

Dialogue Natural Language Inference

Nov 01, 2018
Sean Welleck, Jason Weston, Arthur Szlam, Kyunghyun Cho

Consistency is a long-standing issue faced by dialogue models. In this paper, we frame the consistency of dialogue agents as natural language inference (NLI) and create a new natural language inference dataset called Dialogue NLI. We propose a method which demonstrates that a model trained on Dialogue NLI can be used to improve the consistency of a dialogue model, and evaluate the method with human evaluation and with automatic metrics on a suite of evaluation sets designed to measure a dialogue model's consistency.

Lightweight Adaptive Mixture of Neural and N-gram Language Models

Oct 26, 2018
Anton Bakhtin, Arthur Szlam, Marc'Aurelio Ranzato, Edouard Grave

It is often the case that the best-performing language model is an ensemble of a neural language model with n-grams. In this work, we propose a method to improve how these two models are combined. By using a small network which predicts the mixture weight between the two models, we adapt their relative importance at each time step. Because the gating network is small, it trains quickly on small amounts of held-out data, and does not add overhead at scoring time. Our experiments carried out on the One Billion Word benchmark show a significant improvement over the state-of-the-art ensemble without retraining of the basic modules.
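
The gated combination described above can be illustrated with a minimal sketch. It assumes the gate is a single logistic unit over a hypothetical context feature vector; the feature values, gate parameters, and toy distributions are made up for illustration and are not taken from the paper.

```python
import math


def sigmoid(x):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gate_weight(features, weights, bias):
    """Hypothetical tiny gate: one logistic unit over context features.

    Returns the mixture weight given to the neural model at this time step.
    """
    return sigmoid(sum(w * f for w, f in zip(weights, features)) + bias)


def mix(p_neural, p_ngram, g):
    """Per-word mixture: g * neural + (1 - g) * n-gram."""
    return {w: g * p_neural[w] + (1.0 - g) * p_ngram[w] for w in p_neural}


# Toy next-word distributions over a three-word vocabulary (made-up numbers).
p_neural = {"cat": 0.6, "dog": 0.3, "car": 0.1}
p_ngram = {"cat": 0.2, "dog": 0.2, "car": 0.6}

features = [1.0, 0.5]   # hypothetical context features for this time step
weights = [0.8, -0.4]   # hypothetical learned gate parameters
g = gate_weight(features, weights, bias=0.0)
p_mix = mix(p_neural, p_ngram, g)
```

Because the gate outputs a weight in (0, 1) and both inputs are probability distributions, the mixture is itself a valid distribution. Only the gate parameters would need training here, which is consistent with the abstract's point that the base models are not retrained.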

Personalizing Dialogue Agents: I have a dog, do you have pets too?

Sep 25, 2018
Saizheng Zhang, Emily Dinan, Jack Urbanek, Arthur Szlam, Douwe Kiela, Jason Weston

Chit-chat models are known to have several problems: they lack specificity, do not display a consistent personality and are often not very captivating. In this work we present the task of making chit-chat more engaging by conditioning on profile information. We collect data and train models to condition on (i) their given profile information and (ii) information about the person they are talking to, resulting in improved dialogues, as measured by next utterance prediction. Since (ii) is initially unknown, our model is trained to engage its partner with personal topics, and we show the resulting dialogue can be used to predict profile information about the interlocutors.

Planning with Arithmetic and Geometric Attributes

Sep 06, 2018
David Folqué, Sainbayar Sukhbaatar, Arthur Szlam, Joan Bruna

A desirable property of an intelligent agent is its ability to understand its environment well enough to generalize quickly to novel tasks and to compose simpler tasks into more complex ones. If the environment has geometric or arithmetic structure, the agent should exploit this structure for faster generalization. Building on recent work that augments the environment with user-specified attributes, we show that further equipping these attributes with the appropriate geometric and arithmetic structure brings substantial gains in sample complexity.

Low-shot learning with large-scale diffusion

Jun 15, 2018
Matthijs Douze, Arthur Szlam, Bharath Hariharan, Hervé Jégou

This paper considers the problem of inferring image labels when only a few annotated examples are available at training time. This setup is often referred to as low-shot learning, where a standard approach is to re-train the last few layers of a convolutional neural network learned on separate classes for which training examples are abundant. We consider a semi-supervised setting based on a large collection of images to support label propagation. This is possible by leveraging the recent advances in large-scale similarity graph construction. We show that despite its conceptual simplicity, scaling label propagation up to hundreds of millions of images leads to state-of-the-art accuracy in the low-shot learning regime.
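
To make the idea of label propagation on a similarity graph concrete, here is a minimal sketch of one common variant: unlabelled nodes repeatedly take the average of their neighbours' class scores while the few labelled nodes stay clamped. This is a simplified stand-in, not the paper's exact diffusion procedure; the function names and the six-node chain graph are hypothetical.

```python
def propagate_labels(neighbors, seeds, num_classes, iterations=20):
    """Simplified label propagation on an unweighted similarity graph.

    neighbors: dict mapping each node to a list of its neighbour nodes.
    seeds: dict mapping the few labelled nodes to their class index.
    Returns a dict mapping each node to a list of per-class scores.
    """
    scores = {n: [0.0] * num_classes for n in neighbors}
    for node, cls in seeds.items():
        scores[node][cls] = 1.0

    for _ in range(iterations):
        new_scores = {}
        for node, nbrs in neighbors.items():
            if node in seeds:
                # Labelled nodes are clamped to their known label.
                new_scores[node] = scores[node]
                continue
            # Unlabelled nodes take the mean of their neighbours' scores.
            avg = [0.0] * num_classes
            for m in nbrs:
                for c in range(num_classes):
                    avg[c] += scores[m][c] / len(nbrs)
            new_scores[node] = avg
        scores = new_scores
    return scores


def predict(scores):
    """Pick the highest-scoring class for every node."""
    return {n: max(range(len(s)), key=lambda c: s[c]) for n, s in scores.items()}


# Toy chain graph 0-1-2-3-4-5 with one labelled example per class.
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
seeds = {0: 0, 5: 1}
labels = predict(propagate_labels(neighbors, seeds, num_classes=2))
```

In this toy graph, each unlabelled node ends up with the class of the nearer labelled node. At the scale the abstract mentions, the graph would be a sparse nearest-neighbour graph built with an approximate similarity search rather than a hand-written dictionary.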

Intrinsic Motivation and Automatic Curricula via Asymmetric Self-Play

Apr 27, 2018
Sainbayar Sukhbaatar, Zeming Lin, Ilya Kostrikov, Gabriel Synnaeve, Arthur Szlam, Rob Fergus

We describe a simple scheme that allows an agent to learn about its environment in an unsupervised manner. Our scheme pits two versions of the same agent, Alice and Bob, against one another. Alice proposes a task, and Bob then attempts to complete it. In this work we focus on two kinds of environments: (nearly) reversible environments and environments that can be reset. Alice "proposes" the task by doing a sequence of actions, and Bob must then undo or repeat them, respectively. Via an appropriate reward structure, Alice and Bob automatically generate a curriculum of exploration, enabling unsupervised training of the agent. When Bob is deployed on an RL task within the environment, this unsupervised training reduces the number of supervised episodes needed to learn, and in some cases converges to a higher reward.

* Published in ICLR 2018
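
The abstract above does not spell out the reward structure, so the following is only a sketch of one time-based scheme of the kind it alludes to, stated here as an assumption: Bob is penalized for the time he takes, and Alice is rewarded when Bob needs more time than she did. The function names and the `scale` parameter are hypothetical.

```python
def bob_reward(t_bob, scale=0.1):
    """Bob is penalized in proportion to the time he needs (assumed form)."""
    return -scale * t_bob


def alice_reward(t_alice, t_bob, scale=0.1):
    """Alice gains when Bob takes longer than she did, never less than zero
    (assumed form)."""
    return scale * max(0, t_bob - t_alice)


# Alice sets up a task in 2 steps; Bob needs 10 steps to undo or repeat it.
r_alice = alice_reward(t_alice=2, t_bob=10, scale=1.0)
r_bob = bob_reward(t_bob=10, scale=1.0)
```

Under rewards of this shape, Alice is pushed towards tasks that are quick for her but still hard for Bob, while Bob is pushed to finish quickly. As Bob improves, Alice has to propose harder tasks, which is one way a curriculum could emerge.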

Mastering the Dungeon: Grounded Language Learning by Mechanical Turker Descent

Apr 16, 2018
Zhilin Yang, Saizheng Zhang, Jack Urbanek, Will Feng, Alexander H. Miller, Arthur Szlam, Douwe Kiela, Jason Weston

Contrary to most natural language processing research, which makes use of static datasets, humans learn language interactively, grounded in an environment. In this work we propose an interactive learning procedure called Mechanical Turker Descent (MTD) and use it to train agents to execute natural language commands grounded in a fantasy text adventure game. In MTD, Turkers compete to train better agents in the short term, and collaborate by sharing their agents' skills in the long term. This results in a gamified, engaging experience for the Turkers and a better quality teaching signal for the agents compared to static datasets, as the Turkers naturally adapt the training data to the agent's abilities.

Modeling Others using Oneself in Multi-Agent Reinforcement Learning

Mar 23, 2018
Roberta Raileanu, Emily Denton, Arthur Szlam, Rob Fergus

We consider the multi-agent reinforcement learning setting with imperfect information in which each agent is trying to maximize its own utility. The reward function depends on the hidden state (or goal) of both agents, so the agents must infer the other players' hidden goals from their observed behavior in order to solve the tasks. We propose a new approach for learning in these domains: Self Other-Modeling (SOM), in which an agent uses its own policy to predict the other agent's actions and update its belief of their hidden state in an online manner. We evaluate this approach on three different tasks and show that the agents are able to learn better policies using their estimate of the other players' hidden states, in both cooperative and adversarial settings.

* 10 pages, 16 figures, submitted to ICML 2018

Composable Planning with Attributes

Mar 01, 2018
Amy Zhang, Adam Lerer, Sainbayar Sukhbaatar, Rob Fergus, Arthur Szlam

The tasks that an agent will need to solve are often not known during training. However, if the agent knows which properties of the environment are important, then after learning how its actions affect those properties, it may be able to use this knowledge to solve complex tasks without training specifically for them. Towards this end, we consider a setup in which an environment is augmented with a set of user-defined attributes that parameterize the features of interest. We propose a method that learns a policy for transitioning between "nearby" sets of attributes, and maintains a graph of possible transitions. Given a task at test time that can be expressed in terms of a target set of attributes, and a current state, our model infers the attributes of the current state and searches over paths through attribute space to get a high-level plan, and then uses its low-level policy to execute the plan. We show in 3D block stacking, grid-world games, and StarCraft that our model is able to generalize to longer, more complex tasks at test time by composing simpler learned policies.
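
The high-level planning step described above can be sketched as a path search over a graph of attribute sets. The abstract does not say which search procedure is used, so this sketch swaps in a plain unweighted breadth-first search as an assumption; the attribute names and the toy block-stacking graph are hypothetical.

```python
from collections import deque


def plan(graph, start, goal):
    """Breadth-first search over an attribute-transition graph.

    graph: dict mapping an attribute set to the attribute sets reachable
           from it in one low-level policy execution.
    Returns the list of attribute sets from start to goal, or None.
    """
    if start == goal:
        return [start]
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt in parents:
                continue  # already reached by a path at least as short
            parents[nxt] = node
            if nxt == goal:
                # Walk parent links back to the start, then reverse.
                path = [nxt]
                while parents[path[-1]] is not None:
                    path.append(parents[path[-1]])
                return path[::-1]
            queue.append(nxt)
    return None


# Hypothetical attribute sets for a tiny block-stacking world.
empty = frozenset()
a_on_b = frozenset({"A_on_B"})
tower = frozenset({"A_on_B", "C_on_A"})

# Each edge is a transition the low-level policy is assumed to have learned.
graph = {empty: [a_on_b], a_on_b: [empty, tower], tower: [a_on_b]}
high_level_plan = plan(graph, empty, tower)
```

Each consecutive pair in the returned plan is a "nearby" transition that the low-level policy would then be asked to execute. A weighted search, for example one that uses estimated transition success rates, would slot into the same structure.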
