Jimmy Ba

INT: An Inequality Benchmark for Evaluating Generalization in Theorem Proving

Jul 06, 2020

Yuhuai Wu, Albert Jiang, Jimmy Ba, Roger Grosse

Abstract: In learning-assisted theorem proving, one of the most critical challenges is to generalize to theorems unlike those seen at training time. In this paper, we introduce INT, an INequality Theorem proving benchmark, specifically designed to test agents' generalization ability. INT is based on a procedure for generating theorems and proofs; this procedure's knobs allow us to measure 6 different types of generalization, each reflecting a distinct challenge characteristic to automated theorem proving. In addition, unlike prior benchmarks for learning-assisted theorem proving, INT provides a lightweight and user-friendly theorem proving environment with fast simulations, conducive to performing learning-based and search-based research. We introduce learning-based baselines and evaluate them across 6 dimensions of generalization with the benchmark. We then evaluate the same agents augmented with Monte Carlo Tree Search (MCTS) at test time, and show that MCTS can help to prove new theorems.

Maximum Entropy Gain Exploration for Long Horizon Multi-goal Reinforcement Learning

Jul 06, 2020

Silviu Pitis, Harris Chan, Stephen Zhao, Bradly Stadie, Jimmy Ba

Abstract: What goals should a multi-goal reinforcement learning agent pursue during training in long-horizon tasks? When the desired (test time) goal distribution is too distant to offer a useful learning signal, we argue that the agent should not pursue unobtainable goals. Instead, it should set its own intrinsic goals that maximize the entropy of the historical achieved goal distribution. We propose to optimize this objective by having the agent pursue past achieved goals in sparsely explored areas of the goal space, which focuses exploration on the frontier of the achievable goal set. We show that our strategy achieves an order of magnitude better sample efficiency than the prior state of the art on long-horizon multi-goal tasks including maze navigation and block stacking.

* 12 pages (+12 appendix). Published as a conference paper at ICML 2020. Code available at https://github.com/spitis/mrl
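
The goal-selection idea in the abstract above (pursue past achieved goals that lie in sparsely explored regions) can be illustrated with a small sketch. This is not the authors' implementation: the one-dimensional goal space, the Gaussian kernel density estimate, the bandwidth value, and the function names `kde_density` and `pick_frontier_goal` are all assumptions made for illustration.

```python
import math

def kde_density(goal, achieved, bandwidth=0.1):
    """Unnormalized Gaussian kernel density of `goal` under past achieved goals."""
    return sum(math.exp(-0.5 * ((goal - g) / bandwidth) ** 2) for g in achieved)

def pick_frontier_goal(achieved, bandwidth=0.1):
    """Return the past achieved goal that lies in the least densely visited region."""
    return min(achieved, key=lambda g: kde_density(g, achieved, bandwidth))

# Hypothetical buffer of achieved 1-D goals: most cluster near 0.1, one outlier at 0.9.
achieved_goals = [0.0, 0.1, 0.1, 0.2, 0.15, 0.9]
next_goal = pick_frontier_goal(achieved_goals)
print(next_goal)
```

In this toy buffer the isolated goal is selected, since every other goal has close neighbours that raise its estimated density. Any density estimator over the achieved-goal buffer could be substituted for the kernel estimate used here.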

When Does Preconditioning Help or Hurt Generalization?

Jul 02, 2020

Shun-ichi Amari, Jimmy Ba, Roger Grosse, Xuechen Li, Atsushi Nitanda, Taiji Suzuki, Denny Wu, Ji Xu

Abstract: While second-order optimizers such as natural gradient descent (NGD) often speed up optimization, their effect on generalization remains controversial. For instance, it has been pointed out that gradient descent (GD), in contrast to many preconditioned updates, converges to small Euclidean norm solutions in overparameterized models, leading to favorable generalization properties. This work presents a more nuanced view on the comparison of generalization between first- and second-order methods. We provide an asymptotic bias-variance decomposition of the generalization error of overparameterized ridgeless regression under a general class of preconditioner $\boldsymbol{P}$, and consider the inverse population Fisher information matrix (used in NGD) as a particular example. We determine the optimal $\boldsymbol{P}$ for both the bias and variance, and find that the relative generalization performance of different optimizers depends on the label noise and the "shape" of the signal (true parameters): when the labels are noisy, the model is misspecified, or the signal is misaligned with the features, NGD can achieve lower risk; conversely, GD generalizes better than NGD under clean labels, a well-specified model, or aligned signal. Based on this analysis, we discuss several approaches to manage the bias-variance tradeoff, and the potential benefit of interpolating between GD and NGD. We then extend our analysis to regression in the reproducing kernel Hilbert space and demonstrate that preconditioned GD can decrease the population risk faster than GD. Lastly, we empirically compare the generalization performance of first- and second-order optimizers in neural network experiments, and observe robust trends matching our theoretical analysis.

* 38 pages
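
The abstract's remark that GD, unlike many preconditioned updates, converges to a small Euclidean norm solution in overparameterized models can be seen in a toy case. The sketch below is an illustration only, not the paper's analysis: it assumes a single training point with two parameters, a zero initialization, and a hypothetical diagonal preconditioner `p_diag` (not the Fisher-based preconditioner studied in the paper).

```python
def preconditioned_gd(x, y, p_diag, lr=0.05, steps=500):
    """Fit one example (x, y) with a linear model using the update theta -= lr * P * gradient.

    p_diag holds the diagonal of the (assumed) preconditioner P.
    """
    theta = [0.0] * len(x)
    for _ in range(steps):
        residual = sum(t * xi for t, xi in zip(theta, x)) - y
        theta = [t - lr * p * residual * xi for t, p, xi in zip(theta, p_diag, x)]
    return theta

def sq_norm(v):
    """Squared Euclidean norm of a vector."""
    return sum(vi * vi for vi in v)

x, y = [1.0, 2.0], 1.0                               # one equation, two unknowns
gd_solution = preconditioned_gd(x, y, [1.0, 1.0])    # identity P: plain GD
pgd_solution = preconditioned_gd(x, y, [4.0, 1.0])   # hypothetical diagonal P
print(gd_solution, pgd_solution)
```

Both runs interpolate the training point, but they reach different interpolants: plain GD stays in the span of x and lands on the minimum-norm solution, while the preconditioned run stays in the span of Px and ends with a larger norm. Which of the two generalizes better is exactly the question the paper analyzes.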

BatchEnsemble: An Alternative Approach to Efficient Ensemble and Lifelong Learning

Feb 20, 2020

Yeming Wen, Dustin Tran, Jimmy Ba

Abstract: Ensembles, where multiple neural networks are trained individually and their predictions are averaged, have been shown to be widely successful for improving both the accuracy and predictive uncertainty of single neural networks. However, an ensemble's cost for both training and testing increases linearly with the number of networks, which quickly becomes untenable. In this paper, we propose BatchEnsemble, an ensemble method whose computational and memory costs are significantly lower than typical ensembles. BatchEnsemble achieves this by defining each weight matrix to be the Hadamard product of a shared weight among all ensemble members and a rank-one matrix per member. Unlike ensembles, BatchEnsemble is not only parallelizable across devices, where one device trains one member, but also parallelizable within a device, where multiple ensemble members are updated simultaneously for a given mini-batch. Across CIFAR-10, CIFAR-100, WMT14 EN-DE/EN-FR translation, and out-of-distribution tasks, BatchEnsemble yields accuracy and uncertainties competitive with typical ensembles; the speedup at test time is 3X and memory reduction is 3X at an ensemble of size 4. We also apply BatchEnsemble to lifelong learning, where on Split-CIFAR-100, BatchEnsemble yields comparable performance to progressive neural networks while having much lower computational and memory costs. We further show that BatchEnsemble can easily scale up to lifelong learning on Split-ImageNet, which involves 100 sequential learning tasks.

* Eighth International Conference on Learning Representations (ICLR 2020)
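
The weight construction described in the abstract above (a shared matrix combined by Hadamard product with a rank-one matrix per member) admits a cheap forward pass, which the following sketch illustrates. It is a minimal pure-Python illustration under assumed conventions, not the authors' code: the shared matrix `W` has shape (outputs, inputs), and the vectors `r` and `s` and all function names are hypothetical.

```python
def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]

def member_weight(W, r, s):
    """Explicit per-member weight: Hadamard product of W with the rank-one matrix r s^T."""
    return [[W[a][b] * r[a] * s[b] for b in range(len(s))] for a in range(len(r))]

def member_forward(W, r, s, x):
    """Same output without materializing the per-member matrix:
    scale the input by s, apply the shared W, then scale the output by r."""
    scaled_in = [sb * xb for sb, xb in zip(s, x)]
    return [ra * ya for ra, ya in zip(r, matvec(W, scaled_in))]

W = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]   # shared weight, shape (2 outputs, 3 inputs)
r = [0.5, -1.0]                           # per-member output scaling
s = [2.0, 1.0, 0.5]                       # per-member input scaling
x = [1.0, -1.0, 2.0]

explicit = matvec(member_weight(W, r, s), x)
fast = member_forward(W, r, s, x)
print(explicit, fast)
```

Because each member only adds the two vectors `r` and `s` on top of the shared matrix, the per-member storage grows with the sum of the layer dimensions rather than their product, and the elementwise scalings can be applied to different members within the same mini-batch.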

An Inductive Bias for Distances: Neural Nets that Respect the Triangle Inequality

Feb 14, 2020

Silviu Pitis, Harris Chan, Kiarash Jamali, Jimmy Ba

Abstract: Distances are pervasive in machine learning. They serve as similarity measures, loss functions, and learning targets; it is said that a good distance measure solves a task. When defining distances, the triangle inequality has proven to be a useful constraint, both theoretically (to prove convergence and optimality guarantees) and empirically (as an inductive bias). Deep metric learning architectures that respect the triangle inequality rely, almost exclusively, on Euclidean distance in the latent space. Though effective, this fails to model two broad classes of subadditive distances, common in graphs and reinforcement learning: asymmetric metrics, and metrics that cannot be embedded into Euclidean space. To address these problems, we introduce novel architectures that are guaranteed to satisfy the triangle inequality. We prove our architectures universally approximate norm-induced metrics on $\mathbb{R}^n$, and present a similar result for modified Input Convex Neural Networks. We show that our architectures outperform existing metric approaches when modeling graph distances and have a better inductive bias than non-metric approaches when training data is limited in the multi-goal reinforcement learning setting.

* 11 pages (+18 appendix). Published as a conference paper at ICLR 2020. https://openreview.net/forum?id=HJeiDpVFPr

Dream to Control: Learning Behaviors by Latent Imagination

Dec 03, 2019

Danijar Hafner, Timothy Lillicrap, Jimmy Ba, Mohammad Norouzi

Abstract: Learned world models summarize an agent's experience to facilitate learning complex behaviors. While learning world models from high-dimensional sensory inputs is becoming feasible through deep learning, there are many potential ways of deriving behaviors from them. We present Dreamer, a reinforcement learning agent that solves long-horizon tasks from images purely by latent imagination. We efficiently learn behaviors by propagating analytic gradients of learned state values back through trajectories imagined in the compact state space of a learned world model. On 20 challenging visual control tasks, Dreamer exceeds existing approaches in data-efficiency, computation time, and final performance.

* 9 pages, 12 figures

On Solving Minimax Optimization Locally: A Follow-the-Ridge Approach

Nov 25, 2019

Yuanhao Wang, Guodong Zhang, Jimmy Ba

Abstract: Many tasks in modern machine learning can be formulated as finding equilibria in sequential games. In particular, two-player zero-sum sequential games, also known as minimax optimization, have received growing interest. It is tempting to apply gradient descent to solve minimax optimization given its popularity and success in supervised learning. However, it has been noted that naive application of gradient descent fails to find some local minimax and can converge to non-local-minimax points. In this paper, we propose Follow-the-Ridge (FR), a novel algorithm that provably converges to and only converges to local minimax. We show theoretically that the algorithm addresses the notorious rotational behaviour of gradient dynamics, and is compatible with preconditioning and positive momentum. Empirically, FR solves toy minimax problems and improves the convergence of GAN training compared to recent minimax optimization algorithms.

* 21 pages

Lookahead Optimizer: k steps forward, 1 step back

Jul 19, 2019

Michael R. Zhang, James Lucas, Geoffrey Hinton, Jimmy Ba

Abstract: The vast majority of successful deep neural networks are trained using variants of stochastic gradient descent (SGD) algorithms. Recent attempts to improve SGD can be broadly categorized into two approaches: (1) adaptive learning rate schemes, such as AdaGrad and Adam, and (2) accelerated schemes, such as heavy-ball and Nesterov momentum. In this paper, we propose a new optimization algorithm, Lookahead, that is orthogonal to these previous approaches and iteratively updates two sets of weights. Intuitively, the algorithm chooses a search direction by looking ahead at the sequence of "fast weights" generated by another optimizer. We show that Lookahead improves the learning stability and lowers the variance of its inner optimizer with negligible computation and memory cost. We empirically demonstrate Lookahead can significantly improve the performance of SGD and Adam, even with their default hyperparameter settings on ImageNet, CIFAR-10/100, neural machine translation, and Penn Treebank.

* 8 pages
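
The "k steps forward, 1 step back" scheme from the title and abstract above can be sketched on a toy problem. This is a minimal sketch, assuming plain gradient descent as the inner optimizer and a scalar quadratic objective; the function name `lookahead_gd` and the hyperparameter values are illustrative choices, not recommendations from the paper.

```python
def lookahead_gd(grad, phi, k=5, alpha=0.5, lr=0.1, outer_steps=50):
    """Minimize a scalar function with Lookahead wrapped around plain gradient descent.

    phi   : slow weights
    theta : fast weights, reset to phi at the start of every outer step
    """
    for _ in range(outer_steps):
        theta = phi                        # fast weights start from the slow weights
        for _ in range(k):                 # k steps forward with the inner optimizer
            theta = theta - lr * grad(theta)
        phi = phi + alpha * (theta - phi)  # 1 step back: interpolate toward the fast weights
    return phi

# Toy objective f(x) = (x - 3)^2 with gradient 2 * (x - 3); the minimizer is x = 3.
result = lookahead_gd(lambda x: 2.0 * (x - 3.0), phi=0.0)
print(result)
```

The only extra state beyond the inner optimizer is one copy of the slow weights, and the interpolation runs once every k inner steps, which is consistent with the abstract's claim of negligible computation and memory cost.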

Benchmarking Model-Based Reinforcement Learning

Jul 03, 2019

Tingwu Wang, Xuchan Bao, Ignasi Clavera, Jerrick Hoang, Yeming Wen, Eric Langlois, Shunshi Zhang, Guodong Zhang, Pieter Abbeel, Jimmy Ba

Abstract: Model-based reinforcement learning (MBRL) is widely seen as having the potential to be significantly more sample efficient than model-free RL. However, research in model-based RL has not been very standardized. It is fairly common for authors to experiment with self-designed environments, and there are several separate lines of research, which are sometimes closed-source or not reproducible. Accordingly, it is an open question how these various existing MBRL algorithms perform relative to each other. To facilitate research in MBRL, in this paper we gather a wide collection of MBRL algorithms and propose over 18 benchmarking environments specially designed for MBRL. We benchmark these algorithms with unified problem settings, including noisy environments. Beyond cataloguing performance, we explore and unify the underlying algorithmic differences across MBRL algorithms. We characterize three key research challenges for future MBRL research: the dynamics bottleneck, the planning horizon dilemma, and the early-termination dilemma. Finally, to maximally facilitate future research on MBRL, we open-source our benchmark at http://www.cs.toronto.edu/~tingwuwang/mbrl.html.

* 8 main pages, 8 figures; 14 appendix pages, 25 figures

Exploring Model-based Planning with Policy Networks

Jun 20, 2019

Tingwu Wang, Jimmy Ba

Abstract: Model-based reinforcement learning (MBRL) with model-predictive control or online planning has shown great potential for locomotion control tasks in terms of both sample efficiency and asymptotic performance. Despite their initial successes, the existing planning methods search from candidate sequences randomly generated in the action space, which is inefficient in complex high-dimensional environments. In this paper, we propose a novel MBRL algorithm, model-based policy planning (POPLIN), that combines policy networks with online planning. More specifically, we formulate action planning at each time-step as an optimization problem using neural networks. We experiment with both optimization w.r.t. the action sequences initialized from the policy network, and also online optimization directly w.r.t. the parameters of the policy network. We show that POPLIN obtains state-of-the-art performance in the MuJoCo benchmarking environments, being about 3x more sample efficient than the state-of-the-art algorithms, such as PETS, TD3 and SAC. To explain the effectiveness of our algorithm, we show that the optimization surface in parameter space is smoother than in action space. Furthermore, we found the distilled policy network can be effectively applied without the expensive model predictive control during test time for some environments such as Cheetah. Code is released at https://github.com/WilsonWangTHU/POPLIN.

* 8 pages, 7 figures
