Jonathan Uesato

Tony

Rigorous Agent Evaluation: An Adversarial Approach to Uncover Catastrophic Failures

Dec 04, 2018

Jonathan Uesato, Ananya Kumar, Csaba Szepesvari, Tom Erez, Avraham Ruderman, Keith Anderson, Krishnamurthy Dvijotham, Nicolas Heess, Pushmeet Kohli

Abstract: This paper addresses the problem of evaluating learning systems in safety-critical domains such as autonomous driving, where failures can have catastrophic consequences. We focus on two problems: searching for scenarios in which learned agents fail and assessing their probability of failure. The standard method for agent evaluation in reinforcement learning, Vanilla Monte Carlo, can miss failures entirely, leading to the deployment of unsafe agents. We demonstrate this is an issue for current agents, where even matching the compute used for training is sometimes insufficient for evaluation. To address this shortcoming, we draw upon the rare event probability estimation literature and propose an adversarial evaluation approach. Our approach focuses evaluation on adversarially chosen situations, while still providing unbiased estimates of failure probabilities. The key difficulty is in identifying these adversarial situations -- since failures are rare, there is little signal to drive optimization. To solve this we propose a continuation approach that learns failure modes in related but less robust agents. Our approach also allows reuse of data already collected for training the agent. We demonstrate the efficacy of adversarial evaluation on two standard domains: humanoid control and simulated driving. Experimental results show that our methods can find catastrophic failures and estimate failure rates of agents multiple orders of magnitude faster than standard evaluation schemes, in minutes to hours rather than days.
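
The difference between Vanilla Monte Carlo and an evaluation that concentrates on adversarially chosen situations while staying unbiased can be illustrated with a toy importance-sampling estimator. This is only a sketch under stated assumptions, not the paper's method: the discrete set of conditions, the per-condition failure probabilities in `fail_prob`, and the proposal distribution `q` are all hypothetical, and the learned failure predictor and continuation approach from the abstract are not reproduced.

```python
import random


def vanilla_mc(fail_prob, p, n, rng):
    """Estimate P(failure) by sampling conditions from the true distribution p."""
    conditions = list(p)
    weights = [p[c] for c in conditions]
    failures = 0
    for _ in range(n):
        c = rng.choices(conditions, weights)[0]
        # Hypothetical rollout: the agent fails with probability fail_prob[c].
        failures += rng.random() < fail_prob[c]
    return failures / n


def importance_sampling(fail_prob, p, q, n, rng):
    """Estimate P(failure) by sampling from a proposal q and reweighting by p/q.

    The estimate is unbiased as long as q[c] > 0 wherever p[c] * fail_prob[c] > 0.
    """
    conditions = list(q)
    weights = [q[c] for c in conditions]
    total = 0.0
    for _ in range(n):
        c = rng.choices(conditions, weights)[0]
        failed = rng.random() < fail_prob[c]
        total += (p[c] / q[c]) * failed  # likelihood ratio corrects the bias
    return total / n
```

With a condition that occurs once in a thousand episodes, a few thousand vanilla rollouts may contain no failures at all, whereas a proposal that oversamples the rare condition observes many failures and downweights each one by `p[c] / q[c]`.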

Robustness via curvature regularization, and vice versa

Nov 23, 2018

Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, Jonathan Uesato, Pascal Frossard

Abstract: State-of-the-art classifiers have been shown to be largely vulnerable to adversarial perturbations. One of the most effective strategies to improve robustness is adversarial training. In this paper, we investigate the effect of adversarial training on the geometry of the classification landscape and decision boundaries. We show in particular that adversarial training leads to a significant decrease in the curvature of the loss surface with respect to inputs, leading to a drastically more "linear" behaviour of the network. Using a locally quadratic approximation, we provide theoretical evidence on the existence of a strong relation between large robustness and small curvature. To further show the importance of reduced curvature for improving the robustness, we propose a new regularizer that directly minimizes curvature of the loss surface, and leads to adversarial robustness that is on par with adversarial training. Besides being a more efficient and principled alternative to adversarial training, the proposed regularizer confirms our claims on the importance of exhibiting quasi-linear behavior in the vicinity of data points in order to achieve robustness.
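
One way to penalize curvature of a loss with respect to its input is a finite difference of gradients: if the gradient barely changes after a small step, the loss is locally close to linear. The sketch below assumes this finite-difference form and a normalized sign-of-gradient step direction; the abstract does not specify either, so both are assumptions, as are the function name `curvature_penalty`, the step size `h`, and the toy losses used to exercise it.

```python
import math


def curvature_penalty(grad_fn, x, h=1e-2):
    """Finite-difference curvature proxy: ||grad(x + h*z) - grad(x)||^2.

    grad_fn returns the gradient of the loss with respect to the input x.
    The direction z is the normalized sign of the gradient (an assumption).
    """
    g = grad_fn(x)
    z = [math.copysign(1.0, gi) if gi != 0 else 0.0 for gi in g]
    norm = math.sqrt(sum(zi * zi for zi in z)) or 1.0
    z = [zi / norm for zi in z]
    x_shift = [xi + h * zi for xi, zi in zip(x, z)]
    g_shift = grad_fn(x_shift)
    return sum((a - b) ** 2 for a, b in zip(g_shift, g))


# Toy losses with known gradients (hypothetical, for illustration only).
def linear_grad(x):
    return [2.0, -3.0]  # gradient of 2*x0 - 3*x1: constant, so zero curvature


def quadratic_grad(x):
    return [2.0 * x[0], 4.0 * x[1]]  # gradient of x0^2 + 2*x1^2
```

For the linear loss the penalty is exactly zero, while for the quadratic loss it grows with the second derivatives, which is the behaviour a curvature regularizer would push against during training.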

Strength in Numbers: Trading-off Robustness and Computation via Adversarially-Trained Ensembles

Nov 22, 2018

Edward Grefenstette, Robert Stanforth, Brendan O'Donoghue, Jonathan Uesato, Grzegorz Swirszcz, Pushmeet Kohli

Abstract: While deep learning has led to remarkable results on a number of challenging problems, researchers have discovered a vulnerability of neural networks in adversarial settings, where small but carefully chosen perturbations to the input can make the models produce extremely inaccurate outputs. This makes these models particularly unsuitable for safety-critical application domains (e.g. self-driving cars) where robustness is extremely important. Recent work has shown that augmenting training with adversarially generated data provides some degree of robustness against test-time attacks. In this paper we investigate how this approach scales as we increase the computational budget given to the defender. We show that increasing the number of parameters in adversarially-trained models increases their robustness, and in particular that ensembling smaller models while adversarially training the entire ensemble as a single model is a more efficient way of spending said budget than simply using a larger single model. Crucially, we show that it is the adversarial training of the ensemble, rather than the ensembling of adversarially trained models, which provides robustness.

* 12 pages

On the Effectiveness of Interval Bound Propagation for Training Verifiably Robust Models

Nov 05, 2018

Sven Gowal, Krishnamurthy Dvijotham, Robert Stanforth, Rudy Bunel, Chongli Qin, Jonathan Uesato, Relja Arandjelovic, Timothy Mann, Pushmeet Kohli

Abstract: Recent works have shown that it is possible to train models that are verifiably robust to norm-bounded adversarial perturbations. While these recent methods show promise, they remain hard to scale and difficult to tune. This paper investigates how interval bound propagation (IBP) using simple interval arithmetic can be exploited to train verifiably robust neural networks that are surprisingly effective. While IBP itself has been studied in prior work, our contribution is in showing that, with an appropriate loss and careful tuning of hyper-parameters, verified training with IBP leads to a fast and stable learning algorithm. We compare our approach with recent techniques, and train classifiers that improve on the state-of-the-art in single-model adversarial robustness: we reduce the verified error rate from 3.67% to 2.23% on MNIST (with $\ell_\infty$ perturbations of $\epsilon = 0.1$), from 19.32% to 8.05% on MNIST (at $\epsilon = 0.3$), and from 78.22% to 72.91% on CIFAR-10 (at $\epsilon = 8/255$).
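
The interval arithmetic behind IBP can be sketched for a small fully connected ReLU network: an $\ell_\infty$ ball of radius $\epsilon$ around the input becomes a box, and each layer maps a box to a box that contains every reachable activation. The code below is a minimal illustration only. The helper names, the plain-list weight format, and the simple "lower bound of the true logit exceeds every other upper bound" check are assumptions, and the paper's training loss and hyper-parameter schedule are not reproduced.

```python
def affine_bounds(lower, upper, W, b):
    """Propagate an input box through y = W x + b using interval arithmetic."""
    out_l, out_u = [], []
    for row, bias in zip(W, b):
        lo = hi = bias
        for w, l, u in zip(row, lower, upper):
            if w >= 0:
                lo += w * l
                hi += w * u
            else:  # a negative weight swaps which end of the interval matters
                lo += w * u
                hi += w * l
        out_l.append(lo)
        out_u.append(hi)
    return out_l, out_u


def relu_bounds(lower, upper):
    """ReLU is monotone, so it can be applied to both ends of the interval."""
    return [max(0.0, l) for l in lower], [max(0.0, u) for u in upper]


def ibp_bounds(x, eps, layers):
    """Bound the logits over the l-infinity ball of radius eps around x.

    layers is a list of (W, b) pairs; ReLU follows every layer but the last.
    """
    lower = [xi - eps for xi in x]
    upper = [xi + eps for xi in x]
    for i, (W, b) in enumerate(layers):
        lower, upper = affine_bounds(lower, upper, W, b)
        if i < len(layers) - 1:
            lower, upper = relu_bounds(lower, upper)
    return lower, upper


def is_verified(lower, upper, true_class):
    """Sufficient check: the true logit's lower bound beats all other upper bounds."""
    return all(lower[true_class] > u
               for k, u in enumerate(upper) if k != true_class)
```

Because the bounds are sound but loose, a `False` result does not prove that an adversarial example exists; it only means this particular bound could not certify the input.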

Technical Report on the CleverHans v2.1.0 Adversarial Examples Library

Jun 27, 2018

Nicolas Papernot, Fartash Faghri, Nicholas Carlini, Ian Goodfellow, Reuben Feinman, Alexey Kurakin, Cihang Xie, Yash Sharma, Tom Brown, Aurko Roy (+16 more)

Abstract: CleverHans is a software library that provides standardized reference implementations of adversarial example construction techniques and adversarial training. The library may be used to develop more robust machine learning models and to provide standardized benchmarks of models' performance in the adversarial setting. Benchmarks constructed without a standardized implementation of adversarial example construction are not comparable to each other, because a good result may indicate a robust model or it may merely indicate a weak implementation of the adversarial example construction procedure. This technical report is structured as follows. Section 1 provides an overview of adversarial examples in machine learning and of the CleverHans software. Section 2 presents the core functionalities of the library: namely the attacks based on adversarial examples and defenses to improve the robustness of machine learning models to these attacks. Section 3 describes how to report benchmark results using the library. Section 4 describes the versioning system.

* Technical report for https://github.com/tensorflow/cleverhans

Adversarial Risk and the Dangers of Evaluating Against Weak Attacks

Jun 12, 2018

Jonathan Uesato, Brendan O'Donoghue, Aaron van den Oord, Pushmeet Kohli

Abstract: This paper investigates recently proposed approaches for defending against adversarial examples and evaluating adversarial robustness. We motivate 'adversarial risk' as an objective for achieving models robust to worst-case inputs. We then frame commonly used attacks and evaluation metrics as defining a tractable surrogate objective to the true adversarial risk. This suggests that models may optimize this surrogate rather than the true adversarial risk. We formalize this notion as 'obscurity to an adversary,' and develop tools and heuristics for identifying obscured models and designing transparent models. We demonstrate that this is a significant problem in practice by repurposing gradient-free optimization techniques into adversarial attacks, which we use to decrease the accuracy of several recently proposed defenses to near zero. Our hope is that our formulations and results will help researchers to develop more powerful defenses.
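
A gradient-free attack needs only loss values, not gradients, which is why it can still succeed against a model whose gradients are uninformative. The sketch below uses simultaneous perturbation stochastic approximation (SPSA) as one example of such an optimizer. It is a simplified illustration under assumptions: the black-box `loss_fn`, the signed-step update, and all parameter names are hypothetical choices made here, and the abstract does not state which optimizer or update rule the authors used.

```python
import random


def spsa_gradient(loss_fn, x, delta, num_samples, rng):
    """Estimate the gradient of a black-box loss from paired evaluations."""
    grad = [0.0] * len(x)
    for _ in range(num_samples):
        v = [rng.choice((-1.0, 1.0)) for _ in x]  # random +/-1 direction
        plus = loss_fn([xi + delta * vi for xi, vi in zip(x, v)])
        minus = loss_fn([xi - delta * vi for xi, vi in zip(x, v)])
        diff = (plus - minus) / (2.0 * delta)
        for i, vi in enumerate(v):
            grad[i] += diff / vi
    return [g / num_samples for g in grad]


def spsa_attack(loss_fn, x0, eps, step, iters, delta, num_samples, rng):
    """Ascend the loss with signed steps, staying inside the eps-ball around x0."""
    x = list(x0)
    for _ in range(iters):
        g = spsa_gradient(loss_fn, x, delta, num_samples, rng)
        x = [xi + step * ((gi > 0) - (gi < 0)) for xi, gi in zip(x, g)]
        # Project back onto the l-infinity ball of radius eps.
        x = [min(max(xi, ci - eps), ci + eps) for xi, ci in zip(x, x0)]
    return x
```

Each gradient estimate costs `2 * num_samples` loss evaluations, so the attack trades query budget for independence from the model's gradient signal.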

Training verified learners with learned verifiers

May 29, 2018

Krishnamurthy Dvijotham, Sven Gowal, Robert Stanforth, Relja Arandjelovic, Brendan O'Donoghue, Jonathan Uesato, Pushmeet Kohli

Abstract: This paper proposes a new algorithmic framework, predictor-verifier training, to train neural networks that are verifiable, i.e., networks that provably satisfy some desired input-output properties. The key idea is to simultaneously train two networks: a predictor network that performs the task at hand, e.g., predicting labels given inputs, and a verifier network that computes a bound on how well the predictor satisfies the properties being verified. Both networks can be trained simultaneously to optimize a weighted combination of the standard data-fitting loss and a term that bounds the maximum violation of the property. Experiments show that not only is the predictor-verifier architecture able to train networks to achieve state-of-the-art verified robustness to adversarial examples with much shorter training times (outperforming previous algorithms on small datasets like MNIST and SVHN), but it can also be scaled to produce the first known (to the best of our knowledge) verifiably robust networks for CIFAR-10.

Semantic Code Repair using Neuro-Symbolic Transformation Networks

Oct 30, 2017

Jacob Devlin, Jonathan Uesato, Rishabh Singh, Pushmeet Kohli

Abstract: We study the problem of semantic code repair, which can be broadly defined as automatically fixing non-syntactic bugs in source code. The majority of past work in semantic code repair assumed access to unit tests against which candidate repairs could be validated. In contrast, the goal here is to develop a strong statistical model to accurately predict both bug locations and exact fixes without access to information about the intended correct behavior of the program. Achieving such a goal requires a robust contextual repair model, which we train on a large corpus of real-world source code that has been augmented with synthetically injected bugs. Our framework adopts a two-stage approach where first a large set of repair candidates is generated by rule-based processors, and then these candidates are scored by a statistical model using a novel neural network architecture which we refer to as Share, Specialize, and Compete. Specifically, the architecture (1) generates a shared encoding of the source code using an RNN over the abstract syntax tree, (2) scores each candidate repair using specialized network modules, and (3) then normalizes these scores together so they can compete against one another in a comparable probability space. We evaluate our model on a real-world test set gathered from GitHub containing four common categories of bugs. Our model is able to predict the exact correct repair 41% of the time with a single guess, compared to 13% accuracy for an attentional sequence-to-sequence model.
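
Step (3), in which candidate scores are normalized together so that they compete, can be sketched as a softmax over every candidate's raw score. Using a softmax here is an assumption, since the abstract only says the scores are normalized into a comparable probability space; the function name `compete` and the candidate labels in the usage example are hypothetical as well.

```python
import math


def compete(scores):
    """Normalize raw candidate scores into probabilities with a softmax.

    scores maps a candidate repair (any hashable label) to the raw score
    produced by its specialized module.
    """
    m = max(scores.values())  # subtract the max for numerical stability
    exps = {cand: math.exp(s - m) for cand, s in scores.items()}
    z = sum(exps.values())
    return {cand: e / z for cand, e in exps.items()}


# Hypothetical candidates produced by different rule-based processors.
candidates = {"no_change": 0.0, "swap_args@L12": 2.0, "fix_operator@L7": 1.0}
probabilities = compete(candidates)
best_repair = max(probabilities, key=probabilities.get)
```

Normalizing across all candidates at once, rather than within each module, is what lets a score from one kind of repair be compared directly with a score from another.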

RobustFill: Neural Program Learning under Noisy I/O

Mar 21, 2017

Jacob Devlin, Jonathan Uesato, Surya Bhupatiraju, Rishabh Singh, Abdel-rahman Mohamed, Pushmeet Kohli

Abstract: The problem of automatically generating a computer program from some specification has been studied since the early days of AI. Recently, two competing approaches for automatic program learning have received significant attention: (1) neural program synthesis, where a neural network is conditioned on input/output (I/O) examples and learns to generate a program, and (2) neural program induction, where a neural network generates new outputs directly using a latent program representation. Here, for the first time, we directly compare both approaches on a large-scale, real-world learning task. We additionally contrast to rule-based program synthesis, which uses hand-crafted semantics to guide the program generation. Our neural models use a modified attention RNN to allow encoding of variable-sized sets of I/O pairs. Our best synthesis model achieves 92% accuracy on a real-world test set, compared to the 34% accuracy of the previous best neural synthesis approach. The synthesis model also outperforms a comparable induction model on this task, but we more importantly demonstrate that the strength of each approach is highly dependent on the evaluation metric and end-user application. Finally, we show that we can train our neural models to remain very robust to the type of noise expected in real-world data (e.g., typos), while a highly-engineered rule-based system fails entirely.

* 8 pages + 9 pages of supplementary material
