Yisong Yue

California Institute of Technology

Learning for Safety-Critical Control with Control Barrier Functions

Dec 20, 2019
Andrew Taylor, Andrew Singletary, Yisong Yue, Aaron Ames

Modern nonlinear control theory seeks to endow systems with properties of stability and safety, and has been deployed successfully in multiple domains. Despite this success, model uncertainty remains a significant challenge in synthesizing safe controllers, leading to degradation in the properties provided by the controllers. This paper develops a machine learning framework utilizing Control Barrier Functions (CBFs) to reduce model uncertainty as it impacts the safe behavior of a system. This approach iteratively collects data and updates a controller, ultimately achieving safe behavior. We validate this method in simulation and experimentally on a Segway platform.

* Extended version (12 Pages), Short version submitted to Learning for Dynamics & Control (L4DC) 2020 Conference
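
As background for the abstract above, here is a minimal sketch of the standard CBF safety condition, not the paper's learning framework. It assumes a hypothetical one-dimensional system dx/dt = u with a perfectly known model, a barrier h(x) = x_max - x, and illustrative values for `alpha`, `x_max`, and the desired input; none of these come from the paper.

```python
def cbf_filter(x, u_des, x_max=1.0, alpha=2.0):
    """Clamp u_des so that h(x) = x_max - x satisfies dh/dt >= -alpha * h.

    For the assumed dynamics dx/dt = u we have dh/dt = -u, so the CBF
    condition reduces to u <= alpha * (x_max - x).
    """
    u_bound = alpha * (x_max - x)
    return min(u_des, u_bound)


def simulate(x0=0.0, u_des=5.0, dt=0.01, steps=500):
    """Forward-Euler rollout of dx/dt = u under the filtered input."""
    x = x0
    trajectory = [x]
    for _ in range(steps):
        x += dt * cbf_filter(x, u_des)
        trajectory.append(x)
    return trajectory


trajectory = simulate()
# The state approaches x_max but stays inside the safe set {x <= x_max}.
print(max(trajectory) <= 1.0)
```

When the model is uncertain, the bound computed in `cbf_filter` is only as good as the assumed dynamics, which is the gap the abstract says the learning framework targets.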

Triply Robust Off-Policy Evaluation

Nov 16, 2019
Anqi Liu, Hao Liu, Anima Anandkumar, Yisong Yue

We propose a robust regression approach to off-policy evaluation (OPE) for contextual bandits. We frame OPE as a covariate-shift problem and leverage modern robust regression tools. Ours is a general approach that can be used to augment any existing OPE method that utilizes the direct method. When augmenting doubly robust methods, we call the resulting method Triply Robust. We prove upper bounds on the resulting bias and variance, and derive novel minimax bounds based on robust minimax analysis for covariate shift. Our robust regression method is compatible with deep learning, and is thus applicable to complex OPE settings that require powerful function approximators. Finally, we demonstrate superior empirical performance across the standard OPE benchmarks, especially in the case where the logging policy is unknown and must be estimated from data.

* Preliminary Work
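
For readers unfamiliar with the baseline that the abstract augments, the following is a minimal sketch of the standard doubly robust estimator for contextual bandits. It is not the Triply Robust method itself: the robust regression component is replaced here by an arbitrary reward model `q_hat`, and the toy logs, policy, and function names are assumptions made for illustration.

```python
def doubly_robust_value(logs, target_policy, q_hat, actions):
    """Doubly robust estimate of the target policy's value.

    logs: iterable of (context, action, reward, logging_prob) tuples.
    target_policy(context, action): probability under the evaluated policy.
    q_hat(context, action): reward model (the "direct method" component).
    """
    logs = list(logs)
    total = 0.0
    for context, action, reward, logging_prob in logs:
        # Direct-method term: expected modeled reward under the target policy.
        direct = sum(target_policy(context, a) * q_hat(context, a) for a in actions)
        # Importance-weighted correction for the reward model's error.
        weight = target_policy(context, action) / logging_prob
        total += direct + weight * (reward - q_hat(context, action))
    return total / len(logs)


# Toy problem (assumed): reward is 1 when the action matches a binary context,
# and the logging policy picked each action with probability 0.5.
ACTIONS = [0, 1]
LOGS = [(0, 0, 1.0, 0.5), (0, 1, 0.0, 0.5), (1, 0, 0.0, 0.5), (1, 1, 1.0, 0.5)]


def match_policy(context, action):
    return 1.0 if action == context else 0.0


def zero_model(context, action):
    return 0.0


def exact_model(context, action):
    return 1.0 if action == context else 0.0


print(doubly_robust_value(LOGS, match_policy, zero_model, ACTIONS))
print(doubly_robust_value(LOGS, match_policy, exact_model, ACTIONS))
```

With `zero_model` the estimator reduces to inverse propensity scoring, and with `exact_model` the correction term vanishes; on these toy logs both recover the true value of 1.0.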

Empirical Study of Off-Policy Policy Evaluation for Reinforcement Learning

Nov 15, 2019
Cameron Voloshin, Hoang M. Le, Nan Jiang, Yisong Yue

Off-policy policy evaluation (OPE) is the problem of estimating the online performance of a policy using only pre-collected historical data generated by another policy. Given the increasing interest in deploying learning-based methods for safety-critical applications, many OPE methods have recently been proposed. Because experimental conditions differ across the recent literature, the relative performance of current OPE methods is not well understood. In this work, we present the first comprehensive empirical analysis of a broad suite of OPE methods. Based on thousands of experiments and detailed empirical analyses, we offer a summarized set of guidelines for effectively using OPE in practice, and suggest directions for future research.

* Main paper is 8 pages. The appendix contains many pages of tables

Landmark Ordinal Embedding

Oct 27, 2019
Nikhil Ghosh, Yuxin Chen, Yisong Yue

In this paper, we aim to learn a low-dimensional Euclidean representation from a set of constraints of the form "item j is closer to item i than item k". Existing approaches for this "ordinal embedding" problem require expensive optimization procedures, which cannot scale to handle increasingly large datasets. To address this issue, we propose a landmark-based strategy, which we call Landmark Ordinal Embedding (LOE). Our approach trades off statistical efficiency for computational efficiency by exploiting the low-dimensionality of the latent embedding. We derive bounds establishing the statistical consistency of LOE under the popular Bradley-Terry-Luce noise model. Through a rigorous analysis of the computational complexity, we show that LOE is significantly more efficient than conventional ordinal embedding approaches as the number of items grows. We validate these characterizations empirically on both synthetic and real datasets. We also present a practical approach that achieves the "best of both worlds", by using LOE to warm-start existing methods that are more statistically efficient but computationally expensive.

* NeurIPS 2019
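
To make the constraint format in the abstract concrete, the sketch below checks how many triplets of the form "item j is closer to item i than item k" a candidate Euclidean embedding satisfies. It only illustrates the problem setup and is not the LOE algorithm; the function name and the toy embedding are assumptions.

```python
import math


def triplet_accuracy(embedding, triplets):
    """Fraction of (i, j, k) triplets with dist(i, j) < dist(i, k).

    embedding: mapping from item id to a coordinate tuple.
    triplets: iterable of (i, j, k), read as "j is closer to i than k".
    """
    triplets = list(triplets)
    satisfied = 0
    for i, j, k in triplets:
        d_ij = math.dist(embedding[i], embedding[j])
        d_ik = math.dist(embedding[i], embedding[k])
        if d_ij < d_ik:
            satisfied += 1
    return satisfied / len(triplets)


# Toy embedding (assumed): three items on a line.
embedding = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (3.0, 0.0)}
triplets = [(0, 1, 2), (2, 1, 0), (1, 2, 0)]
accuracy = triplet_accuracy(embedding, triplets)
print(accuracy)
```

Here the first two triplets hold and the third does not, so two of the three constraints are satisfied.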

Learning Calibratable Policies using Programmatic Style-Consistency

Oct 02, 2019
Eric Zhan, Albert Tseng, Yisong Yue, Adith Swaminathan, Matthew Hausknecht

We study the important and challenging problem of controllable generation of long-term sequential behaviors. Solutions to this problem would impact many applications, such as calibrating behaviors of AI agents in games or predicting player trajectories in sports. In contrast to the well-studied areas of controllable generation of images, text, and speech, there are significant challenges that are unique to or exacerbated by generating long-term behaviors: how should we specify the factors of variation to control, and how can we ensure that the generated temporal behavior faithfully demonstrates diverse styles? In this paper, we leverage large amounts of raw behavioral data to learn policies that can be calibrated to generate a diverse range of behavior styles (e.g., aggressive versus passive play in sports). Inspired by recent work on leveraging programmatic labeling functions, we present a novel framework that combines imitation learning with data programming to learn style-calibratable policies. Our primary technical contribution is a formal notion of style-consistency as a learning objective, and its integration with conventional imitation learning approaches. We evaluate our framework using demonstrations from professional basketball players and agents in the MuJoCo physics environment, and show that our learned policies can be accurately calibrated to generate interesting behavior styles in both domains.

Preference-Based Learning for Exoskeleton Gait Optimization

Sep 26, 2019
Maegan Tucker, Ellen Novoseller, Claudia Kann, Yanan Sui, Yisong Yue, Joel Burdick, Aaron D. Ames

This paper presents a personalized gait optimization framework for lower-body exoskeletons. Rather than optimizing numerical objectives such as the mechanical cost of transport, our approach directly learns from user preferences, e.g., for comfort. Building upon work in preference-based interactive learning, we present the CoSpar algorithm. CoSpar prompts the user to give pairwise preferences between trials and suggest improvements; as exoskeleton walking is a non-intuitive behavior, users can provide preferences more easily and reliably than numerical feedback. We show that CoSpar performs competitively in simulation and demonstrate a prototype implementation of CoSpar on a lower-body exoskeleton to optimize human walking trajectory features. In the experiments, CoSpar consistently found user-preferred parameters of the exoskeleton's walking gait, which suggests that it is a promising starting point for adapting and personalizing exoskeletons (or other assistive devices) to individual users.

* 7 pages, 7 figures

Dueling Posterior Sampling for Preference-Based Reinforcement Learning

Aug 04, 2019
Ellen R. Novoseller, Yanan Sui, Yisong Yue, Joel W. Burdick

In preference-based reinforcement learning (RL), an agent interacts with the environment while receiving preferences instead of absolute feedback. While there is increasing research activity in preference-based RL, the design of formal frameworks that admit tractable theoretical analysis remains an open challenge. Building upon ideas from preference-based bandit learning and posterior sampling in RL, we present Dueling Posterior Sampling (DPS), which employs preference-based posterior sampling to learn both the system dynamics and the underlying utility function that governs the user's preferences. Because preference feedback is provided on trajectories rather than individual state/action pairs, we develop a Bayesian approach to solving the credit assignment problem, translating user preferences to a posterior distribution over state/action reward models. We prove an asymptotic no-regret rate for DPS with a Bayesian logistic regression credit assignment model; to our knowledge, this is the first regret guarantee for preference-based RL. We also discuss possible avenues for extending this proof methodology to analyze other credit assignment models. Finally, we evaluate the approach empirically, showing competitive performance against existing baselines.

* 8 pages before references and Appendix; 35 pages total; 3 figures; 1 table

An Encoder-Decoder Based Approach for Anomaly Detection with Application in Additive Manufacturing

Jul 26, 2019
Baihong Jin, Yingshui Tan, Alexander Nettekoven, Yuxin Chen, Ufuk Topcu, Yisong Yue, Alberto Sangiovanni Vincentelli

We present a novel unsupervised deep learning approach that utilizes the encoder-decoder architecture for detecting anomalies in sequential sensor data collected during industrial manufacturing. Our approach is designed not only to detect whether there exists an anomaly at a given time step, but also to predict what will happen next in the (sequential) process. We demonstrate our approach on a dataset collected from a real-world testbed. The dataset contains images collected under both normal conditions and synthetic anomalies. We show that the encoder-decoder model is able to identify the injected anomalies in a modern manufacturing process in an unsupervised fashion. In addition, it gives hints about the temperature non-uniformity of the testbed during manufacturing, which we were not aware of before running the experiment.

Imitation-Projected Policy Gradient for Programmatic Reinforcement Learning

Jul 11, 2019
Abhinav Verma, Hoang M. Le, Yisong Yue, Swarat Chaudhuri

We present Imitation-Projected Policy Gradient (IPPG), an algorithmic framework for learning policies that are parsimoniously represented in a structured programming language. Such programmatic policies can be more interpretable, generalizable, and amenable to formal verification than neural policies; however, designing rigorous learning approaches for programmatic policies remains a challenge. IPPG, our response to this challenge, is based on three insights. First, we view our learning task as optimization in policy space, modulo the constraint that the desired policy has a programmatic representation, and solve this optimization problem using a "lift-and-project" perspective that takes a gradient step into the unconstrained policy space and then projects back onto the constrained space. Second, we view the unconstrained policy space as mixing neural and programmatic representations, which enables employing state-of-the-art deep policy gradient approaches. Third, we cast the projection step as program synthesis via imitation learning, and exploit contemporary combinatorial methods for this task. We present theoretical convergence results for IPPG, as well as an empirical evaluation in three continuous control domains. The experiments show that IPPG can significantly outperform state-of-the-art approaches for learning programmatic policies.
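
The "lift-and-project" pattern described above (a gradient step in an unconstrained space, followed by a projection back onto a constrained set) can be sketched on a toy problem. This is plain projected gradient descent and only an analogy to IPPG: the constrained set here is assumed to be k-sparse vectors rather than programs, the projection is hard thresholding rather than program synthesis via imitation learning, and all names and values are hypothetical.

```python
def project_k_sparse(w, k):
    """Project onto k-sparse vectors: keep the k largest-magnitude entries."""
    keep = set(sorted(range(len(w)), key=lambda i: abs(w[i]), reverse=True)[:k])
    return [w[i] if i in keep else 0.0 for i in range(len(w))]


def lift_and_project(target, k, lr=0.25, steps=50):
    """Minimize sum((w - target)^2) subject to w being k-sparse."""
    w = [0.0] * len(target)
    for _ in range(steps):
        # "Lift": take a gradient step in the unconstrained space.
        lifted = [wi - lr * 2.0 * (wi - ti) for wi, ti in zip(w, target)]
        # "Project": map the result back onto the constrained set.
        w = project_k_sparse(lifted, k)
    return w


result = lift_and_project([3.0, -0.5, 2.0, 0.1], k=2)
print(result)
```

On this toy objective the iterates keep only the two largest coordinates and converge toward the corresponding entries of the target, while the remaining coordinates stay at zero.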

Co-training for Policy Learning

Jul 03, 2019
Jialin Song, Ravi Lanka, Yisong Yue, Masahiro Ono

We study the problem of learning sequential decision-making policies in settings with multiple state-action representations. Such settings naturally arise in many domains, such as planning (e.g., multiple integer programming formulations) and various combinatorial optimization problems (e.g., those with both integer programming and graph-based formulations). Inspired by the classical co-training framework for classification, we study the problem of co-training for policy learning. We present sufficient conditions under which learning from two views can improve upon learning from a single view alone. Motivated by these theoretical insights, we present a meta-algorithm for co-training for sequential decision making. Our framework is compatible with both reinforcement learning and imitation learning. We validate the effectiveness of our approach across a wide range of tasks, including discrete/continuous control and combinatorial optimization.

* UAI 2019, oral presentation
