Udaya Ghai

Online Nonstochastic Model-Free Reinforcement Learning

May 27, 2023
Udaya Ghai, Arushi Gupta, Wenhan Xia, Karan Singh, Elad Hazan

In this work, we explore robust model-free reinforcement learning algorithms for environments that may be dynamic or even adversarial. Conventional state-based policies fail to accommodate the challenge imposed by unmodeled disturbances in such settings. Additionally, optimizing linear state-based policies poses an obstacle to efficient optimization, leading to nonconvex objectives even in benign environments such as linear dynamical systems. Drawing inspiration from recent advancements in model-based control, we introduce a novel class of policies centered on disturbance signals. We define several categories of these signals, referred to as pseudo-disturbances, and corresponding policy classes based on them. We provide efficient and practical algorithms for optimizing these policies. Next, we examine the task of online adaptation of reinforcement learning agents to adversarial disturbances. Our methods can be integrated with any black-box model-free approach, yielding provable regret guarantees when the underlying dynamics are linear. We evaluate our method on several standard RL benchmarks and demonstrate improved robustness.
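
For a known linear system, the simplest pseudo-disturbance is the residual between the observed next state and the nominal prediction, and a disturbance-based policy acts linearly on recent residuals. The sketch below illustrates that idea on a toy linear system; the dynamics, gains, and function names are illustrative choices for this note, not the paper's algorithm.

```python
import numpy as np

def pseudo_disturbance(x_next, x, u, A, B):
    # Residual between the observed next state and the nominal
    # linear prediction: w_t = x_{t+1} - A x_t - B u_t.
    return x_next - A @ x - B @ u

def dac_policy(x, w_hist, K, M):
    # Disturbance-action controller: linear state feedback plus a
    # linear function of the most recent (pseudo-)disturbances.
    u = K @ x
    for i, w in enumerate(w_hist):
        u = u + M[i] @ w
    return u

# Toy 2-d system driven by an unmodeled disturbance.
A = np.array([[0.9, 0.1], [0.0, 0.9]])
B = np.eye(2)
K = -0.5 * np.eye(2)                      # stabilizing feedback (assumption)
M = [0.1 * np.eye(2), 0.05 * np.eye(2)]   # disturbance-response matrices

x = np.ones(2)
w_hist = []
for t in range(5):
    u = dac_policy(x, w_hist, K, M)
    w = 0.1 * np.sin(t) * np.ones(2)      # adversarial/unmodeled disturbance
    x_next = A @ x + B @ u + w
    # In the linear case the residual recovers w exactly.
    w_hist = ([pseudo_disturbance(x_next, x, u, A, B)] + w_hist)[:2]
    x = x_next
```

Because the true dynamics here are exactly linear, the recovered pseudo-disturbance coincides with the injected disturbance; the paper's point is that such signals remain useful even when the environment is not linear.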

Non-convex online learning via algorithmic equivalence

May 30, 2022
Udaya Ghai, Zhou Lu, Elad Hazan

We study an algorithmic equivalence technique between nonconvex gradient descent and convex mirror descent. We begin with the harder problem of regret minimization in online non-convex optimization. We show that under certain geometric and smoothness conditions, online gradient descent applied to non-convex functions is an approximation of online mirror descent applied to convex functions under reparameterization. In continuous time, the gradient flow with this reparameterization was shown to be exactly equivalent to continuous-time mirror descent by Amid and Warmuth (2020), but the theory for the analogous discrete-time algorithms was left as an open problem. We prove an $O(T^{\frac{2}{3}})$ regret bound for non-convex online gradient descent in this setting, answering this open problem. Our analysis is based on a new and simple algorithmic equivalence method.
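
The equivalence can be checked numerically in the simplest quadratic reparameterization $x = u \odot u$ studied by Amid and Warmuth: one discrete gradient-descent step on $u$ matches one unnormalized exponentiated-gradient (mirror descent) step on $x$ up to $O(\eta^2)$. The objective and step size below are toy choices of mine, not from the paper.

```python
import numpy as np

c = np.array([0.2, 0.5, 1.0])

def f_grad(x):
    # Gradient of the convex objective f(x) = 0.5 * ||x - c||^2.
    return x - c

eta = 1e-3
x0 = np.array([1.0, 1.0, 1.0])

# Non-convex gradient descent on u, where x = u * u (reparameterization).
u = np.sqrt(x0)
g = f_grad(u * u)
u_next = u - eta * 2.0 * u * g        # chain rule: d/du f(u*u) = 2u f'(x)
x_gd = u_next * u_next

# Mirror descent (unnormalized exponentiated gradient) directly on x.
x_md = x0 * np.exp(-4.0 * eta * f_grad(x0))

# The two updates agree up to O(eta^2) per step.
gap = np.max(np.abs(x_gd - x_md))
```

Expanding $(1 - 2\eta g)^2$ against $e^{-4\eta g}$ shows the per-step discrepancy is $O(\eta^2 g^2)$, which is exactly the slack the discrete-time analysis has to control.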

A Regret Minimization Approach to Multi-Agent Control

Feb 01, 2022
Udaya Ghai, Udari Madhushani, Naomi Leonard, Elad Hazan

We study the problem of multi-agent control of a dynamical system with known dynamics and adversarial disturbances. Rather than relying on centralized precomputed policies, we focus on adaptive control policies for the different agents, each equipped only with a stabilizing controller. We give a reduction from any (standard) regret-minimizing control method to a distributed algorithm. The reduction guarantees that the resulting distributed algorithm has low regret relative to the optimal precomputed joint policy. Our methodology involves generalizing online convex optimization to a multi-agent setting and applying recent tools from nonstochastic control derived for a single agent. We empirically evaluate our method on a model of an overactuated aircraft and show that the distributed method is robust to failures and to adversarial perturbations in the dynamics.
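
The flavor of the reduction, in which each agent runs its own regret minimizer on its block of the joint decision while sharing the environment, can be sketched with plain online gradient descent on a shared quadratic cost. This is a toy stand-in for intuition only, not the paper's control algorithm.

```python
import numpy as np

# Two agents jointly minimize a shared quadratic cost over u = (u1, u2),
# each running online gradient descent on its own block of inputs.
target = np.array([1.0, -2.0, 0.5, 3.0])   # illustrative joint optimum

def grad(u):
    # Gradient of the shared cost 0.5 * ||u - target||^2.
    return u - target

u = np.zeros(4)
eta = 0.1
for t in range(200):
    g = grad(u)
    u[:2] -= eta * g[:2]    # agent 1 updates only its coordinates
    u[2:] -= eta * g[2:]    # agent 2 updates only its coordinates
```

Because the cost is jointly convex and each agent's regret is low on its own block, the combined iterate approaches the best joint decision, which is the pattern the reduction generalizes to control with disturbances.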

Machine Learning for Mechanical Ventilation Control (Extended Abstract)

Nov 23, 2021
Daniel Suo, Cyril Zhang, Paula Gradu, Udaya Ghai, Xinyi Chen, Edgar Minasyan, Naman Agarwal, Karan Singh, Julienne LaChance, Tom Zajdel, Manuel Schottdorf, Daniel Cohen, Elad Hazan

Mechanical ventilation is one of the most widely used therapies in the ICU. However, despite broad application from anaesthesia to COVID-related life support, many injurious challenges remain. We frame these as a control problem: ventilators must let air in and out of the patient's lungs according to a prescribed trajectory of airway pressure. Industry-standard controllers, based on the PID method, are neither optimal nor robust. Our data-driven approach learns to control an invasive ventilator by training on a simulator itself trained on data collected from the ventilator. This method outperforms popular reinforcement learning algorithms and even controls the physical ventilator more accurately and robustly than PID. These results underscore how effective data-driven methodologies can be for invasive ventilation and suggest that more general forms of ventilation (e.g., non-invasive, adaptive) may also be amenable.

* Machine Learning for Health (ML4H) at NeurIPS 2021 - Extended Abstract. arXiv admin note: substantial text overlap with arXiv:2102.06779 
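
The PID baseline referred to above can be sketched on a crude first-order "lung" model tracking a square wave of pressure setpoints. The gains, plant, and setpoints here are illustrative values for this note, not clinical ones.

```python
import numpy as np

def pid_step(error, state, kp=5.0, ki=6.0, kd=0.1, dt=0.05):
    # One step of a textbook PID controller (gains tuned for this toy plant).
    integral, prev_error = state
    integral += error * dt
    derivative = (error - prev_error) / dt
    u = kp * error + ki * integral + kd * derivative
    return u, (integral, error)

# Toy plant: pressure relaxes toward 0 and rises with the control input.
dt, pressure, state = 0.05, 0.0, (0.0, 0.0)
history = []
for t in range(400):
    target = 20.0 if (t // 100) % 2 == 0 else 5.0   # square-wave setpoints
    u, state = pid_step(target - pressure, state, dt=dt)
    pressure += dt * (-0.5 * pressure + u)          # first-order dynamics
    history.append(abs(target - pressure))
```

On this benign linear plant PID tracks well; the paper's argument is that on real lungs, with delays and patient-to-patient variation, such hand-tuned loops over- and under-shoot, which is what the learned controller improves on.
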

Robust Online Control with Model Misspecification

Jul 16, 2021
Xinyi Chen, Udaya Ghai, Elad Hazan, Alexandre Megretski

We study online control of an unknown nonlinear dynamical system that is approximated by a time-invariant linear system with model misspecification. Our study focuses on robustness, which measures how much deviation from the assumed linear approximation can be tolerated while maintaining a bounded $\ell_2$-gain compared to the optimal control in hindsight. Some models cannot be stabilized even with perfect knowledge of their coefficients: the robustness is limited by the minimal distance between the assumed dynamics and the set of unstabilizable dynamics. Therefore it is necessary to assume a lower bound on this distance. Under this assumption, and with full observation of the $d$ dimensional state, we describe an efficient controller that attains $\Omega(\frac{1}{\sqrt{d}})$ robustness together with an $\ell_2$-gain whose dimension dependence is near optimal. We also give an inefficient algorithm that attains constant robustness independent of the dimension, with a finite but sub-optimal $\ell_2$-gain.
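
The distance to the unstabilizable set is the quantity limiting robustness above. For intuition, stabilizability of a linear pair $(A, B)$ can be probed with the standard Kalman rank test: controllability implies stabilizability, and in the example below the mode that `B_bad` cannot reach is the unstable one (eigenvalue $1.1$), so that pair is also unstabilizable. The matrices are illustrative, not from the paper.

```python
import numpy as np

def controllable(A, B, tol=1e-9):
    # Kalman rank test: (A, B) is controllable iff the controllability
    # matrix [B, AB, ..., A^{d-1} B] has full row rank d.
    d = A.shape[0]
    blocks = [B]
    for _ in range(d - 1):
        blocks.append(A @ blocks[-1])
    C = np.hstack(blocks)
    return np.linalg.matrix_rank(C, tol) == d

A = np.array([[1.1, 0.0],
              [0.0, 0.9]])
B_good = np.array([[1.0], [1.0]])   # actuates both modes
B_bad = np.array([[0.0], [1.0]])    # cannot reach the unstable mode
```

A misspecified model whose assumed $(A, B)$ sits near pairs like `(A, B_bad)` cannot tolerate much deviation, which is why the paper assumes a lower bound on this distance.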

Machine Learning for Mechanical Ventilation Control

Feb 26, 2021
Daniel Suo, Cyril Zhang, Paula Gradu, Udaya Ghai, Xinyi Chen, Edgar Minasyan, Naman Agarwal, Karan Singh, Julienne LaChance, Tom Zajdel, Manuel Schottdorf, Daniel Cohen, Elad Hazan

We consider the problem of controlling an invasive mechanical ventilator for pressure-controlled ventilation: a controller must let air in and out of a sedated patient's lungs according to a trajectory of airway pressures specified by a clinician. Hand-tuned PID controllers and similar variants have comprised the industry standard for decades, yet can behave poorly by over- or under-shooting their target or oscillating rapidly. We consider a data-driven machine learning approach: First, we train a simulator based on data we collect from an artificial lung. Then, we train deep neural network controllers on these simulators. We show that our controllers are able to track target pressure waveforms significantly better than PID controllers. We further show that a learned controller generalizes across lungs with varying characteristics much more readily than PID controllers do.
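
The two-stage recipe, fitting a simulator from logged transitions and then tuning a controller against that simulator, can be sketched with linear stand-ins for the paper's neural simulator and controller. All names and numbers below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stage 1: fit a linear simulator p_{t+1} ~ theta . [p_t, u_t] by least
# squares from logged (pressure, control, next-pressure) transitions.
true_theta = np.array([0.8, 0.3])            # unknown "lung" parameters
X = rng.normal(size=(500, 2))                # logged (p_t, u_t) pairs
y = X @ true_theta + 0.01 * rng.normal(size=500)
theta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Stage 2: tune a proportional controller u = k * (target - p) on the
# learned simulator, picking the gain with the lowest tracking error.
def rollout_error(k, target=10.0, steps=50):
    p, err = 0.0, 0.0
    for _ in range(steps):
        u = k * (target - p)
        p = theta[0] * p + theta[1] * u      # step the learned simulator
        err += abs(target - p)
    return err

gains = np.linspace(0.1, 5.0, 50)
best_k = min(gains, key=rollout_error)
```

The paper replaces the least-squares model with a deep simulator and the gain search with training a neural controller, but the structure of the pipeline is the same.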

Deluca -- A Differentiable Control Library: Environments, Methods, and Benchmarking

Feb 19, 2021
Paula Gradu, John Hallman, Daniel Suo, Alex Yu, Naman Agarwal, Udaya Ghai, Karan Singh, Cyril Zhang, Anirudha Majumdar, Elad Hazan

We present an open-source library of natively differentiable physics and robotics environments, accompanied by gradient-based control methods and a benchmarking suite. The introduced environments allow auto-differentiation through the simulation dynamics, and thereby permit fast training of controllers. The library features several popular environments, including classical control settings from OpenAI Gym. We also provide a novel differentiable environment, based on deep neural networks, that simulates medical ventilation. We give several use cases of new scientific results obtained using the library. This includes a medical ventilator simulator and controller, an adaptive control method for time-varying linear dynamical systems, and new gradient-based methods for control of linear dynamical systems with adversarial perturbations.
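
What a differentiable environment buys is the ability to tune a controller by gradient descent through a rollout. The sketch below shows that loop on a scalar linear system; since this note avoids any autodiff dependency, the gradient is a finite-difference stand-in for the autodiff gradient a library like this provides, and all names and constants are illustrative, not the library's API.

```python
import numpy as np

def rollout_cost(k, steps=30):
    # Quadratic cost of the feedback law u = -k * x on a toy 1-d system
    # x_{t+1} = 0.9 x_t + u_t, starting from x_0 = 1.
    x, cost = 1.0, 0.0
    for _ in range(steps):
        u = -k * x
        cost += x * x + 0.1 * u * u
        x = 0.9 * x + u
    return cost

# Gradient-based controller tuning: with a differentiable environment this
# gradient would come from autodiff through the rollout.
k, lr, eps = 0.0, 0.005, 1e-5
for _ in range(300):
    grad = (rollout_cost(k + eps) - rollout_cost(k - eps)) / (2 * eps)
    k -= lr * grad
```

The tuned gain lands near the LQR-optimal feedback for this system; the same pattern, with autodiff replacing finite differences, scales to the library's neural ventilation simulator and time-varying systems.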
