Get our free extension to see links to code for papers anywhere online!Free extension: code links for papers anywhere!Free add-on: See code for papers anywhere!

Jianyi Yang, Pengfei Li, Tongxin Li, Adam Wierman, Shaolei Ren

This paper studies the problem of Anytime-Competitive Markov Decision Process (A-CMDP). Existing works on Constrained Markov Decision Processes (CMDPs) aim to optimize the expected reward while constraining the expected cost over random dynamics, but the cost in a specific episode can still be unsatisfactorily high. In contrast, the goal of A-CMDP is to optimize the expected reward while guaranteeing a bounded cost in each round of any episode against a policy prior. We propose a new algorithm, called Anytime-Competitive Reinforcement Learning (ACRL), which provably guarantees the anytime cost constraints. The regret analysis shows the policy asymptotically matches the optimal reward achievable under the anytime competitive constraints. Experiments on the application of carbon-intelligent computing verify the reward performance and cost constraint guarantee of ACRL.

Via

Jinhang Zuo, Zhiyao Zhang, Xuchuang Wang, Cheng Chen, Shuai Li, John C. S. Lui, Mohammad Hajiesmaili, Adam Wierman

Cooperative multi-agent multi-armed bandits (CMA2B) consider the collaborative efforts of multiple agents in a shared multi-armed bandit game. We study latent vulnerabilities exposed by this collaboration and consider adversarial attacks on a few agents with the goal of influencing the decisions of the rest. More specifically, we study adversarial attacks on CMA2B in both homogeneous settings, where agents operate with the same arm set, and heterogeneous settings, where agents have distinct arm sets. In the homogeneous setting, we propose attack strategies that, by targeting just one agent, convince all agents to select a particular target arm $T-o(T)$ times while incurring $o(T)$ attack costs in $T$ rounds. In the heterogeneous setting, we prove that a target arm attack requires linear attack costs and propose attack strategies that can force a maximum number of agents to suffer linear regrets while incurring sublinear costs and only manipulating the observations of a few target agents. Numerical experiments validate the effectiveness of our proposed attack strategies.

Via

Neelkamal Bhuyan, Debankur Mukherjee, Adam Wierman

Convex function chasing (CFC) is an online optimization problem in which during each round $t$, a player plays an action $x_t$ in response to a hitting cost $f_t(x_t)$ and an additional cost of $c(x_t,x_{t-1})$ for switching actions. We study the CFC problem in stochastic and adversarial environments, giving algorithms that achieve performance guarantees simultaneously in both settings. Specifically, we consider the squared $\ell_2$-norm switching costs and a broad class of quadratic hitting costs for which the sequence of minimizers either forms a martingale or is chosen adversarially. This is the first work that studies the CFC problem using a stochastic framework. We provide a characterization of the optimal stochastic online algorithm and, drawing a comparison between the stochastic and adversarial scenarios, we demonstrate that the adversarial-optimal algorithm exhibits suboptimal performance in the stochastic context. Motivated by this, we provide a best-of-both-worlds algorithm that obtains robust adversarial performance while simultaneously achieving near-optimal stochastic performance.

Via

Adam Lechowicz, Nicolas Christianson, Bo Sun, Noman Bashir, Mohammad Hajiesmaili, Adam Wierman, Prashant Shenoy

We introduce and study online conversion with switching costs, a family of online problems that capture emerging problems at the intersection of energy and sustainability. In this problem, an online player attempts to purchase (alternatively, sell) fractional shares of an asset during a fixed time horizon with length $T$. At each time step, a cost function (alternatively, price function) is revealed, and the player must irrevocably decide an amount of asset to convert. The player also incurs a switching cost whenever their decision changes in consecutive time steps, i.e., when they increase or decrease their purchasing amount. We introduce competitive (robust) threshold-based algorithms for both the minimization and maximization variants of this problem, and show they are optimal among deterministic online algorithms. We then propose learning-augmented algorithms that take advantage of untrusted black-box advice (such as predictions from a machine learning model) to achieve significantly better average-case performance without sacrificing worst-case competitive guarantees. Finally, we empirically evaluate our proposed algorithms using a carbon-aware EV charging case study, showing that our algorithms substantially improve on baseline methods for this problem.

Via

Pengfei Li, Jianyi Yang, Adam Wierman, Shaolei Ren

We study a challenging form of Smoothed Online Convex Optimization, a.k.a. SOCO, including multi-step nonlinear switching costs and feedback delay. We propose a novel machine learning (ML) augmented online algorithm, Robustness-Constrained Learning (RCL), which combines untrusted ML predictions with a trusted expert online algorithm via constrained projection to robustify the ML prediction. Specifically,we prove that RCL is able to guarantee$(1+\lambda)$-competitiveness against any given expert for any$\lambda>0$, while also explicitly training the ML model in a robustification-aware manner to improve the average-case performance. Importantly,RCL is the first ML-augmented algorithm with a provable robustness guarantee in the case of multi-step switching cost and feedback delay.We demonstrate the improvement of RCL in both robustness and average performance using battery management for electrifying transportationas a case study.

Via

Bo Sun, Jerry Huang, Nicolas Christianson, Mohammad Hajiesmaili, Adam Wierman

Online algorithms with predictions have become a trending topic in the field of beyond worst-case analysis of algorithms. These algorithms incorporate predictions about the future to obtain performance guarantees that are of high quality when the predictions are good, while still maintaining bounded worst-case guarantees when predictions are arbitrarily poor. In general, the algorithm is assumed to be unaware of the prediction's quality. However, recent developments in the machine learning literature have studied techniques for providing uncertainty quantification on machine-learned predictions, which describes how certain a model is about its quality. This paper examines the question of how to optimally utilize uncertainty-quantified predictions in the design of online algorithms. In particular, we consider predictions augmented with uncertainty quantification describing the likelihood of the ground truth falling in a certain range, designing online algorithms with these probabilistic predictions for two classic online problems: ski rental and online search. In each case, we demonstrate that non-trivial modifications to algorithm design are needed to fully leverage the probabilistic predictions. Moreover, we consider how to utilize more general forms of uncertainty quantification, proposing a framework based on online learning that learns to exploit uncertainty quantification to make optimal decisions in multi-instance settings.

Via

Yingying Li, Jing Yu, Lauren Conger, Adam Wierman

Set-membership estimation is commonly used in adaptive/learning-based control algorithms that require robustness over the model uncertainty sets, e.g., online robustly stabilizing control and robust adaptive model predictive control. Despite having broad applications, non-asymptotic estimation error bounds in the stochastic setting are limited. This paper provides such a non-asymptotic bound on the diameter of the uncertainty sets generated by set membership estimation on linear dynamical systems under bounded, i.i.d. disturbances. Further, this result is applied to robust adaptive model predictive control with uncertainty sets updated by set membership. We numerically demonstrate the performance of the robust adaptive controller, which rapidly approaches the performance of the offline optimal model predictive controller, in comparison with the control design based on least square estimation's confidence regions.

Via

Tongxin Li, Yiheng Lin, Shaolei Ren, Adam Wierman

We study the tradeoff between consistency and robustness in the context of a single-trajectory time-varying Markov Decision Process (MDP) with untrusted machine-learned advice. Our work departs from the typical approach of treating advice as coming from black-box sources by instead considering a setting where additional information about how the advice is generated is available. We prove a first-of-its-kind consistency and robustness tradeoff given Q-value advice under a general MDP model that includes both continuous and discrete state/action spaces. Our results highlight that utilizing Q-value advice enables dynamic pursuit of the better of machine-learned advice and a robust baseline, thus result in near-optimal performance guarantees, which provably improves what can be obtained solely with black-box advice.

Via

Pengfei Li, Jianyi Yang, Adam Wierman, Shaolei Ren

Fueled by the soaring popularity of large language and foundation models, the accelerated growth of artificial intelligence (AI) models' enormous environmental footprint has come under increased scrutiny. While many approaches have been proposed to make AI more energy-efficient and environmentally friendly, environmental inequity -- the fact that AI's environmental footprint can be disproportionately higher in certain regions than in others -- has emerged, raising social-ecological justice concerns. This paper takes a first step toward addressing AI's environmental inequity by balancing its regional negative environmental impact. Concretely, we focus on the carbon and water footprints of AI model inference and propose equity-aware geographical load balancing (GLB) to explicitly address AI's environmental impacts on the most disadvantaged regions. We run trace-based simulations by considering a set of 10 geographically-distributed data centers that serve inference requests for a large language AI model. The results demonstrate that existing GLB approaches may amplify environmental inequity while our proposed equity-aware GLB can significantly reduce the regional disparity in terms of carbon and water footprints.

Via