
Kaiwen Wang


JoinGym: An Efficient Query Optimization Environment for Reinforcement Learning

Jul 21, 2023
Kaiwen Wang, Junxiong Wang, Yueying Li, Nathan Kallus, Immanuel Trummer, Wen Sun


In this paper, we present \textsc{JoinGym}, an efficient and lightweight query optimization environment for reinforcement learning (RL). Join order selection (JOS) is a classic NP-hard combinatorial optimization problem from database query optimization and can serve as a practical testbed for the generalization capabilities of RL algorithms. We describe how to formulate each of the left-deep and bushy variants of the JOS problem as a Markov Decision Process (MDP), and we provide an implementation adhering to the standard Gymnasium API. We highlight that our implementation of \textsc{JoinGym} is based entirely on offline traces of all possible joins, which enables RL practitioners to easily and quickly test their methods on a realistic data management problem without needing to set up any systems. Moreover, we provide all possible join traces for $3300$ novel SQL queries generated from the IMDB dataset. Upon benchmarking popular RL algorithms, we find that at least one method can obtain near-optimal performance on train-set queries, but its performance degrades by several orders of magnitude on test-set queries. This gap motivates further research on RL algorithms that generalize well in multi-task combinatorial optimization problems.

* We will make all the queries available soon 
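
To make the environment interface concrete, here is a minimal sketch of driving a Gymnasium-style join-ordering environment with a random policy. The environment id, the negative-cardinality reward convention, and the action-masking remark are assumptions for illustration, not the actual JoinGym API.

```python
# Sketch only: a random policy interacting with a hypothetical Gymnasium
# join-ordering environment. The id "joingym/LeftDeep-v0" and the reward
# convention are assumptions; the real JoinGym API may differ.
import gymnasium as gym

env = gym.make("joingym/LeftDeep-v0")   # hypothetical id: left-deep join plans

obs, info = env.reset(seed=0)           # a new SQL query starts the episode
done, total_cost = False, 0.0
while not done:
    # Each action picks the next table to join; invalid joins could be
    # ruled out with an action mask exposed via `info`.
    action = env.action_space.sample()
    obs, reward, terminated, truncated, info = env.step(action)
    total_cost += -reward               # assume reward = negative join cardinality
    done = terminated or truncated

print("total intermediate cardinality:", total_cost)
```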

The Benefits of Being Distributional: Small-Loss Bounds for Reinforcement Learning

May 25, 2023
Kaiwen Wang, Kevin Zhou, Runzhe Wu, Nathan Kallus, Wen Sun


While distributional reinforcement learning (RL) has demonstrated empirical success, the question of when and why it is beneficial has remained unanswered. In this work, we provide one explanation for the benefits of distributional RL through the lens of small-loss bounds, which scale with the instance-dependent optimal cost. If the optimal cost is small, our bounds are stronger than those from non-distributional approaches. As a warmup, we show that learning the cost distribution leads to small-loss regret bounds in contextual bandits (CB), and we find that distributional CB empirically outperforms the state-of-the-art on three challenging tasks. For online RL, we propose a distributional version-space algorithm that constructs confidence sets using maximum likelihood estimation, and we prove that it achieves small-loss regret in tabular MDPs and enjoys small-loss PAC bounds in latent variable models. Building on similar insights, we propose a distributional offline RL algorithm based on the pessimism principle and prove that it enjoys small-loss PAC bounds, which exhibit a novel robustness property. For both online and offline RL, our results establish the first provable benefits of learning distributions even when only the mean is needed for making decisions.
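
As a toy illustration of the "learn the distribution, act on the mean" idea from the bandit warmup, the sketch below maintains a per-arm categorical MLE of the cost distribution on a discretized support. It is a simplified, context-free stand-in, not the paper's algorithm, and the optimism bonus is an ad hoc choice.

```python
# Illustrative sketch only: a distribution-learning bandit on a discretized
# cost support, acting on the mean of the MLE distribution.
import numpy as np

rng = np.random.default_rng(0)
support = np.array([0.0, 0.5, 1.0])                  # discretized cost atoms
true_p = np.array([[0.7, 0.2, 0.1],                  # arm 0: small expected cost
                   [0.2, 0.6, 0.2],
                   [0.1, 0.2, 0.7]])                 # arm 2: large expected cost
counts = np.ones_like(true_p)                        # Laplace-smoothed atom counts

for t in range(2000):
    p_hat = counts / counts.sum(axis=1, keepdims=True)  # per-arm MLE of the cost distribution
    means = p_hat @ support
    bonus = np.sqrt(1.0 / counts.sum(axis=1))            # crude optimism term
    arm = int(np.argmin(means - bonus))                   # optimistic cost minimization
    cost_idx = rng.choice(len(support), p=true_p[arm])
    counts[arm, cost_idx] += 1

print("estimated mean costs:", (counts / counts.sum(axis=1, keepdims=True)) @ support)
```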


Near-Minimax-Optimal Risk-Sensitive Reinforcement Learning with CVaR

Feb 07, 2023
Kaiwen Wang, Nathan Kallus, Wen Sun

In this paper, we study risk-sensitive Reinforcement Learning (RL), focusing on the objective of Conditional Value at Risk (CVaR) with risk tolerance $\tau$. Starting with multi-armed bandits (MABs), we show that the minimax CVaR regret rate is $\Omega(\sqrt{\tau^{-1}AK})$, where $A$ is the number of actions and $K$ is the number of episodes, and that it is achieved by an Upper Confidence Bound algorithm with a novel Bernstein bonus. For online RL in tabular Markov Decision Processes (MDPs), we show a minimax regret lower bound of $\Omega(\sqrt{\tau^{-1}SAK})$ (with normalized cumulative rewards), where $S$ is the number of states, and we propose a novel bonus-driven Value Iteration procedure. We show that our algorithm achieves the optimal regret of $\widetilde O(\sqrt{\tau^{-1}SAK})$ under a continuity assumption, and in general attains a near-optimal regret of $\widetilde O(\tau^{-1}\sqrt{SAK})$, which is minimax-optimal for constant $\tau$. This improves on the best available bounds. Our algorithms are made computationally efficient by discretizing rewards appropriately.
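
For reference, CVaR at tolerance $\tau$ is the expected outcome over the worst $\tau$-fraction of realizations. Below is a minimal sketch of its empirical estimate, assuming a higher-is-better reward convention (so the worst outcomes are the lowest rewards).

```python
# Minimal sketch: empirical CVaR at tolerance tau for reward samples,
# i.e. the average of the worst tau-fraction of outcomes.
import numpy as np

def empirical_cvar(rewards, tau):
    rewards = np.sort(np.asarray(rewards, dtype=float))  # ascending: worst outcomes first
    k = max(1, int(np.ceil(tau * len(rewards))))           # size of the worst tau-fraction
    return rewards[:k].mean()

rng = np.random.default_rng(0)
samples = rng.normal(loc=1.0, scale=0.5, size=10_000)
print("mean:", samples.mean())
print("CVaR at tau=0.1:", empirical_cvar(samples, 0.1))    # conditional mean of the worst 10%
```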


Learning Bellman Complete Representations for Offline Policy Evaluation

Jul 12, 2022
Jonathan D. Chang, Kaiwen Wang, Nathan Kallus, Wen Sun


We study representation learning for Offline Reinforcement Learning (RL), focusing on the important task of Offline Policy Evaluation (OPE). Recent work shows that, in contrast to supervised learning, realizability of the Q-function is not enough for learning it. Two sufficient conditions for sample-efficient OPE are Bellman completeness and coverage. Prior work often assumes that representations satisfying these conditions are given, with results being mostly theoretical in nature. In this work, we propose BCRL, which directly learns from data an approximately linear Bellman complete representation with good coverage. With this learned representation, we perform OPE using Least-Squares Policy Evaluation (LSPE) with linear functions in the learned representation. We present an end-to-end theoretical analysis, showing that our two-stage algorithm enjoys polynomial sample complexity provided some representation in the rich class considered is linear Bellman complete. Empirically, we extensively evaluate our algorithm on challenging, image-based continuous control tasks from the DeepMind Control Suite. We show that our representation enables better OPE compared to previous representation learning methods developed for off-policy RL (e.g., CURL, SPR). BCRL achieves OPE error competitive with the state-of-the-art method Fitted Q-Evaluation (FQE), and beats FQE when evaluating beyond the initial state distribution. Our ablations show that both the linear Bellman completeness and coverage components of our method are crucial.

* Proceedings of the 39th International Conference on Machine Learning, PMLR 162:2938-2971, 2022  
* Accepted for Long Talk at ICML 2022 
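
The second stage, LSPE with linear functions in a fixed representation, can be sketched as iterated least-squares regression onto Bellman backups. The synthetic features, rewards, and transitions below are placeholders rather than a learned BCRL representation, and convergence in practice relies on the representation being (approximately) Bellman complete.

```python
# Sketch of Least-Squares Policy Evaluation with a fixed linear feature map.
import numpy as np

rng = np.random.default_rng(0)
n, d, gamma = 500, 8, 0.95
phi = rng.normal(size=(n, d))          # features phi(s, a) of logged transitions
phi_next = rng.normal(size=(n, d))     # features phi(s', pi(s')) under the target policy
rewards = rng.normal(size=n)

w = np.zeros(d)
for _ in range(200):
    # Least-squares regression onto the Bellman backup under the current weights.
    targets = rewards + gamma * phi_next @ w
    w, *_ = np.linalg.lstsq(phi, targets, rcond=None)

print("estimated Q-values on the first 5 transitions:", phi[:5] @ w)
```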

Provable Benefits of Representational Transfer in Reinforcement Learning

May 29, 2022
Alekh Agarwal, Yuda Song, Wen Sun, Kaiwen Wang, Mengdi Wang, Xuezhou Zhang


We study the problem of representational transfer in RL, where an agent first pretrains on a number of source tasks to discover a shared representation, which is subsequently used to learn a good policy in a target task. We propose a new notion of task relatedness between source and target tasks, and develop a novel approach for representational transfer under this assumption. Concretely, we show that given generative access to the source tasks, we can discover a representation using which subsequent linear RL techniques quickly converge to a near-optimal policy, with only online access to the target task. The sample complexity is close to that of knowing the ground-truth features in the target task, and comparable to prior representation learning results in the source tasks. We complement our positive results with lower bounds without generative access, and validate our findings with an empirical evaluation on rich-observation MDPs that require deep exploration.
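
As a purely illustrative analogue of the pretrain-then-transfer pipeline (not the paper's algorithm), the sketch below recovers a low-dimensional subspace shared across several source regression tasks and reuses it as a frozen representation for a small-sample linear fit on a target task.

```python
# Toy regression analogue of representational transfer: learn a shared
# subspace from data-rich source tasks, then fit only k coefficients on
# a data-poor target task.
import numpy as np

rng = np.random.default_rng(0)
d, k, n_src, n_tgt = 50, 5, 2000, 100
U = np.linalg.qr(rng.normal(size=(d, k)))[0]          # ground-truth shared subspace

# Source tasks: plentiful data, task-specific heads on the shared subspace.
X_src = rng.normal(size=(n_src, d))
W_src = U @ rng.normal(size=(k, 10))                   # 10 source tasks
Y_src = X_src @ W_src + 0.1 * rng.normal(size=(n_src, 10))

# "Pretraining": estimate the shared subspace from the stacked source solutions.
W_hat, *_ = np.linalg.lstsq(X_src, Y_src, rcond=None)
U_hat = np.linalg.svd(W_hat, full_matrices=False)[0][:, :k]

# Target task: few samples, but only k coefficients to fit in the learned features.
theta_tgt = U @ rng.normal(size=k)
X_tgt = rng.normal(size=(n_tgt, d))
y_tgt = X_tgt @ theta_tgt + 0.1 * rng.normal(size=n_tgt)
coef, *_ = np.linalg.lstsq(X_tgt @ U_hat, y_tgt, rcond=None)
print("target parameter error:", np.linalg.norm(U_hat @ coef - theta_tgt))
```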


Doubly Robust Distributionally Robust Off-Policy Evaluation and Learning

Feb 19, 2022
Nathan Kallus, Xiaojie Mao, Kaiwen Wang, Zhengyuan Zhou


Off-policy evaluation and learning (OPE/L) use offline observational data to make better decisions, which is crucial in applications where experimentation is necessarily limited. OPE/L is nonetheless sensitive to discrepancies between the data-generating environment and the one where policies are deployed. Recent work proposed distributionally robust OPE/L (DROPE/L) to remedy this, but the proposal relies on inverse-propensity weighting, whose regret rates may deteriorate if propensities are estimated and whose variance is suboptimal even if not. For vanilla OPE/L, this is solved by doubly robust (DR) methods, but they do not naturally extend to the more complex DROPE/L, which involves a worst-case expectation. In this paper, we propose the first DR algorithms for DROPE/L with KL-divergence uncertainty sets. For evaluation, we propose Localized Doubly Robust DROPE (LDR$^2$OPE) and prove its semiparametric efficiency under weak product rate conditions. Notably, thanks to a localization technique, LDR$^2$OPE only requires fitting a small number of regressions, just like DR methods for vanilla OPE. For learning, we propose Continuum Doubly Robust DROPL (CDR$^2$OPL) and show that, under a product rate condition involving a continuum of regressions, it enjoys a fast regret rate of $\mathcal{O}(N^{-1/2})$ even when unknown propensities are nonparametrically estimated. We further extend our results to general $f$-divergence uncertainty sets. We illustrate the advantage of our algorithms in simulations.
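
The worst-case expectation over a KL uncertainty set admits a standard one-dimensional dual, $\inf_{KL(Q\|P)\le\delta} E_Q[R] = \sup_{\alpha>0} \{-\alpha \log E_P[e^{-R/\alpha}] - \alpha\delta\}$, which is the quantity the robust estimators target. The plug-in computation below uses empirical averages only and is illustrative, not the paper's doubly robust estimator.

```python
# Sketch: KL-robust (worst-case) policy value via the one-dimensional dual,
# evaluated with a simple empirical plug-in.
import numpy as np
from scipy.optimize import minimize_scalar

def kl_robust_value(rewards, delta):
    rewards = np.asarray(rewards, dtype=float)
    def neg_dual(alpha):
        z = -rewards / alpha
        lme = np.max(z) + np.log(np.mean(np.exp(z - np.max(z))))  # stable log-mean-exp
        return alpha * lme + alpha * delta                         # negative dual objective
    res = minimize_scalar(neg_dual, bounds=(1e-3, 1e3), method="bounded")
    return -res.fun

rng = np.random.default_rng(0)
rewards = rng.uniform(0, 1, size=5000)
print("nominal value:", rewards.mean())
print("KL-robust value (delta=0.1):", kl_robust_value(rewards, 0.1))
```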


Partitioned Active Learning for Heterogeneous Systems

May 14, 2021
Cheolhei Lee, Kaiwen Wang, Jianguo Wu, Wenjun Cai, Xiaowei Yue


Cost-effective and high-precision surrogate modeling is a cornerstone of automated industrial and engineering systems. Active learning coupled with Gaussian process (GP) surrogate modeling is an indispensable tool for demanding and complex systems, yet heterogeneity in the underlying system may adversely affect the modeling process. To improve learning efficiency in this regime, we propose a partitioned active learning strategy built upon partitioned GP (PGP) modeling. Our strategy seeks the most informative design point for PGP modeling systematically in two steps. The global searching scheme accelerates the exploration aspect of active learning by investigating the most uncertain region of the design space, and the local searching scheme exploits the active learning criterion induced by the local GP model. We also provide numerical remedies to alleviate the computational cost of active learning, thereby allowing the proposed method to handle a large number of candidate points. The proposed method is applied to numerical simulations and real-world cases with heterogeneity, in which surrogate models are constructed and embedded in (i) a cost-efficient automatic fuselage shape control system and (ii) an optimal design system for tribocorrosion-resistant alloys. The results show that our approach outperforms benchmark methods.
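
Below is a hedged sketch of the two-step selection described above: the global step ranks partitions by their largest local predictive uncertainty, and the local step keeps the candidate that attains it. The fixed 1-D partition, candidate sampling, and variance criterion are toy choices, not the paper's exact criteria.

```python
# Illustrative two-step (global/local) active-learning selection with
# per-partition Gaussian process models.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(0)
partitions = [(0.0, 0.5), (0.5, 1.0)]                # a fixed 1-D partition of the design space
X_obs = rng.uniform(0, 1, size=(10, 1))
y_obs = np.sin(6 * X_obs[:, 0]) + 0.1 * rng.normal(size=10)

best = None
for lo, hi in partitions:
    mask = (X_obs[:, 0] >= lo) & (X_obs[:, 0] < hi)
    if not mask.any():
        continue                                     # skip partitions with no observations yet
    gp = GaussianProcessRegressor().fit(X_obs[mask], y_obs[mask])
    cand = rng.uniform(lo, hi, size=(200, 1))        # candidate designs within the partition
    _, std = gp.predict(cand, return_std=True)
    # Global step: rank partitions by their largest local uncertainty;
    # local step: keep the candidate achieving it.
    if best is None or std.max() > best[0]:
        best = (std.max(), cand[np.argmax(std)])

print("next design point:", best[1])
```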


Scalable and Provably Accurate Algorithms for Differentially Private Distributed Decision Tree Learning

Dec 19, 2020
Kaiwen Wang, Travis Dick, Maria-Florina Balcan


This paper introduces the first provably accurate algorithms for differentially private, top-down decision tree learning in the distributed setting (Balcan et al., 2012). We propose DP-TopDown, a general privacy-preserving decision tree learning algorithm, and present two distributed implementations. Our first method, NoisyCounts, naturally extends the single-machine algorithm by using the Laplace mechanism. Our second method, LocalRNM, significantly reduces communication and added noise by performing local optimization at each data holder. We provide the first utility guarantees for differentially private top-down decision tree learning in both the single-machine and distributed settings. These guarantees show that the error of the privately learned decision tree quickly goes to zero provided that the dataset is sufficiently large. Our extensive experiments on real datasets illustrate the trade-offs among privacy, accuracy, and generalization when learning private decision trees in the distributed setting.

* In AAAI Workshop on Privacy-Preserving Artificial Intelligence, 2020 
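
The Laplace-mechanism building block behind NoisyCounts can be sketched as adding noise of scale $1/\epsilon$ to label counts (sensitivity 1, since changing one record changes one count by one). How the privacy budget is allocated across tree levels and data holders is omitted here.

```python
# Minimal sketch of the Laplace mechanism on label counts at a split.
import numpy as np

rng = np.random.default_rng(0)

def noisy_counts(labels, n_classes, epsilon):
    counts = np.bincount(labels, minlength=n_classes).astype(float)
    noise = rng.laplace(loc=0.0, scale=1.0 / epsilon, size=n_classes)  # sensitivity-1 counts
    return np.maximum(counts + noise, 0.0)                             # clamp to keep counts usable

labels = rng.integers(0, 2, size=1000)
print("true counts:  ", np.bincount(labels, minlength=2))
print("noisy counts: ", noisy_counts(labels, n_classes=2, epsilon=0.5))
```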

NP-ODE: Neural Process Aided Ordinary Differential Equations for Uncertainty Quantification of Finite Element Analysis

Dec 12, 2020
Yinan Wang, Kaiwen Wang, Wenjun Cai, Xiaowei Yue


Finite element analysis (FEA) has been widely used to generate simulations of complex and nonlinear systems. Despite its strength and accuracy, FEA has two main limitations: a) running high-fidelity FEA often requires significant computational cost and consumes a large amount of time; b) FEA is a deterministic method that is insufficient for uncertainty quantification (UQ) when modeling complex systems with various types of uncertainties. In this paper, a physics-informed data-driven surrogate model, named Neural Process Aided Ordinary Differential Equation (NP-ODE), is proposed to model FEA simulations and capture both input and output uncertainties. To validate the advantages of the proposed NP-ODE, we conduct experiments on both simulation data generated from a given ordinary differential equation and data collected from a real FEA platform for tribocorrosion. The performance of the proposed NP-ODE is compared against several benchmark methods. The results show that the proposed NP-ODE outperforms the benchmarks: it achieves the smallest predictive error and produces the most reasonable confidence intervals, with the best coverage of the testing data points.

* 40 pages 

Respond-CAM: Analyzing Deep Models for 3D Imaging Data by Visualizations

Jun 07, 2018
Guannan Zhao, Bo Zhou, Kaiwen Wang, Rui Jiang, Min Xu


The convolutional neural network (CNN) has become a powerful tool for various biomedical image analysis tasks, but there is a lack of visual explanation for the machinery of CNNs. In this paper, we present a novel algorithm, Respond-weighted Class Activation Mapping (Respond-CAM), for making CNN-based models interpretable by visualizing input regions that are important for predictions, especially for biomedical 3D imaging data. Our method uses the gradients of any target concept (e.g., the score of the target class) that flow into a convolutional layer. The weighted feature maps are combined to produce a heatmap that highlights the important regions in the image for predicting the target concept. We prove a preferable sum-to-score property of Respond-CAM and verify that it significantly improves on the current state-of-the-art approach for 3D images. Our tests on Cellular Electron Cryo-Tomography 3D images show that Respond-CAM achieves superior performance in visualizing CNNs with 3D biomedical image inputs, and it also obtains reasonably good results on natural image inputs. Respond-CAM is an efficient and reliable approach for visualizing the CNN machinery, and is applicable to a wide variety of CNN model families and image analysis tasks.

* Medical Image Computing & Computer Assisted Intervention (MICCAI) 2018  
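
A minimal sketch of the gradient-weighted combination described above, for a 3D volume. The specific response-based channel weighting used here, sum(A*G)/sum(A), is an assumption for illustration; consult the paper for Respond-CAM's exact formula.

```python
# Sketch: combine gradient-weighted feature maps of a conv layer into a
# single 3D heatmap (CAM-style visualization).
import numpy as np

def respond_cam(activations, gradients, eps=1e-8):
    # activations, gradients: (C, D, H, W) taken at the chosen convolutional
    # layer for the target class score.
    weights = (activations * gradients).sum(axis=(1, 2, 3)) / (
        activations.sum(axis=(1, 2, 3)) + eps
    )
    # Weighted sum over channels yields the heatmap; a Grad-CAM-style ReLU
    # could optionally be applied to keep only positive contributions.
    return np.tensordot(weights, activations, axes=([0], [0]))

rng = np.random.default_rng(0)
A = rng.random(size=(16, 8, 8, 8))                    # toy activations
G = rng.normal(size=(16, 8, 8, 8))                    # toy gradients of the class score
print("heatmap shape:", respond_cam(A, G).shape)      # (8, 8, 8)
```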