Abstract: Robust reinforcement learning (RL) aims to learn policies that remain effective despite uncertainty in the environment, which frequently arises in real-world applications due to variations in environment dynamics. Robust RL methods learn a robust policy by maximizing value under the worst-case models within a predefined uncertainty set. Offline robust RL algorithms are particularly promising in scenarios where only a fixed dataset is available and new data cannot be collected. However, these approaches often require extensive offline data, and gathering such datasets for specific tasks in specific environments can be both costly and time-consuming. Using an imperfect simulator offers a faster, cheaper, and safer way to collect training data, but it can suffer from dynamics mismatch. In this paper, we introduce HYDRO, the first Hybrid Cross-Domain Robust RL framework designed to address these challenges. HYDRO uses an online simulator to complement a limited offline dataset in the non-trivial setting of robust RL. By measuring and minimizing the performance gap between the simulator and the worst-case models in the uncertainty set, HYDRO employs novel uncertainty filtering and prioritized sampling to select the most relevant and reliable simulator samples. Our extensive experiments demonstrate HYDRO's superior performance over existing methods across various tasks, underscoring its potential to improve sample efficiency in offline robust RL.
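A minimal sketch of how the uncertainty filtering and prioritized sampling described above could operate, assuming an ensemble of dynamics models fit to the offline data; the disagreement-based score, the threshold `tau`, the batch size, and the `predict` interface are illustrative assumptions rather than HYDRO's actual design.

```python
import numpy as np

def select_simulator_samples(sim_batch, ensemble, tau=0.5, batch_size=256, eps=1e-6):
    """Score simulator transitions, drop the most uncertain ones, and draw a
    prioritized minibatch from the remainder (illustrative sketch only)."""
    states, actions, next_states = sim_batch                     # arrays of shape (N, ...)
    # Ensemble next-state predictions: shape (K, N, state_dim).
    preds = np.stack([m.predict(states, actions) for m in ensemble])
    # Epistemic-uncertainty proxy: disagreement across ensemble members.
    uncertainty = preds.std(axis=0).mean(axis=-1)                # (N,)
    # Dynamics-gap proxy: distance between simulator outcome and ensemble mean.
    gap = np.linalg.norm(next_states - preds.mean(axis=0), axis=-1)
    keep = np.flatnonzero(uncertainty < tau)                     # uncertainty filtering
    priority = 1.0 / (gap[keep] + eps)                           # prioritized sampling
    probs = priority / priority.sum()
    size = min(batch_size, keep.size)
    chosen = np.random.choice(keep, size=size, replace=False, p=probs)
    return states[chosen], actions[chosen], next_states[chosen]
```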
Abstract: Reinforcement learning (RL) can effectively learn complex policies. However, learning these policies often demands extensive trial-and-error interaction with the environment, which is impractical in many real-world scenarios due to the high cost of data collection and safety concerns. A common strategy is therefore to transfer a policy trained in a low-cost, rapid source simulator to a real-world target environment. This process poses its own challenges: simulators, no matter how advanced, cannot perfectly replicate the intricacies of the real world, leading to dynamics discrepancies between the source and target environments. Past research has posited that the source domain must encompass all possible target transitions, a condition we term full support. Expecting full support, however, is often unrealistic, especially when significant dynamics discrepancies arise. In this paper, we focus on adaptation under large dynamics mismatch. We move away from the stringent full-support condition of earlier work and concentrate instead on crafting an effective policy for the target domain. Our proposed approach is simple but effective: it is anchored in the central ideas of skewing and extending the source support toward the target support to mitigate support deficiencies. Through comprehensive testing on a varied set of benchmarks, our method shows notable improvements over previous techniques.
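The skewing-and-extension idea summarized in this abstract might be realized roughly as below; the density-ratio weighting, the Gaussian jitter used to widen support, and the function names are assumptions made for illustration, not the paper's exact procedure.

```python
import numpy as np

def skew_and_extend(source_batch, log_ratio_fn, n_aug=1, sigma=0.05, seed=0):
    """Resample source transitions toward the target dynamics (skewing) and add
    perturbed copies to widen coverage (extension). Illustrative sketch only."""
    rng = np.random.default_rng(seed)
    s, a, s_next = source_batch                         # arrays of shape (N, ...)
    # Skewing: weight each transition by an estimated target/source dynamics ratio,
    # e.g. from a classifier trained to tell source and target transitions apart.
    w = np.exp(log_ratio_fn(s, a, s_next))
    w = w / w.sum()
    idx = rng.choice(len(s), size=len(s), replace=True, p=w)
    s, a, s_next = s[idx], a[idx], s_next[idx]
    # Extension: add jittered next-states so the replay distribution also covers
    # target transitions lying just outside the original source support.
    aug = [s_next + sigma * rng.standard_normal(s_next.shape) for _ in range(n_aug)]
    return (np.concatenate([s] * (n_aug + 1)),
            np.concatenate([a] * (n_aug + 1)),
            np.concatenate([s_next, *aug]))
```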
Abstract: Black-box optimization is a powerful approach for discovering the global optima of noisy and expensive black-box functions, a problem widely encountered in real-world scenarios. Recently, there has been growing interest in leveraging domain knowledge to enhance the efficacy of machine learning methods. Partial Differential Equations (PDEs) often provide an effective means of elucidating the fundamental principles governing black-box functions. In this paper, we propose PINN-BO, a black-box optimization algorithm employing Physics-Informed Neural Networks that integrates knowledge from PDEs to improve the sample efficiency of the optimization. We analyze the theoretical behavior of our algorithm in terms of its regret bound using recent advances in Neural Tangent Kernel (NTK) theory and prove that, by using the PDE alongside black-box function evaluations, PINN-BO attains a tighter regret bound. We perform several experiments on a variety of optimization tasks and show that our algorithm is more sample-efficient than existing methods.
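A compact sketch of the physics-informed surrogate idea behind PINN-BO, assuming a PyTorch MLP with dropout, a toy 1-D Poisson-style residual, and a UCB-like acquisition with Monte-Carlo dropout as a crude uncertainty proxy; the paper's actual algorithm and its NTK-based analysis are not reproduced here.

```python
import torch

def pde_residual(net, x):
    """Toy 1-D residual u''(x) - f(x); the real PDE is problem-specific."""
    x = x.clone().requires_grad_(True)
    u = net(x)
    du = torch.autograd.grad(u.sum(), x, create_graph=True)[0]
    d2u = torch.autograd.grad(du.sum(), x, create_graph=True)[0]
    return d2u - torch.ones_like(x)          # placeholder source term f(x) = 1

def train_pinn(net, x_obs, y_obs, x_col, lam=1.0, steps=500, lr=1e-3):
    """Fit the surrogate to black-box evaluations plus a PDE residual penalty."""
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        data_loss = ((net(x_obs) - y_obs) ** 2).mean()
        pde_loss = (pde_residual(net, x_col) ** 2).mean()
        (data_loss + lam * pde_loss).backward()
        opt.step()
    return net

def next_query(net, candidates, beta=2.0, n_samples=20):
    """UCB-style acquisition using Monte-Carlo dropout for uncertainty."""
    net.train()                              # keep dropout active at prediction time
    with torch.no_grad():
        preds = torch.stack([net(candidates) for _ in range(n_samples)])
    score = preds.mean(0) + beta * preds.std(0)
    return candidates[score.squeeze(-1).argmax()]
```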