Abstract:Safe exploration is a key requirement for reinforcement learning (RL) agents to learn and adapt online, beyond controlled (e.g. simulated) environments. In this work, we tackle this challenge by utilizing suboptimal yet conservative policies (e.g., obtained from offline data or simulators) as priors. Our approach, SOOPER, uses probabilistic dynamics models to optimistically explore, yet pessimistically fall back to the conservative policy prior if needed. We prove that SOOPER guarantees safety throughout learning, and establish convergence to an optimal policy by bounding its cumulative regret. Extensive experiments on key safe RL benchmarks and real-world hardware demonstrate that SOOPER is scalable, outperforms the state-of-the-art and validate our theoretical guarantees in practice.




Abstract:Image-to-Image translation (I2I) is a subtype of Machine Learning (ML) that has tremendous potential in applications where two domains of images and the need for translation between the two exist, such as the removal of fog. For example, this could be useful for autonomous vehicles, which currently struggle with adverse weather conditions like fog. However, datasets for I2I tasks are not abundant and typically hard to acquire. Here, we introduce STEREOFOG, a dataset comprised of $10,067$ paired fogged and clear images, captured using a custom-built device, with the purpose of exploring I2I's potential in this domain. It is the only real-world dataset of this kind to the best of our knowledge. Furthermore, we apply and optimize the pix2pix I2I ML framework to this dataset. With the final model achieving an average Complex Wavelet-Structural Similarity (CW-SSIM) score of $0.76$, we prove the technique's suitability for the problem.