Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Patric Rommel

Infra-Bayesian Reinforcement Learning Agents Outperform Classical RL For Worst-Case Robustness

May 22, 2026

Manish Aryal, Faiyaz Azam, Agnivo Banerjee, Sai Sidhanth Manoharan Jayanthi, Allegra Laro, Clément Legentilhomme, Andrew Lin, Florian Lorkowski, Radman Rakhshandehroo, Patric Rommel(+3 more)

Abstract:Classical reinforcement learning assumes the agent interacts with a fixed environment whose behavior does not depend on the agent's policy. This assumption breaks down in non-realizable settings where other actors might anticipate the agent's behavior, including environments crucial to AI safety, where the agent interacts with predictors, humans, other AI agents, and institutions. In such settings, the agent's model class fails to capture the world in which it operates. Under such misspecification, classical Bayesian methods can produce confidently wrong posteriors, unreliable decisions, and unbounded regret, as realizability fails to obtain. Infra-Bayesianism is a decision-theoretic framework that addresses these failures by distinguishing ordinary probabilistic uncertainty, where priors can be reasonably chosen, from Knightian uncertainty, where no grounds exist for the construction of such a prior. It does so by evaluating actions on their worst-case outcomes, rather than from posterior expectations or weighted averaging. We present the first proof-of-concept implementation of an infra-Bayesian reinforcement learning architecture for finite-outcome stateless decision problems. Our agent maintains a set of imprecise hypotheses, updates them using infra-Bayesian conditioning, and selects actions by maximizing worst-case expected value. We apply this implementation of the infra-Bayesian maximin decision process to an environment with Knightian uncertainty, and demonstrate a lower worst-case regret as compared to classical reinforcement learning agents. We also investigate Newcomb's problem and show that the infra-Bayesian agent picks the optimal strategy, outperforming classical decision theory agents. Our results provide a step towards reinforcement learning agents that remain robust under model misspecification and policy-dependent uncertainty.

Via

Access Paper or Ask Questions

Gaussian-process-regression-based method for the localization of exceptional points in complex resonance spectra

Feb 07, 2024

Patrick Egenlauf, Patric Rommel, Jörg Main

Abstract:Resonances in open quantum systems depending on at least two controllable parameters can show the phenomenon of exceptional points (EPs), where not only the eigenvalues but also the eigenvectors of two or more resonances coalesce. Their exact localization in the parameter space is challenging, in particular in systems, where the computation of the quantum spectra and resonances is numerically very expensive. We introduce an efficient machine learning algorithm to find exceptional points based on Gaussian process regression (GPR). The GPR-model is trained with an initial set of eigenvalue pairs belonging to an EP and used for a first estimation of the EP position via a numerically cheap root search. The estimate is then improved iteratively by adding selected exact eigenvalue pairs as training points to the GPR-model. The GPR-based method is developed and tested on a simple low-dimensional matrix model and then applied to a challenging real physical system, viz., the localization of EPs in the resonance spectra of excitons in cuprous oxide in external electric and magnetic fields. The precise computation of EPs, by taking into account the complete valence band structure and central-cell corrections of the crystal, can be the basis for the experimental observation of EPs in this system.

* 28 pages, 10 figures, submitted to Machine Learning: Science and Technology

Via

Access Paper or Ask Questions