Abstract:Ensuring that AI systems reliably and robustly avoid harmful or dangerous behaviours is a crucial challenge, especially for AI systems with a high degree of autonomy and general intelligence, or systems used in safety-critical contexts. In this paper, we will introduce and define a family of approaches to AI safety, which we will refer to as guaranteed safe (GS) AI. The core feature of these approaches is that they aim to produce AI systems which are equipped with high-assurance quantitative safety guarantees. This is achieved by the interplay of three core components: a world model (which provides a mathematical description of how the AI system affects the outside world), a safety specification (which is a mathematical description of what effects are acceptable), and a verifier (which provides an auditable proof certificate that the AI satisfies the safety specification relative to the world model). We outline a number of approaches for creating each of these three core components, describe the main technical challenges, and suggest a number of potential solutions to them. We also argue for the necessity of this approach to AI safety, and for the inadequacy of the main alternative approaches.
Abstract:Probabilistic programming is perfectly suited to reliable and transparent data science, as it allows the user to specify their models in a high-level language without worrying about the complexities of how to fit the models. Static analysis of probabilistic programs presents even further opportunities for enabling a high-level style of programming, by automating time-consuming and error-prone tasks. We apply static analysis to probabilistic programs to automate large parts of two crucial model checking methods: Prior Predictive Checks and Simulation-Based Calibration. Our method transforms a probabilistic program specifying a density function into an efficient forward-sampling form. To achieve this transformation, we extract a factor graph from a probabilistic program using static analysis, generate a set of proposal directed acyclic graphs using a SAT solver, select a graph which will produce provably correct sampling code, then generate one or more sampling programs. We allow minimal user interaction to broaden the scope of application beyond what is possible with static analysis alone. We present an implementation targeting the popular Stan probabilistic programming language, automating large parts of a robust Bayesian workflow for a wide community of probabilistic programming users.