Andreas Krause

Active Exploration for Inverse Reinforcement Learning

Jul 18, 2022
David Lindner, Andreas Krause, Giorgia Ramponi

Inverse Reinforcement Learning (IRL) is a powerful paradigm for inferring a reward function from expert demonstrations. Many IRL algorithms require a known transition model and sometimes even a known expert policy, or they at least require access to a generative model. However, these assumptions are too strong for many real-world applications, where the environment can be accessed only through sequential interaction. We propose a novel IRL algorithm: Active exploration for Inverse Reinforcement Learning (AceIRL), which actively explores an unknown environment and expert policy to quickly learn the expert's reward function and identify a good policy. AceIRL uses previous observations to construct confidence intervals that capture plausible reward functions and find exploration policies that focus on the most informative regions of the environment. AceIRL is the first approach to active IRL with sample-complexity bounds that does not require a generative model of the environment. AceIRL matches the sample complexity of active IRL with a generative model in the worst case. Additionally, we establish a problem-dependent bound that relates the sample complexity of AceIRL to the suboptimality gap of a given IRL problem. We empirically evaluate AceIRL in simulations and find that it significantly outperforms more naive exploration strategies.

Graph Neural Network Bandits

Jul 13, 2022
Parnian Kassraie, Andreas Krause, Ilija Bogunovic

We consider the bandit optimization problem with the reward function defined over graph-structured data. This problem has important applications in molecule design and drug discovery, where the reward is naturally invariant to graph permutations. The key challenges in this setting are scaling to large domains, and to graphs with many nodes. We resolve these challenges by embedding the permutation invariance into our model. In particular, we show that graph neural networks (GNNs) can be used to estimate the reward function, assuming it resides in the Reproducing Kernel Hilbert Space of a permutation-invariant additive kernel. By establishing a novel connection between such kernels and the graph neural tangent kernel (GNTK), we introduce the first GNN confidence bound and use it to design a phased-elimination algorithm with sublinear regret. Our regret bound depends on the GNTK's maximum information gain, which we also provide a bound for. While the reward function depends on all $N$ node features, our guarantees are independent of the number of graph nodes $N$. Empirically, our approach exhibits competitive performance and scales well on graph-structured domains.
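The permutation invariance mentioned above can be illustrated with a minimal sketch. This is not the architecture from the paper: the single sum-aggregation layer, the function name `gnn_readout`, and the random graph are all assumptions chosen only to show that relabeling the nodes leaves a sum-readout output unchanged.

```python
import numpy as np

def gnn_readout(adj, feats, weight):
    """One sum-aggregation message-passing layer followed by a sum readout.

    Hypothetical toy model, not the GNN analysed in the paper.
    """
    # Each node sums its own features and those of its neighbours, then
    # applies a shared linear map and a ReLU.
    self_and_neighbours = adj + np.eye(adj.shape[0])
    hidden = np.maximum(self_and_neighbours @ feats @ weight, 0.0)
    # Summing over nodes discards the node ordering.
    return hidden.sum(axis=0)

rng = np.random.default_rng(0)
n_nodes, n_feats, n_hidden = 5, 3, 4

# Random undirected graph without self-loops (illustrative data only).
upper = np.triu(rng.integers(0, 2, size=(n_nodes, n_nodes)), k=1)
adj = (upper + upper.T).astype(float)
feats = rng.normal(size=(n_nodes, n_feats))
weight = rng.normal(size=(n_feats, n_hidden))

# Relabel the nodes with a random permutation matrix P.
perm = rng.permutation(n_nodes)
P = np.eye(n_nodes)[perm]

out_original = gnn_readout(adj, feats, weight)
out_permuted = gnn_readout(P @ adj @ P.T, P @ feats, weight)
print(np.allclose(out_original, out_permuted))
```

The two outputs agree because the layer satisfies (P A Pᵀ + I) P X W = P (A + I) X W, the ReLU acts elementwise, and the final sum over rows does not depend on row order.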

* 27 pages, 8 figures

Safe Reinforcement Learning via Confidence-Based Filters

Jul 04, 2022
Sebastian Curi, Armin Lederer, Sandra Hirche, Andreas Krause

Ensuring safety is a crucial challenge when deploying reinforcement learning (RL) to real-world systems. We develop confidence-based safety filters, a control-theoretic approach for certifying state safety constraints for nominal policies learned via standard RL techniques, based on probabilistic dynamics models. Our approach is based on a reformulation of state constraints in terms of cost functions, reducing safety verification to a standard RL task. By exploiting the concept of hallucinating inputs, we extend this formulation to determine a "backup" policy that is safe for the unknown system with high probability. Finally, the nominal policy is minimally adjusted at every time step during a roll-out towards the backup policy, such that safe recovery can be guaranteed afterwards. We provide formal safety guarantees, and empirically demonstrate the effectiveness of our approach.

Active Exploration via Experiment Design in Markov Chains

Jun 29, 2022
Mojmír Mutný, Tadeusz Janik, Andreas Krause

A key challenge in science and engineering is to design experiments to learn about some unknown quantity of interest. Classical experimental design optimally allocates the experimental budget to maximize a notion of utility (e.g., reduction in uncertainty about the unknown quantity). We consider a rich setting, where the experiments are associated with states in a Markov chain, and we can only choose them by selecting a policy controlling the state transitions. This problem captures important applications, from exploration in reinforcement learning to spatial monitoring tasks. We propose an algorithm, markov-design, that efficiently selects policies whose measurement allocation provably converges to the optimal one. The algorithm is sequential in nature, adapting its choice of policies (experiments) informed by past measurements. In addition to our theoretical analysis, we showcase our framework on applications in ecological surveillance and pharmacology.

Supervised Training of Conditional Monge Maps

Jun 28, 2022
Charlotte Bunne, Andreas Krause, Marco Cuturi

Optimal transport (OT) theory describes general principles to define and select, among many possible choices, the most efficient way to map a probability measure onto another. That theory has been mostly used to estimate, given a pair of source and target probability measures $(\mu,\nu)$, a parameterized map $T_\theta$ that can efficiently map $\mu$ onto $\nu$. In many applications, such as predicting cell responses to treatments, the data measures $\mu,\nu$ (features of untreated/treated cells) that define optimal transport problems do not arise in isolation but are associated with a context $c$ (the treatment). To account for and incorporate that context in OT estimation, we introduce CondOT, an approach to estimate OT maps conditioned on a context variable, using several pairs of measures $(\mu_i, \nu_i)$ tagged with a context label $c_i$. Our goal is to learn a global map $\mathcal{T}_{\theta}$ which is not only expected to fit all pairs in the dataset $\{(c_i, (\mu_i, \nu_i))\}$, i.e., $\mathcal{T}_{\theta}(c_i) \sharp\mu_i \approx \nu_i$, but should generalize to produce meaningful maps $\mathcal{T}_{\theta}(c_{\text{new}})$ conditioned on unseen contexts $c_{\text{new}}$. Our approach harnesses and provides a novel usage for partially input convex neural networks, for which we introduce a robust and efficient initialization strategy inspired by Gaussian approximations. We demonstrate the ability of CondOT to infer the effect of an arbitrary combination of genetic or therapeutic perturbations on single cells, using only observations of the effects of said perturbations separately.
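The push-forward condition $T \sharp \mu \approx \nu$ has a well-known closed form when both measures are Gaussian and the cost is squared Euclidean. The sketch below shows only that textbook special case. It is not CondOT and not the paper's initialization scheme, whose details the abstract does not give; the helper names and the example covariances are assumptions for illustration.

```python
import numpy as np

def sqrtm_spd(mat):
    """Symmetric square root of a symmetric positive semi-definite matrix."""
    eigvals, eigvecs = np.linalg.eigh(mat)
    return (eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))) @ eigvecs.T

def gaussian_monge_map(mean_src, cov_src, mean_tgt, cov_tgt):
    """Closed-form OT map between N(mean_src, cov_src) and N(mean_tgt, cov_tgt).

    Returns the affine map x -> mean_tgt + A (x - mean_src) and the matrix A.
    """
    root = sqrtm_spd(cov_src)
    root_inv = np.linalg.inv(root)
    # A = S1^{-1/2} (S1^{1/2} S2 S1^{1/2})^{1/2} S1^{-1/2}
    linear = root_inv @ sqrtm_spd(root @ cov_tgt @ root) @ root_inv

    def transport(x):
        return mean_tgt + (x - mean_src) @ linear.T

    return transport, linear

# Illustrative 2-D source and target Gaussians (made-up numbers).
mean_src = np.array([0.0, 0.0])
cov_src = np.array([[2.0, 0.5], [0.5, 1.0]])
mean_tgt = np.array([1.0, -1.0])
cov_tgt = np.array([[1.0, -0.3], [-0.3, 0.5]])

transport, linear = gaussian_monge_map(mean_src, cov_src, mean_tgt, cov_tgt)

# The pushed-forward covariance A S1 A^T should equal the target covariance.
pushed_cov = linear @ cov_src @ linear.T
print(np.allclose(pushed_cov, cov_tgt))
```

Because the map is affine, checking that it sends the source mean to the target mean and the source covariance to the target covariance is enough to confirm the push-forward in the Gaussian case.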

Learning To Cut By Looking Ahead: Cutting Plane Selection via Imitation Learning

Jun 27, 2022
Max B. Paulus, Giulia Zarpellon, Andreas Krause, Laurent Charlin, Chris J. Maddison

Cutting planes are essential for solving mixed-integer linear problems (MILPs), because they facilitate bound improvements on the optimal solution value. For selecting cuts, modern solvers rely on manually designed heuristics that are tuned to gauge the potential effectiveness of cuts. We show that a greedy selection rule explicitly looking ahead to select cuts that yield the best bound improvement delivers strong decisions for cut selection, but is too expensive to be deployed in practice. In response, we propose a new neural architecture (NeuralCut) for imitation learning on the lookahead expert. Our model outperforms standard baselines for cut selection on several synthetic MILP benchmarks. Experiments with a B&C solver for neural network verification further validate our approach, and exhibit the potential of learning methods in this setting.

* ICML 2022

Invariant Causal Mechanisms through Distribution Matching

Jun 23, 2022
Mathieu Chevalley, Charlotte Bunne, Andreas Krause, Stefan Bauer

Learning representations that capture the underlying data-generating process is a key problem for data-efficient and robust use of neural networks. One key property for robustness, which the learned representation should capture and which has recently received a lot of attention, is invariance. In this work, we provide a causal perspective and a new algorithm for learning invariant representations. Empirically, we show that this algorithm works well on a diverse set of tasks; in particular, we observe state-of-the-art performance on domain generalization, where we are able to significantly boost the score of existing models.

The Dynamics of Riemannian Robbins-Monro Algorithms

Jun 16, 2022
Mohammad Reza Karimi, Ya-Ping Hsieh, Panayotis Mertikopoulos, Andreas Krause

Many important learning algorithms, such as stochastic gradient methods, are often deployed to solve nonlinear problems on Riemannian manifolds. Motivated by these applications, we propose a family of Riemannian algorithms generalizing and extending the seminal stochastic approximation framework of Robbins and Monro. Compared to their Euclidean counterparts, Riemannian iterative algorithms are much less understood due to the lack of a global linear structure on the manifold. We overcome this difficulty by introducing an extended Fermi coordinate frame which allows us to map the asymptotic behavior of the proposed Riemannian Robbins-Monro (RRM) class of algorithms to that of an associated deterministic dynamical system under very mild assumptions on the underlying manifold. In so doing, we provide a general template of almost sure convergence results that mirrors and extends the existing theory for Euclidean Robbins-Monro schemes, albeit with a significantly more involved analysis that requires a number of new geometric ingredients. We showcase the flexibility of the proposed RRM framework by using it to establish the convergence of a retraction-based analogue of the popular optimistic / extra-gradient methods for solving minimization problems and games, and we provide a unified treatment for their convergence.
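To make the idea of a retraction-based stochastic approximation step concrete, here is a minimal sketch on the unit sphere. It is not the RRM scheme or the optimistic / extra-gradient variant from the paper: the manifold, the toy objective (maximize the expected inner product with noisy samples), the 1/n step size, and the function names are all assumptions for illustration.

```python
import numpy as np

def retract(point):
    """Retraction onto the unit sphere by renormalisation."""
    return point / np.linalg.norm(point)

def sphere_robbins_monro(mean_dir, n_steps=5000, noise_std=0.5, seed=0):
    """Stochastic approximation on the sphere with step size 1/n.

    Seeks the zero of the tangent vector field x -> P_x(mean_dir), using only
    noisy samples z = mean_dir + noise. Hypothetical toy problem.
    """
    rng = np.random.default_rng(seed)
    x = np.zeros_like(mean_dir)
    x[0] = 1.0  # arbitrary starting point on the sphere
    for n in range(1, n_steps + 1):
        z = mean_dir + noise_std * rng.normal(size=mean_dir.shape)
        # Project the noisy sample onto the tangent space at x.
        tangent = z - np.dot(x, z) * x
        # Move along the tangent direction, then retract back to the sphere.
        x = retract(x + tangent / n)
    return x

mean_dir = np.array([0.0, 1.2, 1.6])
estimate = sphere_robbins_monro(mean_dir)
target = mean_dir / np.linalg.norm(mean_dir)
print(np.linalg.norm(estimate - target))
```

In this toy problem the noise-free dynamics follow the flow x' = P_x(mean_dir), whose attracting rest point is the normalized mean direction, so the iterates should settle near `target`. This loosely mirrors the link between stochastic iterates and a deterministic dynamical system described in the abstract.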

Interactively Learning Preference Constraints in Linear Bandits

Jun 10, 2022
David Lindner, Sebastian Tschiatschek, Katja Hofmann, Andreas Krause

We study sequential decision-making with known rewards and unknown constraints, motivated by situations where the constraints represent expensive-to-evaluate human preferences, such as safe and comfortable driving behavior. We formalize the challenge of interactively learning about these constraints as a novel linear bandit problem which we call constrained linear best-arm identification. To solve this problem, we propose the Adaptive Constraint Learning (ACOL) algorithm. We provide an instance-dependent lower bound for constrained linear best-arm identification and show that ACOL's sample complexity matches the lower bound in the worst case. In the average case, ACOL's sample complexity bound is still significantly tighter than bounds of simpler approaches. In synthetic experiments, ACOL performs on par with an oracle solution and outperforms a range of baselines. As an application, we consider learning constraints to represent human preferences in a driving simulation. ACOL is significantly more sample-efficient than alternatives for this application. Further, we find that learning preferences as constraints is more robust to changes in the driving scenario than encoding the preferences directly in the reward function.

* Accepted to International Conference on Machine Learning (ICML), 2022

Active Bayesian Causal Inference

Jun 04, 2022
Christian Toth, Lars Lorch, Christian Knoll, Andreas Krause, Franz Pernkopf, Robert Peharz, Julius von Kügelgen

Causal discovery and causal reasoning are classically treated as separate and consecutive tasks: one first infers the causal graph, and then uses it to estimate causal effects of interventions. However, such a two-stage approach is uneconomical, especially in terms of actively collected interventional data, since the causal query of interest may not require a fully-specified causal model. From a Bayesian perspective, it is also unnatural, since a causal query (e.g., the causal graph or some causal effect) can be viewed as a latent quantity subject to posterior inference -- other unobserved quantities that are not of direct interest (e.g., the full causal model) ought to be marginalized out in this process and contribute to our epistemic uncertainty. In this work, we propose Active Bayesian Causal Inference (ABCI), a fully-Bayesian active learning framework for integrated causal discovery and reasoning, which jointly infers a posterior over causal models and queries of interest. In our approach to ABCI, we focus on the class of causally-sufficient, nonlinear additive noise models, which we model using Gaussian processes. We sequentially design experiments that are maximally informative about our target causal query, collect the corresponding interventional data, and update our beliefs to choose the next experiment. Through simulations, we demonstrate that our approach is more data-efficient than several baselines that only focus on learning the full causal graph. This allows us to accurately learn downstream causal queries from fewer samples while providing well-calibrated uncertainty estimates for the quantities of interest.

* RP & JvK are shared last authors. 10 pages + references + appendices (26 pages total); 6 Figs
