Tobias Sutter

End-to-End Learning for Stochastic Optimization: A Bayesian Perspective

Jun 11, 2023
Yves Rychener, Daniel Kuhn, Tobias Sutter

We develop a principled approach to end-to-end learning in stochastic optimization. First, we show that the standard end-to-end learning algorithm admits a Bayesian interpretation and trains a posterior Bayes action map. Building on the insights of this analysis, we then propose new end-to-end learning algorithms for training decision maps that output solutions of empirical risk minimization and distributionally robust optimization problems, two dominant modeling paradigms in optimization under uncertainty. Numerical results for a synthetic newsvendor problem illustrate the key differences between alternative training schemes. We also investigate an economic dispatch problem based on real data to showcase the impact of the neural network architecture of the decision maps on their test performance.
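The synthetic newsvendor experiment itself is not reproduced here, but as background, the empirical-risk-minimization decision for a newsvendor problem has a well-known closed form: it is an empirical quantile of the demand samples at the critical ratio. A minimal sketch (the function name and cost values are illustrative, not the paper's code):

```python
import numpy as np

def newsvendor_erm(demand_samples, underage_cost, overage_cost):
    """ERM solution of the newsvendor problem.

    Minimizing the empirical expected cost
    E[underage_cost * (D - q)_+ + overage_cost * (q - D)_+]
    over the order quantity q yields the empirical quantile
    of demand at the critical ratio."""
    critical_ratio = underage_cost / (underage_cost + overage_cost)
    return float(np.quantile(demand_samples, critical_ratio))

demand = np.array([4.0, 7.0, 9.0, 12.0, 15.0])
order = newsvendor_erm(demand, underage_cost=3.0, overage_cost=1.0)  # 0.75-quantile
```

An end-to-end decision map, by contrast, would replace this closed-form rule with a trained network that maps side information directly to an order quantity.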

* Accepted at ICML 2023 

Policy Gradient Algorithms for Robust MDPs with Non-Rectangular Uncertainty Sets

May 31, 2023
Mengmeng Li, Tobias Sutter, Daniel Kuhn

We propose a policy gradient algorithm for robust infinite-horizon Markov Decision Processes (MDPs) with non-rectangular uncertainty sets, thereby addressing an open challenge in the robust MDP literature. Indeed, uncertainty sets that display statistical optimality properties and make optimal use of limited data often fail to be rectangular. Unfortunately, the corresponding robust MDPs cannot be solved with dynamic programming techniques and are in fact provably intractable. This prompts us to develop a projected Langevin dynamics algorithm tailored to the robust policy evaluation problem, which offers global optimality guarantees. We also propose a deterministic policy gradient method that solves the robust policy evaluation problem approximately, and we prove that the approximation error scales with a new measure of non-rectangularity of the uncertainty set. Numerical experiments showcase that our projected Langevin dynamics algorithm can escape local optima, while algorithms tailored to rectangular uncertainty fail to do so.
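The paper's projected Langevin dynamics is tailored to robust policy evaluation; the generic iteration it builds on — a gradient step, injected Gaussian noise, and a projection back onto the feasible set — can be sketched as follows. The toy objective, temperature, and step size below are illustrative only:

```python
import numpy as np

def projected_langevin(grad, project, x0, step=1e-2, inv_temp=1e3,
                       n_steps=5000, seed=0):
    """Generic projected Langevin dynamics:
    x <- Proj(x - step * grad(x) + sqrt(2 * step / inv_temp) * noise)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for _ in range(n_steps):
        noise = rng.standard_normal(x.shape)
        x = x - step * grad(x) + np.sqrt(2.0 * step / inv_temp) * noise
        x = project(x)
    return x

# Toy nonconvex objective f(x) = sum(x**4 - 2*x**2) on the box [0, 1]^2.
grad = lambda x: 4 * x**3 - 4 * x          # coordinate-wise gradient
project = lambda x: np.clip(x, 0.0, 1.0)   # projection onto the box
x_final = projected_langevin(grad, project, x0=np.array([0.1, 0.9]))
```

The noise term is what allows such iterations to escape local optima, which is the behavior the experiments highlight.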

* 20 pages, 2 figures 

Optimal Learning via Moderate Deviations Theory

May 23, 2023
Arnab Ganguly, Tobias Sutter

This paper proposes a statistically optimal approach for learning a function value using a confidence interval in a wide range of models, including general non-parametric estimation of an expected loss described as a stochastic programming problem or various SDE models. More precisely, we develop a systematic construction of highly accurate confidence intervals by using a moderate deviation principle-based approach. It is shown that the proposed confidence intervals are statistically optimal in the sense that they satisfy criteria regarding exponential accuracy, minimality, consistency, mischaracterization probability, and eventual uniformly most accurate (UMA) property. The confidence intervals suggested by this approach are expressed as solutions to robust optimization problems, where the uncertainty is expressed via the underlying moderate deviation rate function induced by the data-generating process. We demonstrate that for many models these optimization problems admit tractable reformulations as finite convex programs even when they are infinite-dimensional.

* 35 pages, 3 figures 

ISAAC Newton: Input-based Approximate Curvature for Newton's Method

May 01, 2023
Felix Petersen, Tobias Sutter, Christian Borgelt, Dongsung Huh, Hilde Kuehne, Yuekai Sun, Oliver Deussen

We present ISAAC (Input-baSed ApproximAte Curvature), a novel method that conditions the gradient using selected second-order information and has an asymptotically vanishing computational overhead, assuming a batch size smaller than the number of neurons. We show that it is possible to compute a good conditioner based on only the input to a respective layer without a substantial computational overhead. The proposed method allows effective training even in small-batch stochastic regimes, which makes it competitive to first-order as well as second-order methods.
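The exact ISAAC update is given in the paper and the linked repository; purely as an illustrative sketch (not the authors' implementation), the idea of a conditioner built from the layer input alone can be written as preconditioning a linear layer's weight gradient with a damped second-moment matrix of that input:

```python
import numpy as np

def input_preconditioned_grad(weight_grad, layer_input, damping=1e-1):
    """Sketch: precondition dL/dW of a linear layer y = W @ x using only
    the layer input X (shape batch x fan_in), via a damped input
    second-moment matrix. Names and damping value are hypothetical."""
    b, n = layer_input.shape
    second_moment = layer_input.T @ layer_input / b + damping * np.eye(n)
    return weight_grad @ np.linalg.inv(second_moment)

rng = np.random.default_rng(0)
X = rng.standard_normal((8, 5))   # batch of 8 inputs, fan_in = 5
G = rng.standard_normal((3, 5))   # gradient for a 3 x 5 weight matrix
G_pre = input_preconditioned_grad(G, X)
```

When the batch size is smaller than the number of neurons, the inverse can be applied cheaply via the Woodbury identity, which is the regime the abstract refers to; the naive inverse above is only for clarity.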

* Published at ICLR 2023, Code @ https://github.com/Felix-Petersen/isaac, Video @ https://youtu.be/7RKRX-MdwqM 

A Robust Optimisation Perspective on Counterexample-Guided Repair of Neural Networks

Jan 26, 2023
David Boetius, Stefan Leue, Tobias Sutter

Counterexample-guided repair aims at creating neural networks with mathematical safety guarantees, facilitating the application of neural networks in safety-critical domains. However, whether counterexample-guided repair is guaranteed to terminate remains an open question. We approach this question by showing that counterexample-guided repair can be viewed as a robust optimisation algorithm. While termination guarantees for neural network repair itself remain beyond our reach, we prove termination for more restrained machine learning models and disprove termination in a general setting. We empirically study the practical implications of our theoretical results, demonstrating the suitability of common verifiers and falsifiers for repair despite a disadvantageous theoretical result. Additionally, we use our theoretical insights to devise a novel algorithm for repairing linear regression models, surpassing existing approaches.
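The repair algorithms studied in the paper target neural networks; the counterexample-guided loop itself — falsify, collect counterexamples, repair against all of them, repeat until the verifier accepts — can be illustrated on a toy linear model. Everything below (the model, the property, the repair step) is a hypothetical stand-in, not the paper's setting:

```python
import numpy as np

def repair_linear(w, b, grid, max_rounds=100):
    """Counterexample-guided repair sketch for the toy property
    f(x) = w * x + b >= 0 for all x in the grid."""
    counterexamples = []
    for _ in range(max_rounds):
        values = w * grid + b
        worst = grid[np.argmin(values)]        # "falsifier": worst violation
        if w * worst + b >= 0:
            return w, b, counterexamples       # verified: repair terminates
        counterexamples.append(worst)
        # repair step: smallest bias shift satisfying all counterexamples so far
        b += -min(w * x + b for x in counterexamples)
    raise RuntimeError("repair did not terminate")

w, b, ces = repair_linear(w=2.0, b=-1.0, grid=np.linspace(0.0, 1.0, 101))
```

Viewed through the paper's lens, this loop is a cutting-plane style robust optimisation: each counterexample adds a constraint, and termination hinges on whether finitely many such constraints suffice.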

* 22 pages + 9 pages references and appendix, 4 figures 

Distributionally Robust Optimization with Markovian Data

Jun 12, 2021
Mengmeng Li, Tobias Sutter, Daniel Kuhn

We study a stochastic program where the probability distribution of the uncertain problem parameters is unknown and only indirectly observed via finitely many correlated samples generated by an unknown Markov chain with $d$ states. We propose a data-driven distributionally robust optimization model to estimate the problem's objective function and optimal solution. By leveraging results from large deviations theory, we derive statistical guarantees on the quality of these estimators. The underlying worst-case expectation problem is nonconvex and involves $\mathcal O(d^2)$ decision variables. Thus, it cannot be solved efficiently for large $d$. By exploiting the structure of this problem, we devise a customized Frank-Wolfe algorithm with convex direction-finding subproblems of size $\mathcal O(d)$. We prove that this algorithm finds a stationary point efficiently under mild conditions. The efficiency of the method is predicated on a dimensionality reduction enabled by a dual reformulation. Numerical experiments indicate that our approach has better computational and statistical properties than the state-of-the-art methods.
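The customized Frank-Wolfe algorithm with $\mathcal O(d)$ subproblems relies on the paper's dual reformulation; the generic Frank-Wolfe template it specializes looks as follows. The objective and feasible set here (a quadratic over the probability simplex) are illustrative only:

```python
import numpy as np

def frank_wolfe_simplex(grad, x0, n_steps=500):
    """Generic Frank-Wolfe over the probability simplex. The
    direction-finding subproblem min_{s in simplex} <grad f(x), s>
    is solved by a vertex of the simplex."""
    x = np.asarray(x0, dtype=float)
    for k in range(n_steps):
        g = grad(x)
        s = np.zeros_like(x)
        s[np.argmin(g)] = 1.0            # linear minimization oracle
        gamma = 2.0 / (k + 2.0)          # standard diminishing step size
        x = (1.0 - gamma) * x + gamma * s
    return x

# Minimize f(x) = ||x - c||^2 over the simplex; c already lies inside it.
c = np.array([0.2, 0.5, 0.3])
x_opt = frank_wolfe_simplex(lambda x: 2.0 * (x - c), x0=np.ones(3) / 3)
```

The paper's contribution is precisely that, for the robust worst-case expectation problem, the direction-finding subproblems remain convex and of size $\mathcal O(d)$ even though the overall problem is nonconvex with $\mathcal O(d^2)$ variables.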

* 20 pages 

Robust Generalization despite Distribution Shift via Minimum Discriminating Information

Jun 08, 2021
Tobias Sutter, Andreas Krause, Daniel Kuhn

Training models that perform well under distribution shifts is a central challenge in machine learning. In this paper, we introduce a modeling framework where, in addition to training data, we have partial structural knowledge of the shifted test distribution. We employ the principle of minimum discriminating information to embed the available prior knowledge, and use distributionally robust optimization to account for uncertainty due to the limited samples. By leveraging large deviation results, we obtain explicit generalization bounds with respect to the unknown shifted distribution. Lastly, we demonstrate the versatility of our framework by applying it to two rather distinct applications: (1) training classifiers on systematically biased data and (2) off-policy evaluation in Markov Decision Processes.

* 23 pages, 4 figures 

Generalized maximum entropy estimation

Aug 24, 2017
Tobias Sutter, David Sutter, Peyman Mohajerin Esfahani, John Lygeros

We consider the problem of estimating a probability distribution that maximizes the entropy while satisfying a finite number of moment constraints, possibly corrupted by noise. Based on duality of convex programming, we present a novel approximation scheme using a smoothed fast gradient method that is equipped with explicit bounds on the approximation error. We further demonstrate how the presented scheme can be used for approximating the chemical master equation through the zero-information moment closure method.
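The paper's smoothed fast gradient scheme with explicit error bounds is not reproduced here; a plain dual gradient ascent for the noise-free, finite-alphabet case with a single moment constraint illustrates the convex-duality structure. The feature values and target moment below are made up for the example:

```python
import numpy as np

def maxent_distribution(features, moment, n_steps=2000, step=0.5):
    """Dual ascent for max-entropy estimation on a finite alphabet subject
    to E[phi] = moment. By duality, the primal optimum has the Gibbs form
    p_i proportional to exp(lam * phi_i); we ascend the concave dual in lam."""
    lam = 0.0
    for _ in range(n_steps):
        weights = np.exp(lam * features)
        p = weights / weights.sum()
        lam += step * (moment - p @ features)   # gradient of the dual
    return p

phi = np.array([0.0, 1.0, 2.0])   # feature values of the three outcomes
p = maxent_distribution(phi, moment=1.2)
```

With noisy moment constraints, the equality above relaxes to a norm ball, which is where the paper's smoothing and explicit approximation-error bounds come in.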

* 16 pages, 2 figures 

A variational approach to path estimation and parameter inference of hidden diffusion processes

Oct 25, 2016
Tobias Sutter, Arnab Ganguly, Heinz Koeppl

We consider a hidden Markov model, where the signal process, given by a diffusion, is only indirectly observed through some noisy measurements. The article develops a variational method for approximating the hidden states of the signal process given the full set of observations. This, in particular, leads to systematic approximations of the smoothing densities of the signal process. The paper then demonstrates how an efficient inference scheme, based on this variational approach to the approximation of the hidden states, can be designed to estimate the unknown parameters of stochastic differential equations. Two examples at the end illustrate the efficacy and the accuracy of the presented method.

* JMLR, volume 17, number 190, year 2016  
* 37 pages, 2 figures, revised 