Harald Oberhauser

Domain-Agnostic Batch Bayesian Optimization with Diverse Constraints via Bayesian Quadrature

Jun 09, 2023
Masaki Adachi, Satoshi Hayakawa, Xingchen Wan, Martin Jørgensen, Harald Oberhauser, Michael A. Osborne

Real-world optimisation problems often feature complex combinations of (1) diverse constraints and (2) discrete and mixed spaces, and are (3) highly parallelisable. (4) In some cases, the objective function cannot be queried unless unknown constraints are satisfied: in drug discovery, for example, safety in animal experiments (unknown constraints) must be established before human clinical trials (querying the objective function) may proceed. However, most existing works target each of the above three problems in isolation and do not consider (4) unknown constraints with query rejection. For problems with diverse constraints and/or unconventional input spaces, these techniques are difficult to apply because they are often mutually incompatible. We propose cSOBER, a domain-agnostic, prudent, parallel active sampler for Bayesian optimisation, based on SOBER (Adachi et al., 2023). We treat infeasibility under unknown constraints as a type of integration error that we can estimate. We propose a theoretically-driven approach that propagates this error as a tolerance on the quadrature precision, automatically balancing exploitation and exploration against the expected rejection rate. Moreover, via this adaptive tolerance, our method flexibly accommodates diverse constraints and/or discrete and mixed spaces, including conventional zero-risk cases. We show that cSOBER outperforms competitive baselines on diverse real-world black-box-constrained problems, including safety-constrained drug discovery and human-relationship-aware team optimisation over graph-structured spaces.

* 24 pages, 5 figures 
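
The paper's own implementation is not reproduced here, but the key mechanism (converting an expected rejection rate under unknown constraints into a tolerance on the quadrature precision) can be sketched in a few lines. Everything below, from the function names to the inflation rule, is a hypothetical illustration rather than cSOBER's actual update:

```python
# Hypothetical sketch, not the authors' code: converting estimated
# infeasibility risk into an adaptive quadrature tolerance.
import numpy as np

def expected_rejection_rate(p_feasible, weights):
    """Expected fraction of a weighted batch rejected by unknown constraints.

    p_feasible : (n,) probability each candidate satisfies the unknown
                 constraints (e.g. from a GP classifier).
    weights    : (n,) nonnegative quadrature weights over the candidates.
    """
    w = weights / weights.sum()
    return float(np.sum(w * (1.0 - p_feasible)))

def adaptive_tolerance(base_tol, p_feasible, weights):
    """Loosen the quadrature precision target by the expected rejection
    rate, so batch selection stays prudent when infeasibility risk is
    high; with p_feasible == 1 it reduces to the zero-risk tolerance."""
    r = expected_rejection_rate(p_feasible, weights)
    return base_tol / max(1.0 - r, 1e-12)

# Toy usage: higher infeasibility risk loosens the tolerance.
p = np.array([0.95, 0.5, 0.9, 0.2])
w = np.ones(4)
print(adaptive_tolerance(1e-3, p, w))  # > 1e-3
```

In the zero-risk case (all feasibility probabilities equal to one) the tolerance reduces to its base value, matching the conventional setting mentioned in the abstract.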

The Signature Kernel

May 08, 2023
Darrick Lee, Harald Oberhauser

The signature kernel is a positive definite kernel for sequential data. It inherits theoretical guarantees from stochastic analysis, has efficient algorithms for computation, and shows strong empirical performance. In this short survey paper for a forthcoming Springer handbook, we give an elementary introduction to the signature kernel and highlight these theoretical and computational properties.

* 31 pages, 2 figures 
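
As a concrete illustration of the object surveyed, here is a minimal NumPy sketch of the signature kernel truncated at level two for piecewise-linear paths. The paper covers untruncated kernels and efficient algorithms, which this toy version does not attempt:

```python
import numpy as np

def truncated_signature(path):
    """Level-1 and level-2 signature of a piecewise-linear path.

    path : (n, d) array of points; increments are path[i+1] - path[i]."""
    dx = np.diff(path, axis=0)               # (n-1, d) increments
    s1 = dx.sum(axis=0)                      # level 1: total increment
    before = np.cumsum(dx, axis=0) - dx      # sum of increments before step i
    # level 2: sum_{j<i} dx_j (x) dx_i + (1/2) sum_i dx_i (x) dx_i
    s2 = before.T @ dx + 0.5 * dx.T @ dx
    return s1, s2

def signature_kernel(x, y):
    """Inner product of level-2 truncated signatures (constant term 1)."""
    x1, x2 = truncated_signature(x)
    y1, y2 = truncated_signature(y)
    return 1.0 + x1 @ y1 + np.sum(x2 * y2)

# Two paths between the same endpoints differ at level 2, not level 1.
x = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
print(signature_kernel(x, y))
```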

SOBER: Scalable Batch Bayesian Optimization and Quadrature using Recombination Constraints

Jan 30, 2023
Masaki Adachi, Satoshi Hayakawa, Saad Hamid, Martin Jørgensen, Harald Oberhauser, Michael A. Osborne

Batch Bayesian optimisation (BO) has been shown to be a sample-efficient method for optimisation where expensive-to-evaluate objective functions can be queried in parallel. However, current methods do not scale to large batch sizes -- a frequent desideratum in practice (e.g. drug discovery or simulation-based inference). We present a novel algorithm, SOBER, which permits scalable and diversified batch BO with arbitrary acquisition functions, arbitrary input spaces (e.g. graphs), and arbitrary kernels. The key to our approach is to reformulate batch selection for BO as a Bayesian quadrature (BQ) problem, which offers computational advantages. The reformulation also benefits BQ reciprocally, bringing the exploitative behaviour of BO to BQ tasks. We show that SOBER offers substantive performance gains on synthetic and real-world tasks, including drug discovery and simulation-based inference.

* 24 pages, 9 figures 
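
To give a flavour of "batch selection as quadrature", the sketch below picks a diverse batch by kernel herding against the empirical kernel mean of a candidate set. This is a simple stand-in: SOBER itself uses kernel recombination, which is not shown here, and all names are illustrative:

```python
import numpy as np

def rbf(X, Y, ls=1.0):
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls**2)

def herding_batch(candidates, batch_size, ls=1.0):
    """Greedily pick a batch whose empirical kernel mean tracks that of
    the full candidate set (kernel herding)."""
    K = rbf(candidates, candidates, ls)
    mu = K.mean(axis=1)                # empirical kernel mean embedding
    chosen, score = [], mu.copy()
    for t in range(batch_size):
        i = int(np.argmax(score))
        chosen.append(i)
        # favour points close to the mean embedding but far from the batch
        score = mu - K[:, chosen].sum(axis=1) / (t + 2)
        score[chosen] = -np.inf        # keep the batch distinct
    return np.array(chosen)

# 100 candidates (e.g. drawn from an acquisition-weighted measure), batch of 10.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
print(herding_batch(X, 10))
```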

Kernelized Cumulants: Beyond Kernel Mean Embeddings

Jan 29, 2023
Patric Bonnier, Harald Oberhauser, Zoltán Szabó

In $\mathbb R^d$, it is well known that cumulants provide an alternative to moments that can achieve the same goals with numerous benefits, such as lower-variance estimators. In this paper we extend cumulants to reproducing kernel Hilbert spaces (RKHS) using tools from tensor algebras, and show that they are computationally tractable via a kernel trick. These kernelized cumulants provide a new set of all-purpose statistics; the classical maximum mean discrepancy and Hilbert-Schmidt independence criterion arise as the degree-one objects in our general construction. We argue both theoretically and empirically (on synthetic, environmental, and traffic data) that going beyond degree one has several advantages, and can be achieved with the same computational complexity and minimal overhead in our experiments.

* 19 pages, 8 figures 
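
For orientation, the degree-one object named in the abstract, the (biased, V-statistic) squared maximum mean discrepancy, can be computed from Gram matrices as below. The paper's higher-degree cumulant statistics are built from the same Gram matrices via a kernel trick, but their exact estimators are not reproduced here:

```python
import numpy as np

def rbf(X, Y, ls=1.0):
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls**2)

def mmd2(X, Y, ls=1.0):
    """Biased (V-statistic) estimator of squared MMD between samples X, Y."""
    return rbf(X, X, ls).mean() + rbf(Y, Y, ls).mean() - 2.0 * rbf(X, Y, ls).mean()

# Two-sample statistic on a location shift.
rng = np.random.default_rng(0)
X, Y = rng.normal(size=(200, 2)), rng.normal(loc=0.5, size=(200, 2))
print(mmd2(X, Y))
```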

Sampling-based Nyström Approximation and Kernel Quadrature

Jan 23, 2023
Satoshi Hayakawa, Harald Oberhauser, Terry Lyons

We analyze the Nyström approximation of a positive definite kernel associated with a probability measure. We first prove an improved error bound for the conventional Nyström approximation with i.i.d. sampling and singular-value decomposition in the continuous regime; the proof techniques are borrowed from statistical learning theory. We further introduce a refined selection of subspaces in the Nyström approximation, with theoretical guarantees, that is applicable to non-i.i.d. landmark points. Finally, we discuss their application to convex kernel quadrature and give novel theoretical guarantees as well as numerical observations.

* 27 pages 
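
The conventional construction analysed in the first result is easy to state in code: approximate the Gram matrix from a subset of landmark columns, optionally truncating the landmark block by SVD. A minimal NumPy sketch with i.i.d. landmarks and illustrative names:

```python
import numpy as np

def nystrom(K, landmarks, rank=None):
    """Nyström approximation K ~ K[:, m] @ pinv(K[m, m]) @ K[m, :], with an
    optional SVD truncation of the landmark block to the given rank."""
    Knm = K[:, landmarks]
    Kmm = K[np.ix_(landmarks, landmarks)]
    if rank is None:
        inv = np.linalg.pinv(Kmm)
    else:
        U, s, _ = np.linalg.svd(Kmm)
        inv = (U[:, :rank] / s[:rank]) @ U[:, :rank].T
    return Knm @ inv @ Knm.T

# i.i.d. landmark sampling on an RBF Gram matrix.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
K = np.exp(-0.5 * ((X[:, None] - X[None]) ** 2).sum(-1))
m = rng.choice(200, size=20, replace=False)
print(np.linalg.norm(K - nystrom(K, m)))   # approximation error
```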

Fast Bayesian Inference with Batch Bayesian Quadrature via Kernel Recombination

Jun 09, 2022
Masaki Adachi, Satoshi Hayakawa, Martin Jørgensen, Harald Oberhauser, Michael A. Osborne

Calculation of Bayesian posteriors and model evidences typically requires numerical integration. Bayesian quadrature (BQ), a surrogate-model-based approach to numerical integration, is capable of superb sample efficiency, but its lack of parallelisation has hindered its practical applications. In this work, we propose a parallelised (batch) BQ method, employing techniques from kernel quadrature, that possesses a provably-exponential convergence rate. Additionally, just as with Nested Sampling, our method permits simultaneous inference of both posteriors and model evidence. Samples from our BQ surrogate model are re-selected to give a sparse set of samples, via a kernel recombination algorithm, requiring negligible additional time to increase the batch size. Empirically, we find that our approach significantly outperforms both state-of-the-art BQ techniques and Nested Sampling in sampling efficiency on various real-world datasets, including lithium-ion battery analytics.

* 28 pages, 4 figures 
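
For readers new to BQ, the basic (non-batch) estimator underlying this line of work has a closed form for an RBF kernel and a Gaussian measure: the integral's posterior mean is $z^\top K^{-1} y$, where $z_i = \int k(x, x_i)\,d\pi(x)$ is the kernel mean. A minimal sketch, not the paper's batch/recombination algorithm:

```python
import numpy as np

def bq_estimate(X, y, ls=1.0, sigma=1.0, jitter=1e-8):
    """Posterior mean of the integral of f against pi = N(0, sigma^2 I),
    for an RBF-kernel GP prior on f and noiseless observations y = f(X)."""
    n, d = X.shape
    K = np.exp(-0.5 * ((X[:, None] - X[None]) ** 2).sum(-1) / ls**2)
    # closed-form kernel mean: z_i = integral of k(x, x_i) against pi
    z = (ls**2 / (ls**2 + sigma**2)) ** (d / 2) * np.exp(
        -0.5 * (X**2).sum(-1) / (ls**2 + sigma**2))
    return z @ np.linalg.solve(K + jitter * np.eye(n), y)

# The integral of x^2 against N(0,1) is 1; the estimate should land near it.
rng = np.random.default_rng(0)
X = 2.0 * rng.normal(size=(50, 1))
print(bq_estimate(X, (X**2).ravel()))
```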

Capturing Graphs with Hypo-Elliptic Diffusions

May 27, 2022
Csaba Toth, Darrick Lee, Celia Hacker, Harald Oberhauser

Convolutional layers within graph neural networks operate by aggregating information about local neighbourhood structures; one common way to encode such substructures is through random walks. The distribution of these random walks evolves according to a diffusion equation defined using the graph Laplacian. We extend this approach by leveraging classic mathematical results about hypo-elliptic diffusions. This results in a novel tensor-valued graph operator, which we call the hypo-elliptic graph Laplacian. We provide theoretical guarantees and efficient low-rank approximation algorithms. In particular, this gives a structured approach to capture long-range dependencies on graphs that is robust to pooling. Besides the attractive theoretical properties, our experiments show that this method competes with graph transformers on datasets requiring long-range reasoning, but scales only linearly in the number of edges, as opposed to quadratically in the number of nodes.

* 30 pages 
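
The scalar object being generalised is the random-walk diffusion on a graph, whose distribution evolves via the row-normalised adjacency matrix (equivalently, the random-walk graph Laplacian $L = I - P$). The tensor-valued hypo-elliptic version is not reproduced here; this sketch shows only the base diffusion:

```python
import numpy as np

def random_walk_distribution(A, start, steps):
    """Distribution of a simple random walk after `steps` steps: the scalar
    diffusion p_{t+1} = p_t P, with P = D^{-1} A, that the hypo-elliptic
    graph Laplacian lifts to tensor values."""
    P = A / A.sum(axis=1, keepdims=True)   # row-stochastic transition matrix
    p = np.zeros(A.shape[0])
    p[start] = 1.0
    for _ in range(steps):
        p = p @ P
    return p

# 4-cycle: after 3 steps the walk sits on the two odd-distance nodes.
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
print(random_walk_distribution(A, start=0, steps=3))
```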

Tangent Space and Dimension Estimation with the Wasserstein Distance

Oct 12, 2021
Uzu Lim, Vidit Nanda, Harald Oberhauser

We provide explicit bounds on the number of sample points required to estimate tangent spaces and intrinsic dimensions of (smooth, compact) Euclidean submanifolds via local principal component analysis. Our approach directly estimates covariance matrices locally, which simultaneously allows estimating both the tangent spaces and the intrinsic dimension of a manifold. The key arguments involve a matrix concentration inequality, a Wasserstein bound for flattening a manifold, and a Lipschitz relation for the covariance matrix with respect to the Wasserstein distance.
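
The estimator described, local PCA on a covariance matrix computed around a sample point, is short enough to sketch directly. The eigenvalue-gap rule for choosing the dimension is one simple choice, not necessarily the paper's exact criterion:

```python
import numpy as np

def local_pca(points, center_idx, k=20):
    """Estimate the tangent space and intrinsic dimension at one sample
    point from the eigendecomposition of a local covariance matrix."""
    x0 = points[center_idx]
    dists = np.linalg.norm(points - x0, axis=1)
    nbrs = points[np.argsort(dists)[1:k + 1]] - x0    # k nearest neighbours
    cov = nbrs.T @ nbrs / k                           # covariance centred at x0
    evals, evecs = np.linalg.eigh(cov)
    evals, evecs = evals[::-1], evecs[:, ::-1]        # descending order
    dim = int(np.argmax(evals[:-1] - evals[1:])) + 1  # largest spectral gap
    return dim, evecs[:, :dim]                        # tangent-space basis

# Noisy circle embedded in R^3: intrinsic dimension should come out as 1.
rng = np.random.default_rng(0)
t = rng.uniform(0.0, 2.0 * np.pi, 500)
X = np.c_[np.cos(t), np.sin(t), 0.01 * rng.normal(size=500)]
print(local_pca(X, 0)[0])
```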

Positively Weighted Kernel Quadrature via Subsampling

Jul 20, 2021
Satoshi Hayakawa, Harald Oberhauser, Terry Lyons

We study kernel quadrature rules with positive weights for probability measures on general domains. Our theoretical analysis combines the spectral properties of the kernel with random sampling of points. This results in effective algorithms to construct kernel quadrature rules with positive weights and small worst-case error. Besides the additional robustness afforded by positive weights, our numerical experiments indicate that these rules can achieve fast convergence rates competing with the optimal bounds in well-known examples.

* 19 pages 
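
A crude version of the construction (subsample candidate nodes, then fit nonnegative weights so the weighted rule matches the empirical kernel mean embedding) can be written with non-negative least squares. This is an illustrative simplification of the idea, not the paper's spectral algorithm, with hypothetical names throughout:

```python
import numpy as np
from scipy.optimize import nnls

def rbf(X, Y, ls=1.0):
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls**2)

def positive_kernel_quadrature(X, n_nodes=15, ls=1.0, seed=0):
    """Subsample nodes, then fit nonnegative weights so the weighted rule
    matches the empirical kernel mean embedding of X at every candidate."""
    rng = np.random.default_rng(seed)
    nodes = X[rng.choice(len(X), size=n_nodes, replace=False)]
    mu = rbf(X, X, ls).mean(axis=1)        # embedding of the empirical measure
    w, _ = nnls(rbf(X, nodes, ls), mu)     # positive weights by construction
    return nodes, w

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 2))
nodes, w = positive_kernel_quadrature(X)
print((w > 0).sum(), w.sum())   # sparse, with total weight close to 1
```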

Neural SDEs as Infinite-Dimensional GANs

Feb 06, 2021
Patrick Kidger, James Foster, Xuechen Li, Harald Oberhauser, Terry Lyons

Stochastic differential equations (SDEs) are a staple of mathematical modelling of temporal dynamics. However, a fundamental limitation has been that such models are typically relatively inflexible, which recent work introducing Neural SDEs has sought to solve. Here, we show that the classical approach to fitting SDEs may be viewed as a special case of (Wasserstein) GANs, thereby bringing the neural and classical regimes together. The input noise is Brownian motion, the output samples are time-evolving paths produced by a numerical solver, and by parameterising the discriminator as a Neural Controlled Differential Equation (CDE) we obtain Neural SDEs as (in modern machine learning parlance) continuous-time generative time-series models. Unlike previous work on this problem, this is a direct extension of the classical approach without reference to either prespecified statistics or density functions. Arbitrary drift and diffusions are admissible, so, as the Wasserstein loss has a unique global minimum, in the infinite-data limit any SDE may be learnt.
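
The generator half of this picture is just an SDE solver driven by Brownian noise. The sketch below uses a toy linear drift and diffusion in place of neural networks, and omits the Neural CDE discriminator and the Wasserstein training loop entirely; it only shows how noise maps to sample paths:

```python
import numpy as np

def sde_generator(theta, n_paths=100, n_steps=64, dt=1.0 / 64, seed=0):
    """Map Brownian noise to sample paths by Euler-Maruyama integration of
    dX = mu(X) dt + sigma(X) dW. A toy linear drift/diffusion stands in
    for the neural networks; the discriminator is not shown."""
    a, b = theta                          # drift slope, diffusion scale
    rng = np.random.default_rng(seed)
    x = np.zeros(n_paths)
    paths = [x.copy()]
    for _ in range(n_steps):
        dW = rng.normal(scale=np.sqrt(dt), size=n_paths)  # Brownian input
        x = x + a * x * dt + b * dW
        paths.append(x.copy())
    return np.stack(paths, axis=1)        # (n_paths, n_steps + 1) samples

samples = sde_generator(theta=(-0.5, 1.0))   # Ornstein-Uhlenbeck-like paths
print(samples.shape)
```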
