Ruth Misener

The Catechol Benchmark: Time-series Solvent Selection Data for Few-shot Machine Learning

Jun 09, 2025

Toby Boyne, Juan S. Campos, Becky D. Langdon, Jixiang Qing, Yilin Xie, Shiqiang Zhang, Calvin Tsay, Ruth Misener, Daniel W. Davies, Kim E. Jelfs (+4 more)

Abstract: Machine learning has promised to change the landscape of laboratory chemistry, with impressive results in molecular property prediction and reaction retro-synthesis. However, chemical datasets are often inaccessible to the machine learning community as they tend to require cleaning, thorough understanding of the chemistry, or are simply not available. In this paper, we introduce a novel dataset for yield prediction, providing the first-ever transient flow dataset for machine learning benchmarking, covering over 1200 process conditions. While previous datasets focus on discrete parameters, our experimental set-up allows us to sample a large number of continuous process conditions, generating new challenges for machine learning models. We focus on solvent selection, a task that is particularly difficult to model theoretically and therefore ripe for machine learning applications. We showcase benchmarking for regression algorithms, transfer-learning approaches, feature engineering, and active learning, with important applications towards solvent replacement and sustainable manufacturing.

Global optimization of graph acquisition functions for neural architecture search

May 29, 2025

Yilin Xie, Shiqiang Zhang, Jixiang Qing, Ruth Misener, Calvin Tsay

Abstract: Graph Bayesian optimization (BO) has shown potential as a powerful and data-efficient tool for neural architecture search (NAS). Most existing graph BO works focus on developing graph surrogate models, i.e., metrics of networks and/or different kernels to quantify the similarity between networks. However, the acquisition optimization, as a discrete optimization task over graph structures, is not well studied due to the complexity of formulating the graph search space and acquisition functions. This paper presents explicit optimization formulations for graph input space including properties such as reachability and shortest paths, which are used later to formulate graph kernels and the acquisition function. We theoretically prove that the proposed encoding is an equivalent representation of the graph space and provide restrictions for the NAS domain with either node or edge labels. Numerical results over several NAS benchmarks show that our method efficiently finds the optimal architecture for most cases, highlighting its efficacy.

* 19 pages, 6 figures, 3 tables

BARK: A Fully Bayesian Tree Kernel for Black-box Optimization

Mar 07, 2025

Toby Boyne, Jose Pablo Folch, Robert M Lee, Behrang Shafei, Ruth Misener

Abstract: We perform Bayesian optimization using a Gaussian process perspective on Bayesian Additive Regression Trees (BART). Our BART Kernel (BARK) uses tree agreement to define a posterior over piecewise-constant functions, and we explore the space of tree kernels using a Markov chain Monte Carlo approach. Where BART only samples functions, the resulting BARK model obtains samples of Gaussian processes defining distributions over functions, which allow us to build acquisition functions for Bayesian optimization. Our tree-based approach enables global optimization over the surrogate, even for mixed-feature spaces. Moreover, where many previous tree-based kernels provide uncertainty quantification over function values, our sampling scheme captures uncertainty over the tree structure itself. Our experiments show the strong performance of BARK on both synthetic and applied benchmarks, due to the combination of our fully Bayesian surrogate and the optimization procedure.

* 8 main pages, 22 total pages, 10 figures, 6 tables

BoFire: Bayesian Optimization Framework Intended for Real Experiments

Aug 09, 2024

Johannes P. Dürholt, Thomas S. Asche, Johanna Kleinekorte, Gabriel Mancino-Ball, Benjamin Schiller, Simon Sung, Julian Keupp, Aaron Osburg, Toby Boyne, Ruth Misener (+8 more)

Abstract: Our open-source Python package BoFire combines Bayesian Optimization (BO) with other design of experiments (DoE) strategies focusing on developing and optimizing new chemistry. Previous BO implementations, for example as they exist in the literature or software, require substantial adaptation for effective real-world deployment in the chemical industry. BoFire provides a rich feature-set with extensive configurability and realizes our vision of fast-tracking research contributions into industrial use via maintainable open-source software. Owing to quality-of-life features like JSON-serializability of problem formulations, BoFire enables seamless integration of BO into RESTful APIs, a common architecture component for both self-driving laboratories and human-in-the-loop setups. This paper discusses the differences between BoFire and other BO implementations and outlines ways that BO research needs to be adapted for real-world use in a chemistry setting.

* 6 pages, 1 figure, 1 listing

System-Aware Neural ODE Processes for Few-Shot Bayesian Optimization

Jun 04, 2024

Jixiang Qing, Becky D Langdon, Robert M Lee, Behrang Shafei, Mark van der Wilk, Calvin Tsay, Ruth Misener

Abstract: We consider the problem of optimizing initial conditions and timing in dynamical systems governed by unknown ordinary differential equations (ODEs), where evaluating different initial conditions is costly and there are constraints on observation times. To identify the optimal conditions within several trials, we introduce a few-shot Bayesian Optimization (BO) framework based on the system's prior information. At the core of our approach is the System-Aware Neural ODE Processes (SANODEP), an extension of Neural ODE Processes (NODEP) designed to meta-learn ODE systems from multiple trajectories using a novel context embedding block. Additionally, we propose a multi-scenario loss function specifically for optimization purposes. Our two-stage BO framework effectively incorporates search space constraints, enabling efficient optimization of both initial conditions and observation timings. We conduct extensive experiments showcasing SANODEP's potential for few-shot BO. We also explore SANODEP's adaptability to varying levels of prior information, highlighting the trade-off between prior flexibility and model fitting accuracy.

Transfer Learning Bayesian Optimization to Design Competitor DNA Molecules for Use in Diagnostic Assays

Feb 27, 2024

Ruby Sedgwick, John P. Goertz, Molly M. Stevens, Ruth Misener, Mark van der Wilk

Abstract: With the rise in engineered biomolecular devices, there is an increased need for tailor-made biological sequences. Often, many similar biological sequences need to be made for a specific application, meaning numerous, sometimes prohibitively expensive, lab experiments are necessary for their optimization. This paper presents a transfer learning design of experiments workflow to make this development feasible. By combining a transfer learning surrogate model with Bayesian optimization, we show how the total number of experiments can be reduced by sharing information between optimization tasks. We demonstrate the reduction in the number of experiments using data from the development of DNA competitors for use in an amplification-based diagnostic assay. We use cross-validation to compare the predictive accuracy of different transfer learning models, and then compare the performance of the models for both single objective and penalized optimization tasks.

Verifying message-passing neural networks via topology-based bounds tightening

Feb 21, 2024

Christopher Hojny, Shiqiang Zhang, Juan S. Campos, Ruth Misener

Abstract: Since graph neural networks (GNNs) are often vulnerable to attack, we need to know when we can trust them. We develop a computationally effective approach towards providing robust certificates for message-passing neural networks (MPNNs) using a Rectified Linear Unit (ReLU) activation function. Because our work builds on mixed-integer optimization, it encodes a wide variety of subproblems; for example, it admits (i) both adding and removing edges, (ii) both global and local budgets, and (iii) both topological perturbations and feature modifications. Our key technology, topology-based bounds tightening, uses graph structure to tighten bounds. We also experiment with aggressive bounds tightening to dynamically change the optimization constraints by tightening variable bounds. To demonstrate the effectiveness of these strategies, we implement an extension to the open-source branch-and-cut solver SCIP. We test on both node and graph classification problems and consider topological attacks that both add and remove edges.

Mixed-Output Gaussian Process Latent Variable Models

Feb 14, 2024

James Odgers, Chrysoula Kappatou, Ruth Misener, Sarah Filippi

Abstract: This work develops a Bayesian non-parametric approach to signal separation where the signals may vary according to latent variables. Our key contribution is to augment Gaussian Process Latent Variable Models (GPLVMs) to incorporate the case where each data point comprises the weighted sum of a known number of pure component signals, observed across several input locations. Our framework allows the use of a range of priors for the weights of each observation. This flexibility enables us to represent use cases including sum-to-one constraints for estimating fractional makeup, and binary weights for classification. Our contributions are particularly relevant to spectroscopy, where changing conditions may cause the underlying pure component signals to vary from sample to sample. To demonstrate the applicability to both spectroscopy and other domains, we consider several applications: a near-infrared spectroscopy data set with varying temperatures, a simulated data set for identifying flow configuration through a pipe, and a data set for determining the type of rock from its reflectance.
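
The observation model described in this abstract (each data point is a weighted sum of pure component signals, with sum-to-one weights for fractional makeup) can be illustrated with a minimal sketch. This is not the paper's method: the signals here are fixed lists rather than Gaussian-process draws that vary with latent variables, and the function name `mix` and the example numbers are hypothetical.

```python
def mix(weights, signals):
    """Return the weighted sum of pure component signals at each input location.

    Assumption (illustrative only): weights must sum to one, mirroring the
    fractional-makeup use case; the signals are fixed, not GP-distributed.
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to one")
    n_locations = len(signals[0])
    return [
        sum(w * s[i] for w, s in zip(weights, signals))
        for i in range(n_locations)
    ]


# Two hypothetical pure component signals observed at three input locations.
pure = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]]

# One observation made of 25% of the first component and 75% of the second.
observed = mix([0.25, 0.75], pure)
print(observed)
```

Signal separation runs this model in reverse: given `observed`, infer the weights and, in the setting of the abstract, the sample-specific pure signals as well.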

Transition Constrained Bayesian Optimization via Markov Decision Processes

Feb 13, 2024

Jose Pablo Folch, Calvin Tsay, Robert M Lee, Behrang Shafei, Weronika Ormaniec, Andreas Krause, Mark van der Wilk, Ruth Misener, Mojmír Mutný

Abstract: Bayesian optimization is a methodology to optimize black-box functions. Traditionally, it focuses on the setting where you can arbitrarily query the search space. However, many real-life problems do not offer this flexibility; in particular, the search space of the next query may depend on previous ones. Example challenges arise in the physical sciences in the form of local movement constraints, required monotonicity in certain variables, and transitions influencing the accuracy of measurements. Altogether, such transition constraints necessitate a form of planning. This work extends Bayesian optimization via the framework of Markov Decision Processes, iteratively solving a tractable linearization of our objective using reinforcement learning to obtain a policy that plans ahead over long horizons. The resulting policy is potentially history-dependent and non-Markovian. We showcase applications in chemical reactor optimization, informative path planning, machine calibration, and other synthetic examples.

* 9 pages main, 24 pages total, 13 figures, 1 table, preprint

Practical Path-based Bayesian Optimization

Dec 01, 2023

Jose Pablo Folch, James Odgers, Shiqiang Zhang, Robert M Lee, Behrang Shafei, David Walz, Calvin Tsay, Mark van der Wilk, Ruth Misener

Abstract: There has been a surge in interest in data-driven experimental design with applications to chemical engineering and drug manufacturing. Bayesian optimization (BO) has proven to be adaptable to such cases, since we can model the reactions of interest as expensive black-box functions. Sometimes, the cost of these black-box functions can be separated into two parts: (a) the cost of the experiment itself, and (b) the cost of changing the input parameters. In this short paper, we extend the SnAKe algorithm to deal with both types of costs simultaneously. We further propose extensions to the case of a maximum allowable input change, as well as to the multi-objective setting.

* NeurIPS 2023 Workshop on Adaptive Experimental Design and Active Learning in the Real World
* 6 main pages, 12 with references and appendix. 4 figures, 2 tables
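
The two-part cost mentioned in the abstract above, (a) the experiment itself and (b) changing the input parameters, can be made concrete with a small sketch. It is not the SnAKe algorithm or its extension: the fixed per-experiment cost, the movement cost proportional to Euclidean distance, and the names `total_cost`, `experiment_cost` and `move_cost_per_unit` are all assumptions chosen for illustration.

```python
import math


def total_cost(points, experiment_cost=1.0, move_cost_per_unit=0.5):
    """Cost of running experiments at `points` in the given order.

    Assumptions (illustrative only): every experiment costs the same, and
    changing inputs costs an amount proportional to the Euclidean distance
    between consecutive query points.
    """
    cost = experiment_cost * len(points)
    for previous, current in zip(points, points[1:]):
        cost += move_cost_per_unit * math.dist(previous, current)
    return cost


# Three hypothetical query points in a two-dimensional input space.
queries = [(0.0, 0.0), (3.0, 4.0), (3.0, 0.0)]

as_given = total_cost(queries)
reordered = total_cost([queries[0], queries[2], queries[1]])
print(as_given, reordered)
```

Visiting the same three points in a different order lowers the total cost in this toy setting, which is why the order of queries matters once input changes are charged for.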
