Kim E. Jelfs

The Catechol Benchmark: Time-series Solvent Selection Data for Few-shot Machine Learning

Jun 09, 2025

Toby Boyne, Juan S. Campos, Becky D. Langdon, Jixiang Qing, Yilin Xie, Shiqiang Zhang, Calvin Tsay, Ruth Misener, Daniel W. Davies, Kim E. Jelfs (+4 more)

Abstract: Machine learning has promised to change the landscape of laboratory chemistry, with impressive results in molecular property prediction and reaction retro-synthesis. However, chemical datasets are often inaccessible to the machine learning community because they require cleaning, demand a thorough understanding of the chemistry, or are simply not available. In this paper, we introduce a novel dataset for yield prediction, providing the first-ever transient flow dataset for machine learning benchmarking, covering over 1200 process conditions. While previous datasets focus on discrete parameters, our experimental setup allows us to sample a large number of continuous process conditions, generating new challenges for machine learning models. We focus on solvent selection, a task that is particularly difficult to model theoretically and therefore ripe for machine learning applications. We showcase benchmarking for regression algorithms, transfer-learning approaches, feature engineering, and active learning, with important applications towards solvent replacement and sustainable manufacturing.


Applying Multi-Fidelity Bayesian Optimization in Chemistry: Open Challenges and Major Considerations

Sep 11, 2024

Edmund Judge, Mohammed Azzouzi, Austin M. Mroz, Antonio del Rio Chanona, Kim E. Jelfs


Abstract: Multi-fidelity Bayesian optimization (MFBO) leverages experimental and/or computational data of varying quality and resource cost to optimize towards desired maxima cost-effectively. This approach is particularly attractive for chemical discovery due to MFBO's ability to integrate diverse data sources. Here, we investigate the application of MFBO to accelerate the identification of promising molecules or materials. We specifically analyze the conditions under which lower-fidelity data can enhance performance compared to single-fidelity problem formulations. We address two key challenges: selecting the optimal acquisition function, and understanding the impact of cost and data fidelity correlation. We then discuss how to assess the effectiveness of MFBO for chemical discovery.
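
As a rough illustration of how query cost can enter an acquisition function, the sketch below ranks candidate (molecule, fidelity) queries by expected improvement per unit cost. This is a generic heuristic and not necessarily the approach studied in the paper; the candidate names, predicted means, standard deviations, and costs are hypothetical, and the sketch deliberately ignores how well the low-fidelity source correlates with the target, which the abstract identifies as a key factor.

```python
import math


def expected_improvement(mean, std, best):
    """Expected improvement of a Gaussian prediction over the incumbent (maximization)."""
    if std <= 0.0:
        return max(mean - best, 0.0)
    z = (mean - best) / std
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (mean - best) * cdf + std * pdf


def pick_query(candidates, best):
    """Return the candidate with the highest expected improvement per unit cost."""
    return max(
        candidates,
        key=lambda c: expected_improvement(c["mean"], c["std"], best) / c["cost"],
    )


# Hypothetical surrogate predictions for two molecules at two fidelities.
candidates = [
    {"name": "A", "fidelity": "low", "mean": 0.60, "std": 0.20, "cost": 1.0},
    {"name": "A", "fidelity": "high", "mean": 0.62, "std": 0.05, "cost": 10.0},
    {"name": "B", "fidelity": "low", "mean": 0.40, "std": 0.10, "cost": 1.0},
]
choice = pick_query(candidates, best=0.55)
print(choice["name"], choice["fidelity"])
```

With these made-up numbers the cheap, uncertain query wins because its improvement per unit cost is larger; a weakly correlated low-fidelity source would make that choice much less attractive in practice.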


PolyCL: Contrastive Learning for Polymer Representation Learning via Explicit and Implicit Augmentations

Aug 14, 2024

Jiajun Zhou, Yijie Yang, Austin M. Mroz, Kim E. Jelfs


Abstract: Polymers play a crucial role in a wide array of applications due to their diverse and tunable properties. Establishing the relationship between polymer representations and their properties is crucial to the computational design and screening of potential polymers via machine learning. The quality of the representation significantly influences the effectiveness of these computational methods. Here, we present a self-supervised contrastive learning paradigm, PolyCL, for learning high-quality polymer representations without the need for labels. Our model combines explicit and implicit augmentation strategies for improved learning performance. The results demonstrate that our model achieves either better or highly competitive performance on transfer learning tasks as a feature extractor, without an overcomplicated training strategy or hyperparameter optimisation. To further enhance the efficacy of our model, we conducted extensive analyses of various augmentation combinations used in contrastive learning. This led to identifying the most effective combination to maximise PolyCL's performance.
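
To make the contrastive objective concrete, here is a minimal sketch of the NT-Xent loss popularised by SimCLR, written in plain Python. The abstract does not say which loss PolyCL uses, so NT-Xent is an assumption chosen for illustration; the function names, the temperature value, and the toy embeddings are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nt_xent(view_a, view_b, temperature=0.5):
    """Mean NT-Xent loss; view_a[i] and view_b[i] embed two augmentations of sample i."""
    z = view_a + view_b  # concatenate the two views into one batch of 2n embeddings
    n = len(view_a)
    total = 0.0
    for i in range(2 * n):
        pos = (i + n) % (2 * n)  # index of the other view of the same sample
        logits = [cosine(z[i], z[k]) / temperature for k in range(2 * n) if k != i]
        log_denominator = math.log(sum(math.exp(value) for value in logits))
        total += log_denominator - cosine(z[i], z[pos]) / temperature
    return total / (2 * n)


# Toy embeddings: matching views should score a lower loss than mismatched ones.
aligned = nt_xent([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = nt_xent([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < shuffled)
```

The loss pulls the two augmented views of the same polymer together and pushes all other embeddings in the batch away, which is why the choice of augmentations matters so much for the learned representation.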
