Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Yu Shee

FragmentRetro: A Quadratic Retrosynthetic Method Based on Fragmentation Algorithms

Sep 18, 2025

Yu Shee, Anthony M. Smaldone, Anton Morgunov, Gregory W. Kyro, Victor S. Batista

Abstract:Retrosynthesis, the process of deconstructing a target molecule into simpler precursors, is crucial for computer-aided synthesis planning (CASP). Widely adopted tree-search methods often suffer from exponential computational complexity. In this work, we introduce FragmentRetro, a novel retrosynthetic method that leverages fragmentation algorithms, specifically BRICS and r-BRICS, combined with stock-aware exploration and pattern fingerprint screening to achieve quadratic complexity. FragmentRetro recursively combines molecular fragments and verifies their presence in a building block set, providing sets of fragment combinations as retrosynthetic solutions. We present the first formal computational analysis of retrosynthetic methods, showing that tree search exhibits exponential complexity $O(b^h)$, DirectMultiStep scales as $O(h^6)$, and FragmentRetro achieves $O(h^2)$, where $h$ represents the number of heavy atoms in the target molecule and $b$ is the branching factor for tree search. Evaluations on PaRoutes, USPTO-190, and natural products demonstrate that FragmentRetro achieves high solved rates with competitive runtime, including cases where tree search fails. The method benefits from fingerprint screening, which significantly reduces substructure matching complexity. While FragmentRetro focuses on efficiently identifying fragment-based solutions rather than full reaction pathways, its computational advantages and ability to generate strategic starting candidates establish it as a powerful foundational component for scalable and automated synthesis planning.

Via

Access Paper or Ask Questions

DirectMultiStep: Direct Route Generation for Multi-Step Retrosynthesis

May 22, 2024

Yu Shee, Haote Li, Anton Morgunov, Victor Batista

Figure 1 for DirectMultiStep: Direct Route Generation for Multi-Step Retrosynthesis

Figure 2 for DirectMultiStep: Direct Route Generation for Multi-Step Retrosynthesis

Figure 3 for DirectMultiStep: Direct Route Generation for Multi-Step Retrosynthesis

Figure 4 for DirectMultiStep: Direct Route Generation for Multi-Step Retrosynthesis

Abstract:Traditional computer-aided synthesis planning (CASP) methods rely on iterative single-step predictions, leading to exponential search space growth that limits efficiency and scalability. We introduce a transformer-based model that directly generates multi-step synthetic routes as a single string by conditionally predicting each molecule based on all preceding ones. The model accommodates specific conditions such as the desired number of steps and starting materials, outperforming state-of-the-art methods on the PaRoutes dataset with a 2.2x improvement in Top-1 accuracy on the n$_1$ test set and a 3.3x improvement on the n$_5$ test set. It also successfully predicts routes for FDA-approved drugs not included in the training data, showcasing its generalization capabilities. While the current suboptimal diversity of the training set may impact performance on less common reaction types, our approach presents a promising direction towards fully automated retrosynthetic planning.

Via

Access Paper or Ask Questions

Kernel-Elastic Autoencoder for Molecular Design

Oct 12, 2023

Haote Li, Yu Shee, Brandon Allen, Federica Maschietto, Victor Batista

Figure 1 for Kernel-Elastic Autoencoder for Molecular Design

Figure 2 for Kernel-Elastic Autoencoder for Molecular Design

Figure 3 for Kernel-Elastic Autoencoder for Molecular Design

Figure 4 for Kernel-Elastic Autoencoder for Molecular Design

Abstract:We introduce the Kernel-Elastic Autoencoder (KAE), a self-supervised generative model based on the transformer architecture with enhanced performance for molecular design. KAE is formulated based on two novel loss functions: modified maximum mean discrepancy and weighted reconstruction. KAE addresses the long-standing challenge of achieving valid generation and accurate reconstruction at the same time. KAE achieves remarkable diversity in molecule generation while maintaining near-perfect reconstructions on the independent testing dataset, surpassing previous molecule-generating models. KAE enables conditional generation and allows for decoding based on beam search resulting in state-of-the-art performance in constrained optimizations. Furthermore, KAE can generate molecules conditional to favorable binding affinities in docking applications as confirmed by AutoDock Vina and Glide scores, outperforming all existing candidates from the training dataset. Beyond molecular design, we anticipate KAE could be applied to solve problems by generation in a wide range of applications.

Via

Access Paper or Ask Questions