Abstract:Developing bioactive molecules remains a central, time- and cost-heavy challenge in drug discovery, particularly for novel targets lacking structural or functional data. Pharmacophore modeling presents an alternative for capturing the key features required for molecular bioactivity against a biological target. In this work, we present PharmaDiff, a pharmacophore-conditioned diffusion model for 3D molecular generation. PharmaDiff employs a transformer-based architecture to integrate an atom-based representation of the 3D pharmacophore into the generative process, enabling the precise generation of 3D molecular graphs that align with predefined pharmacophore hypotheses. Through comprehensive testing, PharmaDiff demonstrates superior performance in matching 3D pharmacophore constraints compared to ligand-based drug design methods. Additionally, it achieves higher docking scores across a range of proteins in structure-based drug design, without the need for target protein structures. By integrating pharmacophore modeling with 3D generative techniques, PharmaDiff offers a powerful and flexible framework for rational drug design.
Abstract:Recent advances in diffusion models have shown remarkable potential in the conditional generation of novel molecules. These models can be guided in two ways: (i) explicitly, through additional features representing the condition, or (ii) implicitly, using a property predictor. However, training property predictors or conditional diffusion models requires an abundance of labeled data and is inherently challenging in real-world applications. We propose a novel approach that attenuates the limitations of acquiring large labeled datasets by leveraging domain knowledge from quantum chemistry as a non-differentiable oracle to guide an unconditional diffusion model. Instead of relying on neural networks, the oracle provides accurate guidance in the form of estimated gradients, allowing the diffusion process to sample from a conditional distribution specified by quantum chemistry. We show that this results in more precise conditional generation of novel and stable molecular structures. Our experiments demonstrate that our method: (1) significantly reduces atomic forces, enhancing the validity of generated molecules when used for stability optimization; (2) is compatible with both explicit and implicit guidance in diffusion models, enabling joint optimization of molecular properties and stability; and (3) generalizes effectively to molecular optimization tasks beyond stability optimization.
Abstract:Diffusion models have emerged as powerful tools for molecular generation, particularly in the context of 3D molecular structures. Inspired by non-equilibrium statistical physics, these models can generate 3D molecular structures with specific properties or requirements crucial to drug discovery. Diffusion models were particularly successful at learning 3D molecular geometries' complex probability distributions and their corresponding chemical and physical properties through forward and reverse diffusion processes. This review focuses on the technical implementation of diffusion models tailored for 3D molecular generation. It compares the performance, evaluation methods, and implementation details of various diffusion models used for molecular generation tasks. We cover strategies for atom and bond representation, architectures of reverse diffusion denoising networks, and challenges associated with generating stable 3D molecular structures. This review also explores the applications of diffusion models in $\textit{de novo}$ drug design and related areas of computational chemistry, such as structure-based drug design, including target-specific molecular generation, molecular docking, and molecular dynamics of protein-ligand complexes. We also cover conditional generation on physical properties, conformation generation, and fragment-based drug design. By summarizing the state-of-the-art diffusion models for 3D molecular generation, this review sheds light on their role in advancing drug discovery as well as their current limitations.