Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Yanyan Lan

Multi-level Interaction Modeling for Protein Mutational Effect Prediction

May 28, 2024

Yuanle Mo, Xin Hong, Bowen Gao, Yinjun Jia, Yanyan Lan

Figure 1 for Multi-level Interaction Modeling for Protein Mutational Effect Prediction

Figure 2 for Multi-level Interaction Modeling for Protein Mutational Effect Prediction

Figure 3 for Multi-level Interaction Modeling for Protein Mutational Effect Prediction

Figure 4 for Multi-level Interaction Modeling for Protein Mutational Effect Prediction

Abstract:Protein-protein interactions are central mediators in many biological processes. Accurately predicting the effects of mutations on interactions is crucial for guiding the modulation of these interactions, thereby playing a significant role in therapeutic development and drug discovery. Mutations generally affect interactions hierarchically across three levels: mutated residues exhibit different sidechain conformations, which lead to changes in the backbone conformation, eventually affecting the binding affinity between proteins. However, existing methods typically focus only on sidechain-level interaction modeling, resulting in suboptimal predictions. In this work, we propose a self-supervised multi-level pre-training framework, ProMIM, to fully capture all three levels of interactions with well-designed pretraining objectives. Experiments show ProMIM outperforms all the baselines on the standard benchmark, especially on mutations where significant changes in backbone conformations may occur. In addition, leading results from zero-shot evaluations for SARS-CoV-2 mutational effect prediction and antibody optimization underscore the potential of ProMIM as a powerful next-generation tool for developing novel therapeutic approaches and new drugs.

Via

Access Paper or Ask Questions

UniCorn: A Unified Contrastive Learning Approach for Multi-view Molecular Representation Learning

May 15, 2024

Shikun Feng, Yuyan Ni, Minghao Li, Yanwen Huang, Zhi-Ming Ma, Wei-Ying Ma, Yanyan Lan

Figure 1 for UniCorn: A Unified Contrastive Learning Approach for Multi-view Molecular Representation Learning

Figure 2 for UniCorn: A Unified Contrastive Learning Approach for Multi-view Molecular Representation Learning

Figure 3 for UniCorn: A Unified Contrastive Learning Approach for Multi-view Molecular Representation Learning

Figure 4 for UniCorn: A Unified Contrastive Learning Approach for Multi-view Molecular Representation Learning

Abstract:Recently, a noticeable trend has emerged in developing pre-trained foundation models in the domains of CV and NLP. However, for molecular pre-training, there lacks a universal model capable of effectively applying to various categories of molecular tasks, since existing prevalent pre-training methods exhibit effectiveness for specific types of downstream tasks. Furthermore, the lack of profound understanding of existing pre-training methods, including 2D graph masking, 2D-3D contrastive learning, and 3D denoising, hampers the advancement of molecular foundation models. In this work, we provide a unified comprehension of existing pre-training methods through the lens of contrastive learning. Thus their distinctions lie in clustering different views of molecules, which is shown beneficial to specific downstream tasks. To achieve a complete and general-purpose molecular representation, we propose a novel pre-training framework, named UniCorn, that inherits the merits of the three methods, depicting molecular views in three different levels. SOTA performance across quantum, physicochemical, and biological tasks, along with comprehensive ablation study, validate the universality and effectiveness of UniCorn.

Via

Access Paper or Ask Questions

Contextual Molecule Representation Learning from Chemical Reaction Knowledge

Feb 21, 2024

Han Tang, Shikun Feng, Bicheng Lin, Yuyan Ni, JIngjing Liu, Wei-Ying Ma, Yanyan Lan

Figure 1 for Contextual Molecule Representation Learning from Chemical Reaction Knowledge

Figure 2 for Contextual Molecule Representation Learning from Chemical Reaction Knowledge

Figure 3 for Contextual Molecule Representation Learning from Chemical Reaction Knowledge

Figure 4 for Contextual Molecule Representation Learning from Chemical Reaction Knowledge

Abstract:In recent years, self-supervised learning has emerged as a powerful tool to harness abundant unlabelled data for representation learning and has been broadly adopted in diverse areas. However, when applied to molecular representation learning (MRL), prevailing techniques such as masked sub-unit reconstruction often fall short, due to the high degree of freedom in the possible combinations of atoms within molecules, which brings insurmountable complexity to the masking-reconstruction paradigm. To tackle this challenge, we introduce REMO, a self-supervised learning framework that takes advantage of well-defined atom-combination rules in common chemistry. Specifically, REMO pre-trains graph/Transformer encoders on 1.7 million known chemical reactions in the literature. We propose two pre-training objectives: Masked Reaction Centre Reconstruction (MRCR) and Reaction Centre Identification (RCI). REMO offers a novel solution to MRL by exploiting the underlying shared patterns in chemical reactions as \textit{context} for pre-training, which effectively infers meaningful representations of common chemistry knowledge. Such contextual representations can then be utilized to support diverse downstream molecular tasks with minimum finetuning, such as affinity prediction and drug-drug interaction prediction. Extensive experimental results on MoleculeACE, ACNet, drug-drug interaction (DDI), and reaction type classification show that across all tested downstream tasks, REMO outperforms the standard baseline of single-molecule masked modeling used in current MRL. Remarkably, REMO is the pioneering deep learning model surpassing fingerprint-based methods in activity cliff benchmarks.

* Preprint. Under Review

Via

Access Paper or Ask Questions

Equivariant Flow Matching with Hybrid Probability Transport

Dec 12, 2023

Yuxuan Song, Jingjing Gong, Minkai Xu, Ziyao Cao, Yanyan Lan, Stefano Ermon, Hao Zhou, Wei-Ying Ma

Figure 1 for Equivariant Flow Matching with Hybrid Probability Transport

Figure 2 for Equivariant Flow Matching with Hybrid Probability Transport

Figure 3 for Equivariant Flow Matching with Hybrid Probability Transport

Figure 4 for Equivariant Flow Matching with Hybrid Probability Transport

Abstract:The generation of 3D molecules requires simultaneously deciding the categorical features~(atom types) and continuous features~(atom coordinates). Deep generative models, especially Diffusion Models (DMs), have demonstrated effectiveness in generating feature-rich geometries. However, existing DMs typically suffer from unstable probability dynamics with inefficient sampling speed. In this paper, we introduce geometric flow matching, which enjoys the advantages of both equivariant modeling and stabilized probability dynamics. More specifically, we propose a hybrid probability path where the coordinates probability path is regularized by an equivariant optimal transport, and the information between different modalities is aligned. Experimentally, the proposed method could consistently achieve better performance on multiple molecule generation benchmarks with 4.75$\times$ speed up of sampling on average.

* NeurIPS 2023

Via

Access Paper or Ask Questions

Protein-ligand binding representation learning from fine-grained interactions

Nov 09, 2023

Shikun Feng, Minghao Li, Yinjun Jia, Weiying Ma, Yanyan Lan

Figure 1 for Protein-ligand binding representation learning from fine-grained interactions

Figure 2 for Protein-ligand binding representation learning from fine-grained interactions

Figure 3 for Protein-ligand binding representation learning from fine-grained interactions

Figure 4 for Protein-ligand binding representation learning from fine-grained interactions

Abstract:The binding between proteins and ligands plays a crucial role in the realm of drug discovery. Previous deep learning approaches have shown promising results over traditional computationally intensive methods, but resulting in poor generalization due to limited supervised data. In this paper, we propose to learn protein-ligand binding representation in a self-supervised learning manner. Different from existing pre-training approaches which treat proteins and ligands individually, we emphasize to discern the intricate binding patterns from fine-grained interactions. Specifically, this self-supervised learning problem is formulated as a prediction of the conclusive binding complex structure given a pocket and ligand with a Transformer based interaction module, which naturally emulates the binding process. To ensure the representation of rich binding information, we introduce two pre-training tasks, i.e.~atomic pairwise distance map prediction and mask ligand reconstruction, which comprehensively model the fine-grained interactions from both structure and feature space. Extensive experiments have demonstrated the superiority of our method across various binding tasks, including protein-ligand affinity prediction, virtual screening and protein-ligand docking.

Via

Access Paper or Ask Questions

Elastic Information Bottleneck

Nov 07, 2023

Yuyan Ni, Yanyan Lan, Ao Liu, Zhiming Ma

Abstract:Information bottleneck is an information-theoretic principle of representation learning that aims to learn a maximally compressed representation that preserves as much information about labels as possible. Under this principle, two different methods have been proposed, i.e., information bottleneck (IB) and deterministic information bottleneck (DIB), and have gained significant progress in explaining the representation mechanisms of deep learning algorithms. However, these theoretical and empirical successes are only valid with the assumption that training and test data are drawn from the same distribution, which is clearly not satisfied in many real-world applications. In this paper, we study their generalization abilities within a transfer learning scenario, where the target error could be decomposed into three components, i.e., source empirical error, source generalization gap (SG), and representation discrepancy (RD). Comparing IB and DIB on these terms, we prove that DIB's SG bound is tighter than IB's while DIB's RD is larger than IB's. Therefore, it is difficult to tell which one is better. To balance the trade-off between SG and the RD, we propose an elastic information bottleneck (EIB) to interpolate between the IB and DIB regularizers, which guarantees a Pareto frontier within the IB framework. Additionally, simulations and real data experiments show that EIB has the ability to achieve better domain adaptation results than IB and DIB, which validates the correctness of our theories.

Via

Access Paper or Ask Questions

Sliced Denoising: A Physics-Informed Molecular Pre-Training Method

Nov 03, 2023

Yuyan Ni, Shikun Feng, Wei-Ying Ma, Zhi-Ming Ma, Yanyan Lan

Figure 1 for Sliced Denoising: A Physics-Informed Molecular Pre-Training Method

Figure 2 for Sliced Denoising: A Physics-Informed Molecular Pre-Training Method

Figure 3 for Sliced Denoising: A Physics-Informed Molecular Pre-Training Method

Figure 4 for Sliced Denoising: A Physics-Informed Molecular Pre-Training Method

Abstract:While molecular pre-training has shown great potential in enhancing drug discovery, the lack of a solid physical interpretation in current methods raises concerns about whether the learned representation truly captures the underlying explanatory factors in observed data, ultimately resulting in limited generalization and robustness. Although denoising methods offer a physical interpretation, their accuracy is often compromised by ad-hoc noise design, leading to inaccurate learned force fields. To address this limitation, this paper proposes a new method for molecular pre-training, called sliced denoising (SliDe), which is based on the classical mechanical intramolecular potential theory. SliDe utilizes a novel noise strategy that perturbs bond lengths, angles, and torsion angles to achieve better sampling over conformations. Additionally, it introduces a random slicing approach that circumvents the computationally expensive calculation of the Jacobian matrix, which is otherwise essential for estimating the force field. By aligning with physical principles, SliDe shows a 42\% improvement in the accuracy of estimated force fields compared to current state-of-the-art denoising methods, and thus outperforms traditional baselines on various molecular property prediction tasks.

Via

Access Paper or Ask Questions

Delta Score: Improving the Binding Assessment of Structure-Based Drug Design Methods

Nov 01, 2023

Minsi Ren, Bowen Gao, Bo Qiang, Yanyan Lan

Abstract:Structure-based drug design (SBDD) stands at the forefront of drug discovery, emphasizing the creation of molecules that target specific binding pockets. Recent advances in this area have witnessed the adoption of deep generative models and geometric deep learning techniques, modeling SBDD as a conditional generation task where the target structure serves as context. Historically, evaluation of these models centered on docking scores, which quantitatively depict the predicted binding affinity between a molecule and its target pocket. Though state-of-the-art models purport that a majority of their generated ligands exceed the docking score of ground truth ligands in test sets, it begs the question: Do these scores align with real-world biological needs? In this paper, we introduce the delta score, a novel evaluation metric grounded in tangible pharmaceutical requisites. Our experiments reveal that molecules produced by current deep generative models significantly lag behind ground truth reference ligands when assessed with the delta score. This novel metric not only complements existing benchmarks but also provides a pivotal direction for subsequent research in the domain.

Via

Access Paper or Ask Questions

UniMAP: Universal SMILES-Graph Representation Learning

Oct 22, 2023

Shikun Feng, Lixin Yang, Weiying Ma, Yanyan Lan

Figure 1 for UniMAP: Universal SMILES-Graph Representation Learning

Figure 2 for UniMAP: Universal SMILES-Graph Representation Learning

Figure 3 for UniMAP: Universal SMILES-Graph Representation Learning

Figure 4 for UniMAP: Universal SMILES-Graph Representation Learning

Abstract:Molecular representation learning is fundamental for many drug related applications. Most existing molecular pre-training models are limited in using single molecular modality, either SMILES or graph representation. To effectively leverage both modalities, we argue that it is critical to capture the fine-grained 'semantics' between SMILES and graph, because subtle sequence/graph differences may lead to contrary molecular properties. In this paper, we propose a universal SMILE-graph representation learning model, namely UniMAP. Firstly, an embedding layer is employed to obtain the token and node/edge representation in SMILES and graph, respectively. A multi-layer Transformer is then utilized to conduct deep cross-modality fusion. Specially, four kinds of pre-training tasks are designed for UniMAP, including Multi-Level Cross-Modality Masking (CMM), SMILES-Graph Matching (SGM), Fragment-Level Alignment (FLA), and Domain Knowledge Learning (DKL). In this way, both global (i.e. SGM and DKL) and local (i.e. CMM and FLA) alignments are integrated to achieve comprehensive cross-modality fusion. We evaluate UniMAP on various downstream tasks, i.e. molecular property prediction, drug-target affinity prediction and drug-drug interaction. Experimental results show that UniMAP outperforms current state-of-the-art pre-training methods.We also visualize the learned representations to demonstrate the effect of multi-modality integration.

Via

Access Paper or Ask Questions

Self-supervised Pocket Pretraining via Protein Fragment-Surroundings Alignment

Oct 11, 2023

Bowen Gao, Yinjun Jia, Yuanle Mo, Yuyan Ni, Weiying Ma, Zhiming Ma, Yanyan Lan

Figure 1 for Self-supervised Pocket Pretraining via Protein Fragment-Surroundings Alignment

Figure 2 for Self-supervised Pocket Pretraining via Protein Fragment-Surroundings Alignment

Figure 3 for Self-supervised Pocket Pretraining via Protein Fragment-Surroundings Alignment

Figure 4 for Self-supervised Pocket Pretraining via Protein Fragment-Surroundings Alignment

Abstract:Pocket representations play a vital role in various biomedical applications, such as druggability estimation, ligand affinity prediction, and de novo drug design. While existing geometric features and pretrained representations have demonstrated promising results, they usually treat pockets independent of ligands, neglecting the fundamental interactions between them. However, the limited pocket-ligand complex structures available in the PDB database (less than 100 thousand non-redundant pairs) hampers large-scale pretraining endeavors for interaction modeling. To address this constraint, we propose a novel pocket pretraining approach that leverages knowledge from high-resolution atomic protein structures, assisted by highly effective pretrained small molecule representations. By segmenting protein structures into drug-like fragments and their corresponding pockets, we obtain a reasonable simulation of ligand-receptor interactions, resulting in the generation of over 5 million complexes. Subsequently, the pocket encoder is trained in a contrastive manner to align with the representation of pseudo-ligand furnished by some pretrained small molecule encoders. Our method, named ProFSA, achieves state-of-the-art performance across various tasks, including pocket druggability prediction, pocket matching, and ligand binding affinity prediction. Notably, ProFSA surpasses other pretraining methods by a substantial margin. Moreover, our work opens up a new avenue for mitigating the scarcity of protein-ligand complex data through the utilization of high-quality and diverse protein structure databases.

Via

Access Paper or Ask Questions