Bharath Ramsundar

Score-Based Generative Models for Molecule Generation

Mar 07, 2022

Dwaraknath Gnaneshwar, Bharath Ramsundar, Dhairya Gandhi, Rachel Kurchin, Venkatasubramanian Viswanathan

Abstract: Recent advances in generative models have made exploring design spaces easier for de novo molecule generation. However, popular generative models like GANs and normalizing flows face challenges such as training instabilities due to adversarial training and architectural constraints, respectively. Score-based generative models sidestep these challenges by modelling the gradient of the log probability density using a score function approximation, as opposed to modelling the density function directly, and sampling from it using annealed Langevin dynamics. We believe that score-based generative models could open up new opportunities in molecule generation due to their architectural flexibility, such as replacing the score function with an SE(3) equivariant model. In this work, we lay the foundations by testing the efficacy of score-based models for molecule generation. We train a Transformer-based score function on Self-Referencing Embedded Strings (SELFIES) representations of 1.5 million samples from the ZINC dataset and use the Moses benchmarking framework to evaluate the generated samples on a suite of metrics.
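
The sampling procedure named in this abstract, annealed Langevin dynamics, can be sketched on a toy problem. This is a minimal illustration, not the paper's implementation: the learned Transformer score function is replaced by the exact score of a one-dimensional Gaussian, and the function names, noise schedule, step size `eps`, and target parameters `mu` and `s` are all assumptions chosen for the example.

```python
import math
import random


def perturbed_score(x, sigma, mu=2.0, s=0.5):
    """Exact score of N(mu, s^2) after adding N(0, sigma^2) noise.

    Assumed stand-in for a learned score network s_theta(x, sigma).
    """
    return -(x - mu) / (s * s + sigma * sigma)


def geometric_sigmas(sigma_max=1.0, sigma_min=0.1, n_levels=10):
    """Noise levels decreasing geometrically from sigma_max to sigma_min."""
    ratio = (sigma_min / sigma_max) ** (1.0 / (n_levels - 1))
    return [sigma_max * ratio ** i for i in range(n_levels)]


def annealed_langevin(score_fn, sigmas, n_steps=50, eps=0.01, rng=None):
    """Draw one sample by running Langevin updates over decreasing noise levels."""
    rng = rng or random.Random()
    x = rng.uniform(-5.0, 5.0)  # arbitrary starting point
    sigma_last = sigmas[-1]
    for sigma in sigmas:
        # The step size shrinks together with the noise level.
        alpha = eps * (sigma / sigma_last) ** 2
        for _ in range(n_steps):
            noise = rng.gauss(0.0, 1.0)
            x = x + 0.5 * alpha * score_fn(x, sigma) + math.sqrt(alpha) * noise
    return x


if __name__ == "__main__":
    rng = random.Random(0)
    sigmas = geometric_sigmas()
    samples = [annealed_langevin(perturbed_score, sigmas, rng=rng) for _ in range(300)]
    print("sample mean:", sum(samples) / len(samples))
```

Only the gradient of the log density is ever evaluated here; the density itself is never needed, which is the property the abstract highlights.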

FastFlows: Flow-Based Models for Molecular Graph Generation

Jan 28, 2022

Nathan C. Frey, Vijay Gadepally, Bharath Ramsundar

Abstract: We propose a framework using normalizing-flow-based models, Self-Referencing Embedded Strings (SELFIES), and multi-objective optimization that efficiently generates small molecules. With an initial training set of only 100 small molecules, FastFlows generates thousands of chemically valid molecules in seconds. Because of the efficient sampling, substructure filters can be applied as desired to eliminate compounds with unreasonable moieties. Using easily computable and learned metrics for druglikeness, synthetic accessibility, and synthetic complexity, we perform a multi-objective optimization to demonstrate how FastFlows functions in a high-throughput virtual screening context. Our model is significantly simpler and easier to train than autoregressive molecular generative models, and enables fast generation and identification of druglike, synthesizable molecules.

* 7 pages, 4 figures, ELLIS Machine Learning for Molecule Discovery Workshop 2021

Bringing Atomistic Deep Learning to Prime Time

Dec 09, 2021

Nathan C. Frey, Siddharth Samsi, Bharath Ramsundar, Connor W. Coley, Vijay Gadepally

Abstract: Artificial intelligence has not yet revolutionized the design of materials and molecules. In this perspective, we identify four barriers preventing the integration of atomistic deep learning, molecular science, and high-performance computing. We outline focused research efforts to address the opportunities presented by these challenges.

* 6 pages, 1 figure, NeurIPS 2021 AI for Science workshop

Differentiable Physics: A Position Piece

Sep 14, 2021

Bharath Ramsundar, Dilip Krishnamurthy, Venkatasubramanian Viswanathan

Abstract: Differentiable physics provides a new approach for modeling and understanding physical systems by pairing the new technology of differentiable programming with classical numerical methods for physical simulation. We survey the rapidly growing literature of differentiable physics techniques and highlight methods for parameter estimation, learning representations, solving differential equations, and developing what we call scientific foundation models using data and inductive priors. We argue that differentiable physics offers a new paradigm for modeling physical phenomena by combining classical analytic solutions with numerical methodology using the bridge of differentiable programming.

* 12 pages, 1 figure

ChemBERTa: Large-Scale Self-Supervised Pretraining for Molecular Property Prediction

Oct 23, 2020

Seyone Chithrananda, Gabriel Grand, Bharath Ramsundar

Abstract: GNNs and chemical fingerprints are the predominant approaches to representing molecules for property prediction. However, in NLP, transformers have become the de facto standard for representation learning thanks to their strong downstream task transfer. In parallel, the software ecosystem around transformers is maturing rapidly, with libraries like HuggingFace and BertViz enabling streamlined training and introspection. In this work, we make one of the first attempts to systematically evaluate transformers on molecular property prediction tasks via our ChemBERTa model. ChemBERTa scales well with pretraining dataset size, offering competitive downstream performance on MoleculeNet and useful attention-based visualization modalities. Our results suggest that transformers offer a promising avenue of future work for molecular representation learning and property prediction. To facilitate these efforts, we release a curated dataset of 77M SMILES from PubChem suitable for large-scale self-supervised pretraining.

* Submitted to NeurIPS 2020 ML for Molecules Workshop

AMPL: A Data-Driven Modeling Pipeline for Drug Discovery

Nov 14, 2019

Amanda J. Minnich, Kevin McLoughlin, Margaret Tse, Jason Deng, Andrew Weber, Neha Murad, Benjamin D. Madej, Bharath Ramsundar, Tom Rush, Stacie Calad-Thomson (+2 more)

Abstract: One of the key requirements for incorporating machine learning into the drug discovery process is complete reproducibility and traceability of the model building and evaluation process. With this in mind, we have developed an end-to-end modular and extensible software pipeline for building and sharing machine learning models that predict key pharma-relevant parameters. The ATOM Modeling PipeLine, or AMPL, extends the functionality of the open source library DeepChem and supports an array of machine learning and molecular featurization tools. We have benchmarked AMPL on a large collection of pharmaceutical datasets covering a wide range of parameters. As a result of these comprehensive experiments, we have found that physicochemical descriptors and deep learning-based graph representations significantly outperform traditional fingerprints in the characterization of molecular features. We have also found that dataset size is directly correlated with prediction performance, and that single-task deep learning models only outperform shallow learners if there is sufficient data. Likewise, dataset size has a direct impact on model predictivity, independent of comprehensive hyperparameter model tuning. Our findings point to the need for public dataset integration or multi-task/transfer learning approaches. Lastly, we found that uncertainty quantification (UQ) analysis may help identify model error; however, efficacy of UQ to filter predictions varies considerably between datasets and featurization/model types. AMPL is open source and available for download at http://github.com/ATOMconsortium/AMPL.

MoleculeNet: A Benchmark for Molecular Machine Learning

Oct 26, 2018

Zhenqin Wu, Bharath Ramsundar, Evan N. Feinberg, Joseph Gomes, Caleb Geniesse, Aneesh S. Pappu, Karl Leswing, Vijay Pande

Abstract: Molecular machine learning has been maturing rapidly over the last few years. Improved methods and the presence of larger datasets have enabled machine learning algorithms to make increasingly accurate predictions about molecular properties. However, algorithmic progress has been limited due to the lack of a standard benchmark to compare the efficacy of proposed methods; most new algorithms are benchmarked on different datasets, making it challenging to gauge the quality of proposed methods. This work introduces MoleculeNet, a large-scale benchmark for molecular machine learning. MoleculeNet curates multiple public datasets, establishes metrics for evaluation, and offers high-quality open-source implementations of multiple previously proposed molecular featurization and learning algorithms (released as part of the DeepChem open source library). MoleculeNet benchmarks demonstrate that learnable representations are powerful tools for molecular machine learning and broadly offer the best performance. However, this result comes with caveats. Learnable representations still struggle to deal with complex tasks under data scarcity and highly imbalanced classification. For quantum mechanical and biophysical datasets, the use of physics-aware featurizations can be more important than choice of particular learning algorithm.

PotentialNet for Molecular Property Prediction

Oct 22, 2018

Evan N. Feinberg, Debnil Sur, Zhenqin Wu, Brooke E. Husic, Huanghao Mai, Yang Li, Saisai Sun, Jianyi Yang, Bharath Ramsundar, Vijay S. Pande

Abstract: The arc of drug discovery entails a multiparameter optimization problem spanning vast length scales. The key parameters range from solubility (angstroms) to protein-ligand binding (nanometers) to in vivo toxicity (meters). Through feature learning, rather than feature engineering, deep neural networks promise to outperform both traditional physics-based and knowledge-based machine learning models for predicting molecular properties pertinent to drug discovery. To this end, we present the PotentialNet family of graph convolutions. These models are specifically designed for and achieve state-of-the-art performance for protein-ligand binding affinity. We further validate these deep neural networks by setting new standards of performance in several ligand-based tasks. In parallel, we introduce a new metric, the Regression Enrichment Factor $EF_\chi^{(R)}$, to measure the early enrichment of computational models for chemical data. Finally, we introduce a cross-validation strategy based on structural homology clustering that can more accurately measure model generalizability, which crucially distinguishes the aims of machine learning for drug discovery from standard machine learning tasks.

* 13 pages, 5 figures, 8 tables
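
The cluster-based cross-validation idea in the PotentialNet abstract can be illustrated with a small sketch. It assumes cluster labels have already been computed by some structural homology clustering step, which is not shown here. The function name `cluster_kfold` and the greedy fold-balancing rule are hypothetical choices for this example, not the authors' procedure. The point it shows is that whole clusters, rather than individual samples, are assigned to folds, so no test sample shares a cluster with any training sample.

```python
from collections import defaultdict


def cluster_kfold(cluster_ids, n_folds=3):
    """Build K-fold splits in which no cluster is split across folds.

    cluster_ids[i] is the precomputed (assumed) cluster label of sample i.
    Returns a list of (train_indices, test_indices) pairs.
    """
    members = defaultdict(list)
    for idx, cid in enumerate(cluster_ids):
        members[cid].append(idx)

    # Greedy balancing: largest clusters first, each into the currently smallest fold.
    folds = [[] for _ in range(n_folds)]
    for cid in sorted(members, key=lambda c: (-len(members[c]), str(c))):
        smallest = min(range(n_folds), key=lambda f: len(folds[f]))
        folds[smallest].extend(members[cid])

    splits = []
    for f in range(n_folds):
        test = sorted(folds[f])
        train = sorted(i for g in range(n_folds) if g != f for i in folds[g])
        splits.append((train, test))
    return splits


if __name__ == "__main__":
    labels = ["a", "a", "b", "c", "c", "c", "d", "b", "e"]
    for train, test in cluster_kfold(labels, n_folds=3):
        print("train:", train, "test:", test)
```

A random split would typically place near-duplicates of test samples in the training set; grouping by cluster removes that overlap, which is why such splits tend to give a more conservative estimate of generalization.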

Retrosynthetic reaction prediction using neural sequence-to-sequence models

Jun 06, 2017

Bowen Liu, Bharath Ramsundar, Prasad Kawthekar, Jade Shi, Joseph Gomes, Quang Luu Nguyen, Stephen Ho, Jack Sloane, Paul Wender, Vijay Pande

Abstract: We describe a fully data-driven model that learns to perform a retrosynthetic reaction prediction task, which is treated as a sequence-to-sequence mapping problem. The end-to-end trained model has an encoder-decoder architecture that consists of two recurrent neural networks, an architecture that has previously shown great success in solving other sequence-to-sequence prediction tasks such as machine translation. The model is trained on 50,000 experimental reaction examples from the United States patent literature, which span 10 broad reaction types that are commonly used by medicinal chemists. We find that our model performs comparably with a rule-based expert system baseline model, and also overcomes certain limitations associated with rule-based expert systems and with any machine learning approach that contains a rule-based expert system component. Our model provides an important first step towards solving the challenging problem of computational retrosynthetic analysis.

Atomic Convolutional Networks for Predicting Protein-Ligand Binding Affinity

Mar 30, 2017

Joseph Gomes, Bharath Ramsundar, Evan N. Feinberg, Vijay S. Pande

Abstract: Empirical scoring functions based on either molecular force fields or cheminformatics descriptors are widely used, in conjunction with molecular docking, during the early stages of drug discovery to predict potency and binding affinity of a drug-like molecule to a given target. These models require expert-level knowledge of physical chemistry and biology to be encoded as hand-tuned parameters or features rather than allowing the underlying model to select features in a data-driven procedure. Here, we develop a general 3-dimensional spatial convolution operation for learning atomic-level chemical interactions directly from atomic coordinates and demonstrate its application to structure-based bioactivity prediction. The atomic convolutional neural network is trained to predict the experimentally determined binding affinity of a protein-ligand complex by direct calculation of the energy associated with the complex, protein, and ligand given the crystal structure of the binding pose. Non-covalent interactions present in the complex that are absent in the protein-ligand sub-structures are identified and the model learns the interaction strength associated with these features. We test our model by predicting the binding free energy of a subset of protein-ligand complexes found in the PDBBind dataset and compare with state-of-the-art cheminformatics and machine learning-based approaches. We find that all methods achieve experimental accuracy and that atomic convolutional networks either outperform or perform competitively with the cheminformatics-based methods. Unlike all previous protein-ligand prediction systems, atomic convolutional networks are end-to-end and fully differentiable. They represent a new data-driven, physics-based deep learning model paradigm that offers a strong foundation for future improvements in structure-based bioactivity prediction.
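
One way to read the "direct calculation of the energy associated with the complex, protein, and ligand" in this abstract is as the usual energy difference between the bound and unbound states. The symbols below are illustrative and are not taken from the abstract; each energy term is assumed to be the output of the network applied to the corresponding structure.

```latex
\Delta G_{\text{bind}} \approx E_{\text{complex}} - E_{\text{protein}} - E_{\text{ligand}}
```

Under this reading, contributions that appear identically in the complex and in the separated protein and ligand cancel, leaving the non-covalent interaction terms that the abstract says the model learns to weight.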
