Payel Das

Hierarchical Grammar-Induced Geometry for Data-Efficient Molecular Property Prediction

Sep 04, 2023
Minghao Guo, Veronika Thost, Samuel W Song, Adithya Balachandran, Payel Das, Jie Chen, Wojciech Matusik

The prediction of molecular properties is a crucial task in the field of material and drug discovery. The potential benefits of using deep learning techniques are reflected in the wealth of recent literature. Still, these techniques are faced with a common challenge in practice: Labeled data are limited by the cost of manual extraction from literature and laborious experimentation. In this work, we propose a data-efficient property predictor by utilizing a learnable hierarchical molecular grammar that can generate molecules from grammar production rules. Such a grammar induces an explicit geometry of the space of molecular graphs, which provides an informative prior on molecular structural similarity. The property prediction is performed using graph neural diffusion over the grammar-induced geometry. On both small and large datasets, our evaluation shows that this approach outperforms a wide spectrum of baselines, including supervised and pre-trained graph neural networks. We include a detailed ablation study and further analysis of our solution, showing its effectiveness in cases with extremely limited data. Code is available at https://github.com/gmh14/Geo-DEG.
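The grammar-induced geometry acts as a molecule-similarity graph over which property information is propagated. As a rough illustration of that diffusion step (not the authors' Geo-DEG model), the sketch below runs plain heat diffusion on a toy similarity graph, spreading known property values to unlabeled molecules; the adjacency matrix is a hypothetical stand-in for the grammar-induced geometry.

```python
# Minimal sketch: label propagation by heat diffusion on a molecule-similarity graph.
# The adjacency matrix A stands in for the grammar-induced geometry (hypothetical data);
# this illustrates graph diffusion in general, not the authors' Geo-DEG implementation.
import numpy as np

def diffuse_properties(A, y, labeled_mask, steps=50, dt=0.1):
    """Propagate property values y over graph A; unlabeled nodes start at the labeled mean."""
    deg = A.sum(axis=1)
    L = np.diag(deg) - A                      # combinatorial graph Laplacian
    x = np.where(labeled_mask, y, y[labeled_mask].mean())
    for _ in range(steps):
        x = x - dt * (L @ x)                  # explicit Euler step of du/dt = -L u
        x[labeled_mask] = y[labeled_mask]     # clamp molecules with known labels
    return x

# toy example: 5 molecules, 2 with measured properties
A = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 0, 0],
              [1, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [0, 0, 0, 1, 0]], dtype=float)
y = np.array([1.0, 0.0, 0.0, 0.0, -1.0])
labeled = np.array([True, False, False, False, True])
print(diffuse_properties(A, y, labeled))
```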

* 22 pages, 10 figures; ICML 2023 

The Impact of Positional Encoding on Length Generalization in Transformers

May 31, 2023
Amirhossein Kazemnejad, Inkit Padhi, Karthikeyan Natesan Ramamurthy, Payel Das, Siva Reddy

Length generalization, the ability to generalize from small training context sizes to larger ones, is a critical challenge in the development of Transformer-based language models. Positional encoding (PE) has been identified as a major factor influencing length generalization, but the exact impact of different PE schemes on extrapolation in downstream tasks remains unclear. In this paper, we conduct a systematic empirical study comparing the length generalization performance of decoder-only Transformers with five different positional encoding approaches: Absolute Position Embedding (APE), T5's Relative PE, ALiBi, and Rotary, as well as Transformers without positional encoding (NoPE). Our evaluation encompasses a battery of reasoning and mathematical tasks. Our findings reveal that the most commonly used positional encoding methods, such as ALiBi, Rotary, and APE, are not well suited for length generalization in downstream tasks. More importantly, NoPE outperforms the explicit positional encoding methods while requiring no additional computation. We theoretically demonstrate that NoPE can represent both absolute and relative PEs, but when trained with SGD, it mostly resembles T5's relative PE attention patterns. Finally, we find that a scratchpad is not always helpful for length generalization, and that its format strongly impacts the model's performance. Overall, our work suggests that explicit position embeddings are not essential for decoder-only Transformers to generalize well to longer sequences.
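For concreteness, the sketch below shows the NoPE setting in its simplest form: a decoder-only attention layer that adds no positional embeddings, so order information enters only through the causal mask. It is an illustration of the setup, not the paper's architecture or training configuration.

```python
# Minimal sketch of a NoPE decoder layer: no positional embeddings are added,
# so order information reaches the model only through the causal attention mask.
# Illustrative only; not the paper's exact architecture or training setup.
import torch
import torch.nn as nn

class NoPEDecoderLayer(nn.Module):
    def __init__(self, d_model=64, n_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                nn.Linear(4 * d_model, d_model))
        self.ln1 = nn.LayerNorm(d_model)
        self.ln2 = nn.LayerNorm(d_model)

    def forward(self, x):                      # x: (batch, seq, d_model), no PE added
        seq = x.size(1)
        causal = torch.triu(torch.ones(seq, seq, dtype=torch.bool), diagonal=1)
        h = self.ln1(x)
        a, _ = self.attn(h, h, h, attn_mask=causal)
        x = x + a
        return x + self.ff(self.ln2(x))

tokens = nn.Embedding(1000, 64)(torch.randint(0, 1000, (2, 16)))  # embeddings only, no PE
print(NoPEDecoderLayer()(tokens).shape)        # torch.Size([2, 16, 64])
```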

Keeping Up with the Language Models: Robustness-Bias Interplay in NLI Data and Models

May 22, 2023
Ioana Baldini, Chhavi Yadav, Payel Das, Kush R. Varshney

Auditing unwanted social bias in language models (LMs) is inherently hard due to the multidisciplinary nature of the work. In addition, the rapid evolution of LMs can quickly render benchmarks irrelevant. Bias auditing is further complicated by LM brittleness: when a presumably biased outcome is observed, is it due to model bias or model brittleness? We propose enlisting the models themselves to help construct bias auditing datasets that remain challenging, and introduce bias measures that distinguish between types of model errors. First, we extend an existing bias benchmark for NLI (BBNLI) using a combination of LM-generated lexical variations, adversarial filtering, and human validation. We demonstrate that the newly created dataset (BBNLI-next) is more challenging than BBNLI: on average, BBNLI-next reduces the accuracy of state-of-the-art NLI models from 95.3% on BBNLI to 58.6%. Second, we employ BBNLI-next to showcase the interplay between robustness and bias, and the subtlety of differentiating between the two. Third, we point out shortcomings in current bias scores used in the literature and propose bias measures that take into account pro-/anti-stereotype bias and model brittleness. We will publicly release the BBNLI-next dataset to inspire research on rapidly expanding benchmarks that keep up with model evolution, along with research on the robustness-bias interplay in bias auditing. Note: This paper contains offensive text examples.
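As a rough illustration of separating brittleness from bias, the sketch below splits an NLI model's errors into pro-stereotype and anti-stereotype ones and reports a simple net-bias score alongside accuracy; the record fields and the score itself are hypothetical stand-ins, not the bias measures defined in the paper.

```python
# Illustrative sketch: separate pro-stereotype from anti-stereotype errors when
# auditing an NLI model. The record fields and the net-bias score below are
# hypothetical stand-ins, not the measures proposed in the paper.
from dataclasses import dataclass

@dataclass
class NLIExample:
    gold: str            # "entailment" / "neutral" / "contradiction"
    pred: str            # model prediction
    stereo_label: str    # label the model would give if it followed the stereotype

def bias_breakdown(examples):
    errors = [e for e in examples if e.pred != e.gold]
    pro = sum(e.pred == e.stereo_label for e in errors)      # errors aligned with the stereotype
    anti = len(errors) - pro                                  # errors against the stereotype
    accuracy = 1 - len(errors) / len(examples)
    net_bias = (pro - anti) / max(len(errors), 1)             # >0: errors skew pro-stereotype
    return {"accuracy": accuracy, "pro_stereo_errors": pro,
            "anti_stereo_errors": anti, "net_bias": net_bias}

data = [NLIExample("neutral", "entailment", "entailment"),
        NLIExample("neutral", "neutral", "entailment"),
        NLIExample("neutral", "contradiction", "entailment")]
print(bias_breakdown(data))
```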

Equivariant Few-Shot Learning from Pretrained Models

May 17, 2023
Sourya Basu, Pulkit Katdare, Prasanna Sattigeri, Vijil Chenthamarakshan, Katherine Driggs-Campbell, Payel Das, Lav R. Varshney

Efficient transfer learning algorithms are key to the success of foundation models on diverse downstream tasks even with limited data. Recent works of \cite{basu2022equi} and \cite{kaba2022equivariance} propose group averaging (\textit{equitune}) and optimization-based methods, respectively, over features from group-transformed inputs to obtain equivariant outputs from non-equivariant neural networks. While \cite{kaba2022equivariance} are only concerned with training from scratch, we find that equitune performs poorly on equivariant zero-shot tasks despite good finetuning results. We hypothesize that this is because pretrained models provide better-quality features for some transformations than for others, and simply averaging them is deleterious. Hence, we propose $\lambda$-\textit{equitune}, which averages the features using \textit{importance weights}, $\lambda$s. These weights are learned directly from the data using a small neural network, leading to excellent zero-shot and finetuned results that outperform equitune. Further, we prove that $\lambda$-equitune is equivariant and a universal approximator of equivariant functions. Additionally, we show that the method of \cite{kaba2022equivariance}, used with appropriate loss functions, which we call \textit{equizero}, also gives excellent zero-shot and finetuned performance. Both equitune and equizero are special cases of $\lambda$-equitune. To show the simplicity and generality of our method, we validate it on a diverse range of applications and models: 1) image classification using CLIP, 2) deep Q-learning, 3) fairness in natural language generation (NLG), 4) compositional generalization in languages, and 5) image classification using pretrained CNNs such as ResNet and AlexNet.
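The core of $\lambda$-equitune can be sketched for an invariant task such as classification: compute features for every group-transformed input, weight them with learned importance weights $\lambda$, and average. The toy backbone and weight network below are placeholders for illustration, not the paper's CLIP or reinforcement-learning experiments.

```python
# Minimal sketch of lambda-equitune for an invariant prediction task over the group
# of 90-degree rotations: logits from each transformed input are combined with
# learned importance weights rather than a uniform average (plain equitune).
# Toy backbone and weight net; not the paper's experimental setup.
import torch
import torch.nn as nn

class LambdaEquitune(nn.Module):
    def __init__(self, backbone, feat_dim):
        super().__init__()
        self.backbone = backbone                             # pretrained, possibly non-equivariant
        self.weight_net = nn.Linear(feat_dim, 1)             # lambda(g x)

    def forward(self, x):                                    # x: (batch, C, H, W)
        feats = [self.backbone(torch.rot90(x, k, dims=(2, 3))) for k in range(4)]
        feats = torch.stack(feats, dim=1)                    # (batch, |G|, feat_dim)
        lam = torch.softmax(self.weight_net(feats), dim=1)   # importance weights over G
        return (lam * feats).sum(dim=1)                      # weighted group average -> invariant

backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
model = LambdaEquitune(backbone, feat_dim=10)
print(model(torch.randn(2, 3, 32, 32)).shape)                # torch.Size([2, 10])
```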

Enhancing Protein Language Models with Structure-based Encoder and Pre-training

Mar 11, 2023
Zuobai Zhang, Minghao Xu, Vijil Chenthamarakshan, Aurélie Lozano, Payel Das, Jian Tang

Protein language models (PLMs) pre-trained on large-scale protein sequence corpora have achieved impressive performance on various downstream protein understanding tasks. Despite their ability to implicitly capture inter-residue contact information, transformer-based PLMs cannot encode protein structures explicitly for better structure-aware protein representations. Moreover, the power of pre-training on available protein structures has not been explored for improving these PLMs, even though structure is important in determining function. To tackle these limitations, in this work we enhance PLMs with a structure-based encoder and structure-based pre-training. We first explore feasible model architectures to combine the advantages of a state-of-the-art PLM (i.e., ESM-1b) and a state-of-the-art protein structure encoder (i.e., GearNet). We empirically verify that ESM-GearNet, which connects the two encoders in series, is the most effective combination. To further improve the effectiveness of ESM-GearNet, we pre-train it on massive unlabeled protein structures with contrastive learning, which aligns representations of co-occurring subsequences so as to capture their biological correlation. Extensive experiments on EC and GO protein function prediction benchmarks demonstrate the superiority of ESM-GearNet over previous PLMs and structure encoders, and clear performance gains are further achieved by structure-based pre-training on top of ESM-GearNet. Our implementation is available at https://github.com/DeepGraphLearning/GearNet.
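The serial combination can be pictured as follows: per-residue embeddings from the sequence encoder become the node features of a message-passing network over the residue contact graph. The sketch below uses tiny stand-in modules rather than the actual ESM-1b and GearNet implementations.

```python
# Minimal sketch of the serial sequence->structure combination: per-residue embeddings
# from a sequence encoder feed a message-passing network over the residue contact graph.
# Tiny stand-in modules, not ESM-1b or GearNet.
import torch
import torch.nn as nn

class SeqEncoder(nn.Module):                       # stand-in for a protein language model
    def __init__(self, vocab=26, dim=64):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, seq):                        # seq: (batch, n_residues)
        return self.encoder(self.emb(seq))         # (batch, n_residues, dim)

class StructEncoder(nn.Module):                    # stand-in for a structure GNN
    def __init__(self, dim=64):
        super().__init__()
        self.msg = nn.Linear(dim, dim)

    def forward(self, h, contact):                 # contact: (batch, n, n) residue graph
        agg = torch.bmm(contact, self.msg(h))      # one round of neighbor aggregation
        return torch.relu(h + agg)

seq = torch.randint(0, 26, (2, 50))
contact = (torch.rand(2, 50, 50) > 0.9).float()    # toy contact map derived from 3D structure
h = StructEncoder()(SeqEncoder()(seq), contact)    # serial: PLM output feeds the structure GNN
print(h.mean(dim=1).shape)                         # per-protein representation: (2, 64)
```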

Physics-Inspired Protein Encoder Pre-Training via Siamese Sequence-Structure Diffusion Trajectory Prediction

Jan 28, 2023
Zuobai Zhang, Minghao Xu, Aurélie Lozano, Vijil Chenthamarakshan, Payel Das, Jian Tang

Pre-training methods for proteins have recently gained interest, leveraging either protein sequences or structures, while modeling their joint energy landscape remains largely unexplored. In this work, inspired by the success of denoising diffusion models, we propose the DiffPreT approach to pre-train a protein encoder via sequence-structure multimodal diffusion modeling. DiffPreT guides the encoder to recover the native protein sequences and structures from perturbed ones along the multimodal diffusion trajectory, thereby capturing the joint distribution of sequences and structures. To account for essential protein conformational variations, we enhance DiffPreT with a physics-inspired method called Siamese Diffusion Trajectory Prediction (SiamDiff), which captures the correlation between different conformers of a protein. SiamDiff attains this goal by maximizing the mutual information between representations of diffusion trajectories of structurally correlated conformers. We study the effectiveness of DiffPreT and SiamDiff on both atom- and residue-level structure-based protein understanding tasks. Experimental results show that DiffPreT is consistently competitive on all tasks, and that SiamDiff achieves new state-of-the-art performance in terms of mean rank across all tasks. The source code will be released upon acceptance.
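A rough sketch of the forward (corruption) side of such multimodal diffusion is shown below: residue coordinates receive Gaussian noise while sequence tokens are randomly replaced, with both corruption strengths growing with the sampled timestep; an encoder would then be trained to recover the originals. The schedule and rates are illustrative assumptions, not DiffPreT's exact formulation.

```python
# Minimal sketch of a multimodal forward-corruption step: Gaussian noise on residue
# coordinates plus random token replacement on the sequence, at a sampled diffusion
# timestep. Schedule and rates are illustrative; not DiffPreT's exact formulation.
import torch

def corrupt(seq_tokens, coords, t, T=100, vocab=26):
    """seq_tokens: (n,), coords: (n, 3), t: diffusion step in [1, T]."""
    alpha_bar = torch.cos(torch.tensor(t / T) * torch.pi / 2) ** 2   # toy cosine schedule
    noisy_coords = alpha_bar.sqrt() * coords + (1 - alpha_bar).sqrt() * torch.randn_like(coords)
    replace = torch.rand(seq_tokens.shape) < (t / T) * 0.3           # corruption rate grows with t
    random_tokens = torch.randint(0, vocab, seq_tokens.shape)
    noisy_seq = torch.where(replace, random_tokens, seq_tokens)
    return noisy_seq, noisy_coords

seq = torch.randint(0, 26, (50,))
coords = torch.randn(50, 3)
noisy_seq, noisy_coords = corrupt(seq, coords, t=40)
# An encoder would be trained to recover (seq, coords) from (noisy_seq, noisy_coords).
print((noisy_seq != seq).float().mean().item(), noisy_coords.shape)
```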

AI Maintenance: A Robustness Perspective

Jan 08, 2023
Pin-Yu Chen, Payel Das

With the advancements in machine learning (ML) methods and compute resources, artificial intelligence (AI) empowered systems are becoming a prevailing technology. However, current AI technology such as deep learning is not flawless. Significantly increased model complexity and data scale intensify the challenges of trustworthiness and transparency, which can create new risks and negative impacts. In this paper, we carve out AI maintenance from the robustness perspective. We start by highlighting key robustness challenges in the AI lifecycle and motivate AI maintenance by analogy to car maintenance. We then propose an AI model inspection framework to detect and mitigate robustness risks. We also draw inspiration from vehicle autonomy to define levels of AI robustness automation. Our proposal for AI maintenance facilitates robustness assessment, status tracking, risk scanning, model hardening, and regulation throughout the AI lifecycle, which is an essential milestone toward building sustainable and trustworthy AI ecosystems.

* Accepted to IEEE Computer Magazine. To be published in 2023 

Reprogramming Pretrained Language Models for Protein Sequence Representation Learning

Jan 05, 2023
Ria Vinod, Pin-Yu Chen, Payel Das

Machine learning-guided solutions for protein learning tasks have made significant headway in recent years. However, success in scientific discovery tasks is limited by the accessibility of well-defined and labeled in-domain data. To tackle the low-data constraint, recent adaptations of deep learning models pretrained on millions of protein sequences have shown promise; however, constructing such domain-specific large-scale models is computationally expensive. Here, we propose Representation Learning via Dictionary Learning (R2DL), an end-to-end representation learning framework in which we reprogram deep models trained on alternate-domain tasks to perform well on protein property prediction with significantly fewer training samples. R2DL reprograms a pretrained English language model to learn the embeddings of protein sequences by learning a sparse linear mapping between the English and protein sequence vocabulary embeddings. Our model attains better accuracy and improves data efficiency by up to $10^5$ times over baselines set by pretrained and standard supervised methods. Specifically, we reprogram an off-the-shelf pre-trained English language transformer and benchmark it on a set of protein physicochemical prediction tasks (secondary structure, stability, homology) as well as on a biomedically relevant set of protein function prediction tasks (antimicrobial, toxicity, antibody affinity).
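The reprogramming step can be sketched as a learned, sparsity-regularized linear map from a frozen English vocabulary embedding table to protein-token embeddings, which then feed the frozen language model. The dimensions and the L1 penalty below are illustrative assumptions, not the released R2DL implementation.

```python
# Minimal sketch of the reprogramming idea: protein-token embeddings are built as a
# sparse linear combination of a frozen English vocabulary embedding table, so that
# protein sequences can be pushed through the frozen language model. Dimensions and
# the L1 penalty are illustrative assumptions, not the released R2DL code.
import torch
import torch.nn as nn

class ReprogrammedEmbedding(nn.Module):
    def __init__(self, english_vocab_emb, n_protein_tokens=25):
        super().__init__()
        self.english = english_vocab_emb.detach()             # frozen table, shape (V_en, d)
        self.theta = nn.Parameter(0.01 * torch.randn(n_protein_tokens, english_vocab_emb.size(0)))

    def forward(self, protein_tokens):                        # (batch, length)
        protein_vocab = self.theta @ self.english             # (V_prot, d) learned mapping
        return protein_vocab[protein_tokens]                  # embeddings fed to the frozen LM

    def l1_penalty(self):
        return self.theta.abs().mean()                        # encourages a sparse mapping

english_emb = torch.randn(30522, 768)                         # stand-in for a frozen LM's table
embed = ReprogrammedEmbedding(english_emb)
x = embed(torch.randint(0, 25, (4, 128)))                     # x would go into the frozen LM body
loss = x.pow(2).mean() + 1e-3 * embed.l1_penalty()            # placeholder task loss + sparsity
loss.backward()                                               # only theta receives gradients
```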

* 11 pages, 5 figures 

Knowledge Graph Generation From Text

Nov 18, 2022
Igor Melnyk, Pierre Dognin, Payel Das

In this work we propose a novel end-to-end multi-stage Knowledge Graph (KG) generation system that builds KGs from textual inputs, separating the overall process into two stages. The graph nodes are generated first using a pretrained language model, followed by a simple edge construction head, enabling efficient KG extraction from the text. For each stage we consider several architectural choices that can be used depending on the available training resources. We evaluate the model on the WebNLG 2020 Challenge dataset, matching state-of-the-art performance on the text-to-RDF generation task, as well as on the New York Times (NYT) and large-scale TekGen datasets, where it shows strong overall performance and outperforms existing baselines. We believe that the proposed system can serve as a viable KG construction alternative to existing linearization or sampling-based graph generation approaches. Our code can be found at https://github.com/IBM/Grapher
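The two-stage split can be sketched as follows: stage one produces the node strings from the text (a placeholder generator here), and stage two scores every ordered node pair with a small edge head that predicts a relation or "no edge". The models and relation labels below are illustrative, not the released Grapher code.

```python
# Minimal sketch of the two-stage split: stage 1 generates node strings from the input
# text (placeholder generator here), stage 2 scores every ordered node pair with an
# edge head that predicts a relation or "no edge". Illustrative only.
import torch
import torch.nn as nn

RELATIONS = ["no_edge", "birthPlace", "author", "capitalOf"]   # hypothetical label set

class EdgeHead(nn.Module):
    def __init__(self, dim=64, n_rel=len(RELATIONS)):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, n_rel))

    def forward(self, node_emb):                               # node_emb: (n_nodes, dim)
        n = node_emb.size(0)
        pairs = torch.cat([node_emb.unsqueeze(1).expand(n, n, -1),
                           node_emb.unsqueeze(0).expand(n, n, -1)], dim=-1)
        return self.score(pairs)                               # (n, n, n_rel) relation logits

def generate_nodes(text):                                      # stage 1: stand-in for a
    return ["Alan Turing", "London"]                           # pretrained seq2seq generator

text = "Alan Turing was born in London."
nodes = generate_nodes(text)
node_emb = torch.randn(len(nodes), 64)                         # stand-in node encodings
edges = EdgeHead()(node_emb).argmax(dim=-1)                    # relation id per node pair
print([(nodes[i], RELATIONS[edges[i, j]], nodes[j])
       for i in range(len(nodes)) for j in range(len(nodes)) if i != j and edges[i, j] != 0])
```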

* Findings of EMNLP 2022 

Reducing Down(stream)time: Pretraining Molecular GNNs using Heterogeneous AI Accelerators

Nov 08, 2022
Jenna A. Bilbrey, Kristina M. Herman, Henry Sprueill, Sotiris S. Xantheas, Payel Das, Manuel Lopez Roldan, Mike Kraus, Hatem Helal, Sutanay Choudhury

The demonstrated success of transfer learning has popularized approaches that involve pretraining models from massive data sources and subsequent finetuning towards a specific task. While such approaches have become the norm in fields such as natural language processing, implementation and evaluation of transfer learning approaches for chemistry are in the early stages. In this work, we demonstrate finetuning for downstream tasks on a graph neural network (GNN) trained over a molecular database containing 2.7 million water clusters. The use of Graphcore IPUs as an AI accelerator for training molecular GNNs reduces training time from a reported 2.7 days on 0.5M clusters to 1.2 hours on 2.7M clusters. Finetuning the pretrained model for downstream tasks of molecular dynamics and transfer to a different potential energy surface took only 8.3 hours and 28 minutes, respectively, on a single GPU.
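The downstream recipe amounts to loading the pretrained GNN weights, attaching a fresh head for the new target, and finetuning with a small learning rate. The sketch below uses placeholder modules, a hypothetical checkpoint path, and random data; it is not the authors' code.

```python
# Minimal sketch of the downstream finetuning recipe: load pretrained GNN weights,
# attach a fresh head for the new target, and train with a small learning rate.
# Checkpoint path, module names, and data are placeholders, not the authors' code.
import torch
import torch.nn as nn

class MolGNN(nn.Module):                                  # stand-in molecular GNN backbone
    def __init__(self, dim=128):
        super().__init__()
        self.msg = nn.Linear(dim, dim)

    def forward(self, node_feats, adj):                   # node_feats: (n, dim), adj: (n, n)
        return torch.relu(node_feats + adj @ self.msg(node_feats)).mean(dim=0)

backbone = MolGNN()
# backbone.load_state_dict(torch.load("pretrained_water_clusters.pt"))  # hypothetical checkpoint
head = nn.Linear(128, 1)                                  # fresh head for the downstream property
opt = torch.optim.Adam([{"params": backbone.parameters(), "lr": 1e-5},   # small LR for backbone
                        {"params": head.parameters(), "lr": 1e-3}])      # larger LR for new head

node_feats = torch.randn(12, 128)                         # toy molecule
adj = (torch.rand(12, 12) > 0.7).float()
target = torch.tensor([0.5])
for _ in range(10):                                       # toy finetuning loop
    opt.zero_grad()
    loss = nn.functional.mse_loss(head(backbone(node_feats, adj)), target)
    loss.backward()
    opt.step()
```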

* Machine Learning and the Physical Sciences Workshop at the 36th conference on Neural Information Processing Systems (NeurIPS) 