Gevorg Grigoryan

Flexible Kernels for Protein Property Prediction

Jun 09, 2026

Martin Jankowiak, Yerdos Ordabayev, Rudraksh Tuwani, Henry N. Ward, Hunter Nisonoff, James M. McFarland, Gevorg Grigoryan

Abstract: Despite its importance to applications in protein design, predicting protein properties like binding affinity and thermostability from sparse experimental data remains a significant challenge. Accordingly, we introduce a class of sequence kernels that exploit evolutionary substitution matrices as well as local linearity and demonstrate that the resulting Gaussian processes provide data-efficient models of protein property landscapes, frequently outperforming alternatives that rely on foundation model embeddings. Furthermore, by learning what are in effect structure-aware substitution matrices, we show that our kernels can readily incorporate structural information from foundation models. We demonstrate that these structure-conditioned kernels are well suited to multi-task learning across multiple protein property landscapes and can decisively outperform local supervised learning methods.

* 50 pages; to appear at ICML 2026
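To make the idea of a substitution-matrix sequence kernel inside a Gaussian process concrete, here is a minimal sketch. It is not the authors' kernel: the 4-letter alphabet, the score matrix `S`, the eigenvalue-clipping step, the per-site averaging, and all function names are assumptions chosen for illustration, and the "local linearity" component mentioned in the abstract is not modeled.

```python
import numpy as np

ALPHABET = "ACDE"  # toy 4-letter alphabet (assumption; real proteins use 20 residues)

# Toy symmetric substitution scores, made up for illustration (not BLOSUM values).
S = np.array([
    [ 4.0,  0.0, -2.0, -1.0],
    [ 0.0,  9.0, -3.0, -4.0],
    [-2.0, -3.0,  6.0,  2.0],
    [-1.0, -4.0,  2.0,  5.0],
])

def psd_projection(m):
    """Clip negative eigenvalues so the matrix is a valid kernel over residues."""
    w, v = np.linalg.eigh(m)
    return (v * np.clip(w, 0.0, None)) @ v.T

def encode(seqs):
    """Map equal-length sequences to an integer array of shape (n, L)."""
    idx = {a: i for i, a in enumerate(ALPHABET)}
    return np.array([[idx[a] for a in s] for s in seqs])

def sequence_kernel(x1, x2, site_kernel):
    """Average the per-site residue kernel over aligned positions."""
    # Fancy indexing broadcasts to shape (n1, n2, L) before averaging.
    return site_kernel[x1[:, None, :], x2[None, :, :]].mean(axis=-1)

def gp_posterior_mean(k_train, k_cross, y, noise=1e-2):
    """Standard GP regression mean: K_cross (K_train + noise * I)^{-1} y."""
    chol = np.linalg.cholesky(k_train + noise * np.eye(len(y)))
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y))
    return k_cross @ alpha

# Hypothetical training data: four variants with made-up property values.
residue_kernel = psd_projection(S)
train = encode(["ACDE", "ACDD", "CCDE", "AEDE"])
y = np.array([1.0, 0.8, 0.1, 0.9])
test = encode(["ACDE", "CCDD"])

K = sequence_kernel(train, train, residue_kernel)
K_cross = sequence_kernel(test, train, residue_kernel)
pred = gp_posterior_mean(K, K_cross, y)
print(pred)
```

Averaging a positive semidefinite residue kernel over positions keeps the sequence kernel positive semidefinite, which is what allows the Cholesky factorization in the GP step to succeed once the noise term is added.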


TERMinator: A Neural Framework for Structure-Based Protein Design using Tertiary Repeating Motifs

Apr 27, 2022

Alex J. Li, Vikram Sundar, Gevorg Grigoryan, Amy E. Keating

Abstract: Computational protein design has the potential to deliver novel molecular structures, binders, and catalysts for myriad applications. Recent neural graph-based models that use backbone coordinate-derived features show exceptional performance on native sequence recovery tasks and are promising frameworks for design. A statistical framework for modeling protein sequence landscapes using Tertiary Motifs (TERMs), compact units of recurring structure in proteins, has also demonstrated good performance on protein design tasks. In this work, we investigate the use of TERM-derived data as features in neural protein design frameworks. Our graph-based architecture, TERMinator, incorporates TERM-based and coordinate-based information and outputs a Potts model over sequence space. TERMinator outperforms state-of-the-art models on native sequence recovery tasks, suggesting that utilizing TERM-based and coordinate-based features together is beneficial for protein design.

* Machine Learning for Structural Biology, NeurIPS 2021
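The abstract says the network outputs a Potts model over sequence space. As a rough illustration of what such an output can be used for, the sketch below scores a sequence under a generic Potts model with self energies `h` and pair energies `J`. The array layout, the function name `potts_energy`, and the random parameters are assumptions for illustration; they are not taken from the TERMinator code.

```python
import numpy as np

def potts_energy(seq, h, J):
    """Energy of one sequence under a Potts model.

    seq: (L,) integer residue indices
    h:   (L, q) self energies
    J:   (L, L, q, q) pair energies; only entries with i < j are used
    """
    L = len(seq)
    pos = np.arange(L)
    e_self = h[pos, seq].sum()
    # pair[i, j] = J[i, j, seq[i], seq[j]], shape (L, L)
    pair = J[pos[:, None], pos[None, :], seq[:, None], seq[None, :]]
    e_pair = np.triu(pair, k=1).sum()  # keep each pair once (i < j)
    return e_self + e_pair

# Hypothetical random parameters standing in for a model's output.
rng = np.random.default_rng(0)
L, q = 5, 3
h = rng.normal(size=(L, q))
J = rng.normal(size=(L, L, q, q))
seq = rng.integers(0, q, size=L)

energy = potts_energy(seq, h, J)
print(energy)
```

Under this convention, lower-energy sequences are the ones the model prefers for the given backbone, so design amounts to searching sequence space for low values of this function.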
