Frédéric A. Dreyer

Disentangling multispecific antibody function with graph neural networks

Jan 30, 2026

Joshua Southern, Changpeng Lu, Santrupti Nerli, Samuel D. Stanton, Andrew M. Watkins, Franziska Seeger, Frédéric A. Dreyer

Abstract: Multispecific antibodies offer transformative therapeutic potential by engaging multiple epitopes simultaneously, yet their efficacy is an emergent property governed by complex molecular architectures. Rational design is often bottlenecked by the inability to predict how subtle changes in domain topology influence functional outcomes, a challenge exacerbated by the scarcity of comprehensive experimental data. Here, we introduce a computational framework to address part of this gap. First, we present a generative method for creating large-scale, realistic synthetic functional landscapes that capture non-linear interactions where biological activity depends on domain connectivity. Second, we propose a graph neural network architecture that explicitly encodes these topological constraints, distinguishing between format configurations that appear identical to sequence-only models. We demonstrate that this model, trained on synthetic landscapes, recapitulates complex functional properties and, via transfer learning, has the potential to achieve high predictive accuracy on limited biological datasets. We showcase the model's utility by optimizing trade-offs between efficacy and toxicity in trispecific T-cell engagers and retrieving optimal common light chains. This work provides a robust benchmarking environment for disentangling the combinatorial complexity of multispecifics, accelerating the design of next-generation therapeutics.

* 16 pages, 5 figures, code available at https://github.com/prescient-design/synapse

Tokenizing Loops of Antibodies

Sep 10, 2025

Ada Fang, Robert G. Alberstein, Simon Kelow, Frédéric A. Dreyer

Abstract: The complementarity-determining regions of antibodies are loop structures that are key to their interactions with antigens, and of high importance to the design of novel biologics. Since the 1980s, categorizing the diversity of CDR structures into canonical clusters has enabled the identification of key structural motifs of antibodies. However, existing approaches have limited coverage and cannot be readily incorporated into protein foundation models. Here we introduce ImmunoGlobulin LOOp Tokenizer, Igloo, a multimodal antibody loop tokenizer that encodes backbone dihedral angles and sequence. Igloo is trained using a contrastive learning objective to map loops with similar backbone dihedral angles closer together in latent space. Igloo can efficiently retrieve the closest matching loop structures from a structural antibody database, outperforming existing methods on identifying similar H3 loops by 5.9%. Igloo assigns tokens to all loops, addressing the limited coverage issue of canonical clusters, while retaining the ability to recover canonical loop conformations. To demonstrate the versatility of Igloo tokens, we show that they can be incorporated into protein language models with IglooLM and IglooALM. On predicting binding affinity of heavy chain variants, IglooLM outperforms the base protein language model on 8 out of 10 antibody-antigen targets. Additionally, it is on par with existing state-of-the-art sequence-based and multimodal protein language models, performing comparably to models with 7× more parameters. IglooALM samples antibody loops which are diverse in sequence and more consistent in structure than state-of-the-art antibody inverse folding models. Igloo demonstrates the benefit of introducing multimodal tokens for antibody loops for encoding the diverse landscape of antibody loops, improving protein foundation models, and for antibody CDR design.

* 21 pages, 7 figures, 10 tables, code available at https://github.com/prescient-design/igloo
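
The abstract above relies on a notion of loops having "similar backbone dihedral angles". The listing does not say how Igloo measures that similarity, so the following is only a minimal sketch of one common, periodicity-aware choice, and not the authors' method. The function name `dihedral_distance` and the example angles are hypothetical.

```python
import math
from typing import Sequence


def dihedral_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean periodic distance between two equal-length lists of dihedral angles.

    Angles are in radians. Each pair contributes 2 * (1 - cos(delta)), which is
    0 for identical angles and treats +pi and -pi as the same angle.
    This metric is an illustrative assumption, not taken from the Igloo paper.
    """
    if len(a) != len(b) or not a:
        raise ValueError("loops must have the same non-zero number of angles")
    return sum(2.0 * (1.0 - math.cos(x - y)) for x, y in zip(a, b)) / len(a)


# Hypothetical loops: loop_a and loop_b differ only by a tiny step across the
# +pi / -pi boundary, while loop_c has clearly different angles.
loop_a = [math.pi - 0.01, -1.0, 2.0]
loop_b = [-math.pi + 0.01, -1.0, 2.0]
loop_c = [0.0, 1.0, -2.0]

near = dihedral_distance(loop_a, loop_b)
far = dihedral_distance(loop_a, loop_c)
```

A naive absolute difference would treat `loop_a` and `loop_b` as far apart because their first angles have opposite signs. The cosine form avoids this, which is why it is often preferred when comparing angular data.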

Assessing interaction recovery of predicted protein-ligand poses

Sep 30, 2024

David Errington, Constantin Schneider, Cédric Bouysset, Frédéric A. Dreyer

Abstract: The field of protein-ligand pose prediction has seen significant advances in recent years, with machine learning-based methods now being commonly used in lieu of classical docking methods or even to predict all-atom protein-ligand complex structures. Most contemporary studies focus on the accuracy and physical plausibility of ligand placement to determine pose quality, often neglecting a direct assessment of the interactions observed with the protein. In this work, we demonstrate that ignoring protein-ligand interaction fingerprints can lead to overestimation of model performance, most notably in recent protein-ligand cofolding models which often fail to recapitulate key interactions.

* 12 pages, 6 figures, 1 table, code at https://github.com/Exscientia/plif_validity, data at https://doi.org/10.5281/zenodo.13843798
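
The idea of checking whether a predicted pose recapitulates the key interactions can be illustrated with a small set comparison. This is a sketch under stated assumptions: the encoding of an interaction as a `(residue, interaction type)` pair, the function name `interaction_recovery`, and the example residues are all hypothetical, and the metric used in the paper's repository may be defined differently.

```python
from typing import Set, Tuple

# Hypothetical encoding: one entry per (residue id, interaction type) pair.
Interaction = Tuple[str, str]


def interaction_recovery(reference: Set[Interaction],
                         predicted: Set[Interaction]) -> float:
    """Fraction of reference interactions that also appear in the predicted pose."""
    if not reference:
        raise ValueError("reference fingerprint is empty")
    return len(reference & predicted) / len(reference)


# Made-up fingerprints for a crystal pose and a predicted pose.
reference = {("ASP86", "hbond"), ("PHE80", "pi_stacking"), ("LYS33", "ionic")}
predicted = {("ASP86", "hbond"), ("LEU134", "hydrophobic")}

recovery = interaction_recovery(reference, predicted)
```

Here only one of the three reference interactions is reproduced, so a pose could still look acceptable by placement error alone while scoring poorly on recovery.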

ABodyBuilder3: Improved and scalable antibody structure predictions

May 31, 2024

Henry Kenlay, Frédéric A. Dreyer, Daniel Cutting, Daniel Nissley, Charlotte M. Deane

Abstract: Accurate prediction of antibody structure is a central task in the design and development of monoclonal antibodies, notably to understand both their developability and their binding properties. In this article, we introduce ABodyBuilder3, an improved and scalable antibody structure prediction model based on ImmuneBuilder. We achieve a new state-of-the-art accuracy in the modelling of CDR loops by leveraging language model embeddings, and show how predicted structures can be further improved through careful relaxation strategies. Finally, we incorporate a predicted Local Distance Difference Test into the model output to allow for a more accurate estimation of uncertainties.

* 8 pages, 3 figures, 3 tables, code available at https://github.com/Exscientia/ABodyBuilder3, weights and data available at https://zenodo.org/records/11354577

De novo antibody design with SE diffusion

May 13, 2024

Daniel Cutting, Frédéric A. Dreyer, David Errington, Constantin Schneider, Charlotte M. Deane

Abstract: We introduce IgDiff, an antibody variable domain diffusion model based on a general protein backbone diffusion framework which was extended to handle multiple chains. Assessing the designability and novelty of the structures generated with our model, we find that IgDiff produces highly designable antibodies that can contain novel binding regions. The backbone dihedral angles of sampled structures show good agreement with a reference antibody distribution. We verify these designed antibodies experimentally and find that all express with high yield. Finally, we compare our model with a state-of-the-art generative backbone diffusion model on a range of antibody design tasks, such as the design of the complementarity determining regions or the pairing of a light chain to an existing heavy chain, and show improved properties and designability.

* 20 pages, 11 figures, 4 tables, model weights and samples available at https://zenodo.org/records/11184374

Large scale paired antibody language models

Mar 26, 2024

Henry Kenlay, Frédéric A. Dreyer, Aleksandr Kovaltsuk, Dom Miketa, Douglas Pires, Charlotte M. Deane

Abstract: Antibodies are proteins produced by the immune system that can identify and neutralise a wide variety of antigens with high specificity and affinity, and constitute the most successful class of biotherapeutics. With the advent of next-generation sequencing, billions of antibody sequences have been collected in recent years, though their application in the design of better therapeutics has been constrained by the sheer volume and complexity of the data. To address this challenge, we present IgBert and IgT5, the best performing antibody-specific language models developed to date which can consistently handle both paired and unpaired variable region sequences as input. These models are trained comprehensively using the more than two billion unpaired sequences and two million paired sequences of light and heavy chains present in the Observed Antibody Space dataset. We show that our models outperform existing antibody and protein language models on a diverse range of design and regression tasks relevant to antibody engineering. This advancement marks a significant leap forward in leveraging machine learning, large scale data sets and high-performance computing for enhancing antibody design for therapeutic development.

* 14 pages, 2 figures, 6 tables, model weights available at https://zenodo.org/doi/10.5281/zenodo.10876908

Inverse folding for antibody sequence design using deep learning

Oct 30, 2023

Frédéric A. Dreyer, Daniel Cutting, Constantin Schneider, Henry Kenlay, Charlotte M. Deane

Abstract: We consider the problem of antibody sequence design given 3D structural information. Building on previous work, we propose a fine-tuned inverse folding model that is specifically optimised for antibody structures and outperforms generic protein models on sequence recovery and structure robustness when applied to antibodies, with notable improvement on the hypervariable CDR-H3 loop. We study the canonical conformations of complementarity-determining regions and find improved encoding of these loops into known clusters. Finally, we consider the applications of our model to drug discovery and binder design and evaluate the quality of proposed sequences using physics-based methods.

* 2023 ICML Workshop on Computational Biology, model weights available at https://zenodo.org/record/8164693

Multilingual End to End Entity Linking

Jun 15, 2023

Mikhail Plekhanov, Nora Kassner, Kashyap Popat, Louis Martin, Simone Merello, Borislav Kozlovskii, Frédéric A. Dreyer, Nicola Cancedda

Abstract: Entity Linking is one of the most common Natural Language Processing tasks in practical applications, but so far efficient end-to-end solutions with multilingual coverage have been lacking, leading to complex model stacks. To fill this gap, we release and open source BELA, the first fully end-to-end multilingual entity linking model that efficiently detects and links entities in texts in any of 97 languages. We provide here a detailed description of the model and report BELA's performance on four entity linking datasets covering high- and low-resource languages.

Polar Ducks and Where to Find Them: Enhancing Entity Linking with Duck Typing and Polar Box Embeddings

May 19, 2023

Mattia Atzeni, Mikhail Plekhanov, Frédéric A. Dreyer, Nora Kassner, Simone Merello, Louis Martin, Nicola Cancedda

Abstract: Entity linking methods based on dense retrieval are an efficient and widely used solution in large-scale applications, but they fall short of the performance of generative models, as they are sensitive to the structure of the embedding space. In order to address this issue, this paper introduces DUCK, an approach to infusing structural information in the space of entity representations, using prior knowledge of entity types. Inspired by duck typing in programming languages, we propose to define the type of an entity based on the relations that it has with other entities in a knowledge graph. Then, porting the concept of box embeddings to spherical polar coordinates, we propose to represent relations as boxes on the hypersphere. We optimize the model to cluster entities of similar type by placing them inside the boxes corresponding to their relations. Our experiments show that our method sets new state-of-the-art results on standard entity-disambiguation benchmarks: it improves the performance of the model by up to 7.9 F1 points, outperforms other type-aware approaches, and matches the results of generative models with 18 times more parameters.

Leveraging universality of jet taggers through transfer learning

Mar 11, 2022

Frédéric A. Dreyer, Radosław Grabarczyk, Pier Francesco Monni

Abstract: A significant challenge in the tagging of boosted objects via machine-learning technology is the prohibitive computational cost associated with training sophisticated models. Nevertheless, the universality of QCD suggests that a large amount of the information learnt in the training is common to different physical signals and experimental setups. In this article, we explore the use of transfer learning techniques to develop fast and data-efficient jet taggers that leverage such universality. We consider the graph neural networks LundNet and ParticleNet, and introduce two prescriptions to transfer an existing tagger into a new signal based either on fine-tuning all the weights of a model or alternatively on freezing a fraction of them. In the case of W-boson and top-quark tagging, we find that one can obtain reliable taggers using an order of magnitude less data with a corresponding speed-up of the training process. Moreover, while keeping the size of the training data set fixed, we observe a speed-up of the training by up to a factor of three. This offers a promising avenue to facilitate the use of such tools in collider physics experiments.

* 10 pages, 2 tables, 5 figures
