Title: Training Flexible Models of Genetic Variant Effects from Functional Annotations using Accelerated Linear Algebra

Jun 24, 2025

Alan N. Amin, Andres Potapczynski, Andrew Gordon Wilson

Abstract: To understand how genetic variants in human genomes manifest in phenotypes -- traits like height or diseases like asthma -- geneticists have sequenced and measured hundreds of thousands of individuals. Geneticists use this data to build models that predict how a genetic variant impacts phenotype given genomic features of the variant, like DNA accessibility or the presence of nearby DNA-bound proteins. As more data and features become available, one might expect predictive models to improve. Unfortunately, training these models is bottlenecked by the need to solve expensive linear algebra problems because variants in the genome are correlated with nearby variants, requiring inversion of large matrices. Previous methods have therefore been restricted to fitting small models, and fitting simplified summary statistics, rather than the full likelihood of the statistical model. In this paper, we leverage modern fast linear algebra techniques to develop DeepWAS (Deep genome Wide Association Studies), a method to train large and flexible neural network predictive models to optimize likelihood. Notably, we find that larger models only improve performance when using our full likelihood approach; when trained by fitting traditional summary statistics, larger models perform no better than small ones. We find larger models trained on more features make better predictions, potentially improving disease predictions and therapeutic target identification.

* ICML 2025. Code available at: https://github.com/AlanNawzadAmin/DeepWAS
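
The abstract identifies the inversion of large matrices, arising from correlations between nearby variants, as the training bottleneck. As a rough illustration of the general idea of avoiding explicit inversion, and not the authors' implementation, the sketch below solves a linear system with the conjugate gradient method using only matrix-vector products. The toy genotype matrix `G`, the `noise` term, and all function and variable names are assumptions made for this example; they do not come from the paper or the DeepWAS repository.

```python
import numpy as np


def conjugate_gradient(matvec, b, tol=1e-8, max_iter=1000):
    """Solve A x = b for symmetric positive definite A, given only v -> A v."""
    x = np.zeros_like(b)
    r = b - matvec(x)          # initial residual
    p = r.copy()               # initial search direction
    rs_old = r @ r
    b_norm = np.linalg.norm(b)
    for _ in range(max_iter):
        # Stop once the residual is small relative to the right-hand side.
        if np.sqrt(rs_old) <= tol * b_norm:
            break
        Ap = matvec(p)
        alpha = rs_old / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x


# Hypothetical toy data: random "genotypes" for 200 individuals and 50 variants.
rng = np.random.default_rng(0)
n_individuals, n_variants = 200, 50
G = rng.standard_normal((n_individuals, n_variants))
noise = 1.0  # assumed diagonal term that keeps the system well conditioned


def matvec(v):
    # Applies (G^T G / n + noise * I) to v without forming the matrix.
    return G.T @ (G @ v) / n_individuals + noise * v


b = rng.standard_normal(n_variants)
x = conjugate_gradient(matvec, b)
```

Because the solver only calls `matvec`, the variant-by-variant correlation matrix is never built or inverted; each iteration costs two products with `G`. Whether DeepWAS uses this particular solver is not stated in the abstract.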
