Abstract:Thrombosis in rotary blood pumps arises from complex flow conditions that remain difficult to translate into reliable and interpretable risk predictions using existing computational models. This limitation reflects an incomplete understanding of how specific flow features contribute to thrombus initiation and growth. This study introduces an interpretable machine learning framework for spatial thrombosis assessment based directly on computational fluid dynamics-derived flow features. A logistic regression (LR) model combined with a structured feature-selection pipeline is used to derive a compact and physically interpretable feature set, including nonlinear feature combinations. The framework is trained using spatial risk patterns from a validated, macro-scale thrombosis model for two representative scenarios. The model reproduces the labeled risk distributions and identifies distinct sets of flow features associated with increased thrombosis risk. When applied to a centrifugal pump, despite training on a single axial pump operating point, the model predicts plausible thrombosis-prone regions. These results show that interpretable machine learning can link local flow features to thrombosis risk while remaining computationally efficient and mechanistically transparent. The low computational cost enables rapid thrombogenicity screening without repeated or costly simulations. The proposed framework complements physics-based thrombosis modeling and provides a methodological basis for integrating interpretable machine learning into CFD-driven thrombosis analysis and device design workflows.




Abstract:Biological sequences encode fundamental instructions for the building blocks of life, in the form of DNA, RNA, and proteins. Modeling these sequences is key to understand disease mechanisms and is an active research area in computational biology. Recently, Large Language Models have shown great promise in solving certain biological tasks but current approaches are limited to a single sequence modality (DNA, RNA, or protein). Key problems in genomics intrinsically involve multiple modalities, but it remains unclear how to adapt general-purpose sequence models to those cases. In this work we propose a multi-modal model that connects DNA, RNA, and proteins by leveraging information from different pre-trained modality-specific encoders. We demonstrate its capabilities by applying it to the largely unsolved problem of predicting how multiple RNA transcript isoforms originate from the same gene (i.e. same DNA sequence) and map to different transcription expression levels across various human tissues. We show that our model, dubbed IsoFormer, is able to accurately predict differential transcript expression, outperforming existing methods and leveraging the use of multiple modalities. Our framework also achieves efficient transfer knowledge from the encoders pre-training as well as in between modalities. We open-source our model, paving the way for new multi-modal gene expression approaches.