Abstract:Foundation models for echocardiography promise to reduce annotation burden and improve diagnostic consistency by learning generalizable representations from large unlabeled video archives. However, current approaches fail to disentangle anatomical signal from the stochastic speckle and acquisition artifacts that dominate ultrasound imagery. We present EchoJEPA, a foundation model for echocardiography trained on 18 million echocardiograms across 300K patients, the largest pretraining corpus for this modality to date. We also introduce a novel multi-view probing framework with factorized stream embeddings that standardizes evaluation under frozen backbones. Compared to prior methods, EchoJEPA reduces left ventricular ejection fraction estimation error by 19% and achieves 87.4% view classification accuracy. EchoJEPA exhibits strong sample efficiency, reaching 78.6% accuracy with only 1% of labeled data versus 42.1% for the best baseline trained on 100%. Under acoustic perturbations, EchoJEPA degrades by only 2.3% compared to 16.8% for the next best model, and transfers zero-shot to pediatric patients with 15% lower error than the next best model, outperforming all fine-tuned baselines. These results establish latent prediction as a superior paradigm for ultrasound foundation models.
Abstract:Arrhythmogenic right ventricular cardiomyopathy (ARVC) and long QT syndrome (LQTS) are inherited arrhythmia syndromes associated with sudden cardiac death. Deep learning shows promise for ECG interpretation, but multi-class inherited arrhythmia classification with clinically grounded interpretability remains underdeveloped. Our objective was to develop and validate a lead-aware deep learning framework for multi-class (ARVC vs LQTS vs control) and binary inherited arrhythmia classification, and to determine optimal strategies for integrating ECG foundation models within arrhythmia screening tools. We assembled a 13-center Canadian cohort (645 patients; 1,344 ECGs). We evaluated four ECG foundation models using three transfer learning approaches: linear probing, fine-tuning, and combined strategies. We developed lead-aware spatial attention networks (LASAN) and assessed integration strategies combining LASAN with foundation models. Performance was compared against the established foundation model baselines. Lead-group masking quantified disease-specific lead dependence. Fine-tuning outperformed linear probing and combined strategies across all foundation models (mean macro-AUROC 0.904 vs 0.825). The best lead-aware integrations achieved near-ceiling performance (HuBERT-ECG hybrid: macro-AUROC 0.990; ARVC vs control AUROC 0.999; LQTS vs control AUROC 0.994). Lead masking demonstrated physiologic plausibility: V1-V3 were most critical for ARVC detection (4.54% AUROC reduction), while lateral leads were preferentially important for LQTS (2.60% drop). Lead-aware architectures achieved state-of-the-art performance for inherited arrhythmia classification, outperforming all existing published models on both binary and multi-class tasks while demonstrating clinically aligned lead dependence. These findings support potential utility for automated ECG screening pending validation.