Abstract:The application of Machine Learning (ML) to the diagnosis of rare diseases, such as collagen VI-related dystrophies (COL6-RD), is fundamentally limited by the scarcity and fragmentation of available data. Attempts to expand sampling across hospitals, institutions, or countries with differing regulations face severe privacy, regulatory, and logistical obstacles that are often difficult to overcome. The Federated Learning (FL) provides a promising solution by enabling collaborative model training across decentralized datasets while keeping patient data local and private. Here, we report a novel global FL initiative using the Sherpa.ai FL platform, which leverages FL across distributed datasets in two international organizations for the diagnosis of COL6-RD, using collagen VI immunofluorescence microscopy images from patient-derived fibroblast cultures. Our solution resulted in an ML model capable of classifying collagen VI patient images into the three primary pathogenic mechanism groups associated with COL6-RD: exon skipping, glycine substitution, and pseudoexon insertion. This new approach achieved an F1-score of 0.82, outperforming single-organization models (0.57-0.75). These results demonstrate that FL substantially improves diagnostic utility and generalizability compared to isolated institutional models. Beyond enabling more accurate diagnosis, we anticipate that this approach will support the interpretation of variants of uncertain significance and guide the prioritization of sequencing strategies to identify novel pathogenic variants.
Abstract:Early and accurate pneumonia detection from chest X-rays (CXRs) is clinically critical to expedite treatment and isolation, reduce complications, and curb unnecessary antibiotic use. Although artificial intelligence (AI) substantially improves CXR-based detection, development is hindered by globally distributed data, high inter-hospital variability, and strict privacy regulations (e.g., HIPAA, GDPR) that make centralization impractical. These constraints are compounded by heterogeneous imaging protocols, uneven data availability, and the costs of transferring large medical images across geographically dispersed sites. In this paper, we evaluate Federated Learning (FL) using the Sherpa.ai FL platform, enabling multiple hospitals (nodes) to collaboratively train a CXR classifier for pneumonia while keeping data in place and private. Using the Pediatric Pneumonia Chest X-ray dataset, we simulate cross-hospital collaboration with non-independent and non-identically distributed (non-IID) data, reproducing real-world variability across institutions and jurisdictions. Our experiments demonstrate that collaborative and privacy-preserving training across multiple hospitals via FL led to a dramatic performance improvement achieving 0.900 Accuracy and 0.966 ROC-AUC, corresponding to 47.5% and 50.0% gains over single-hospital models (0.610; 0.644), without transferring any patient CXR. These results indicate that FL delivers high-performing, generalizable, secure and private pneumonia detection across healthcare networks, with data kept local. This is especially relevant for rare diseases, where FL enables secure multi-institutional collaboration without data movement, representing a breakthrough for accelerating diagnosis and treatment development in low-data domains.