Abstract: We developed a ResNet-based human activity recognition (HAR) model with minimal overhead to detect gait versus non-gait activities and to classify everyday activities (walking, running, stairs, standing, sitting, lying, and sit-to-stand transitions). The model was trained and evaluated on smartphone sensor data from healthy adult controls (HC) and people with multiple sclerosis (PwMS) with Expanded Disability Status Scale (EDSS) scores between 0.0 and 6.5. Datasets included the GaitLab study (ISRCTN15993728), an internal Roche dataset, and publicly available data sources (training only). Data from 34 HC and 68 PwMS (mean [SD] EDSS: 4.7 [1.5]) were included in the evaluation. The HAR model achieved 98.4% and 99.6% accuracy in detecting gait versus non-gait activities in the GaitLab and Roche datasets, respectively, comparable to a state-of-the-art ResNet model (99.3% and 99.4%). For everyday activities, the proposed model not only demonstrated higher accuracy than the state-of-the-art model (96.2% vs 91.9%; internal Roche dataset) but also maintained high performance across nine smartphone wear locations (handbag, shopping bag, crossbody bag, backpack, hoodie pocket, coat/jacket pocket, hand, neck, belt), outperforming the state-of-the-art model by 2.8-9.0%. In conclusion, the proposed HAR model accurately detects everyday activities and is highly robust to smartphone wear location, demonstrating its practical applicability.
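To make the described architecture concrete, the following is a minimal PyTorch sketch of a lightweight 1-D residual network for classifying windows of smartphone inertial data. It is an illustration only, not the authors' implementation: the window length, sampling rate, layer widths, class count, and all names (`ResBlock1d`, `TinyHARNet`) are assumptions.

```python
# Hypothetical sketch of a minimal-overhead ResNet-style HAR classifier.
# All hyperparameters are illustrative assumptions, not the model from the abstract.
import torch
import torch.nn as nn


class ResBlock1d(nn.Module):
    """Basic residual block over the time axis of a sensor window."""

    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv1d(channels, channels, kernel_size=5, padding=2)
        self.bn1 = nn.BatchNorm1d(channels)
        self.conv2 = nn.Conv1d(channels, channels, kernel_size=5, padding=2)
        self.bn2 = nn.BatchNorm1d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)  # identity skip connection


class TinyHARNet(nn.Module):
    """Small HAR classifier: conv stem -> 2 residual blocks -> pooled linear head."""

    def __init__(self, in_channels: int = 3, num_classes: int = 7, width: int = 32):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv1d(in_channels, width, kernel_size=7, stride=2, padding=3),
            nn.BatchNorm1d(width),
            nn.ReLU(inplace=True),
        )
        self.blocks = nn.Sequential(ResBlock1d(width), ResBlock1d(width))
        self.head = nn.Linear(width, num_classes)

    def forward(self, x):  # x: (batch, channels, time), e.g. 3-axis accelerometer
        h = self.blocks(self.stem(x))
        h = h.mean(dim=-1)   # global average pooling over time
        return self.head(h)  # logits over activity classes


# Usage: classify a batch of 2.56 s windows (128 samples at an assumed 50 Hz),
# with num_classes=7 matching the seven everyday activities listed above.
model = TinyHARNet(in_channels=3, num_classes=7)
logits = model(torch.randn(16, 3, 128))
print(logits.shape)  # torch.Size([16, 7])
```

Global average pooling before the linear head keeps the parameter count small and makes the classifier independent of the exact window length, one plausible way to achieve the "minimal overhead" the abstract emphasizes.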
Abstract: Predicting disability progression early in multiple sclerosis (MS) is challenging due to disease heterogeneity. This work predicts disability at 48 and 72 weeks using sparse baseline clinical data and 12 weeks of daily digital Floodlight data from the CONSONANCE clinical trial. We employed state-of-the-art tabular and time-series foundation models (FMs), a custom multimodal attention-based transformer, and machine learning methods. Despite the difficulty of early prediction (AUROC 0.63), integrating digital data via advanced models improved performance over clinical data alone. A transformer model using unimodal embeddings from the Moment FM yielded the best result, but our multimodal transformer consistently outperformed its unimodal counterpart, confirming the advantage of combining clinical with digital data. Our findings demonstrate the promise of FMs and multimodal approaches for extracting predictive signal from complex and diverse clinical and digital life-sciences data (e.g., imaging, omics), enabling more accurate prognostics for MS and potentially other complex diseases.
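As a rough illustration of how a multimodal attention-based transformer can fuse a sparse baseline clinical vector with a sequence of daily digital embeddings, the sketch below projects both modalities into a shared token space and lets self-attention mix them. This is a plausible pattern under stated assumptions, not the CONSONANCE model: all dimensions, the fusion scheme, and names such as `MultimodalFusionTransformer` are hypothetical.

```python
# Hypothetical sketch: attention-based fusion of baseline clinical features with
# daily digital (e.g., Floodlight-derived) embeddings. Illustrative only.
import torch
import torch.nn as nn


class MultimodalFusionTransformer(nn.Module):
    def __init__(self, clin_dim: int = 16, ts_dim: int = 64, d_model: int = 64,
                 n_heads: int = 4, n_layers: int = 2):
        super().__init__()
        self.clin_proj = nn.Linear(clin_dim, d_model)  # clinical vector -> one token
        self.ts_proj = nn.Linear(ts_dim, d_model)      # per-day embeddings -> tokens
        self.cls = nn.Parameter(torch.zeros(1, 1, d_model))  # learned [CLS]-style token
        enc_layer = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=128, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, 1)  # progression vs. no progression logit

    def forward(self, clin, ts):
        # clin: (batch, clin_dim); ts: (batch, days, ts_dim), e.g. 84 daily embeddings.
        # Positional encodings for day order are omitted here for brevity.
        tokens = torch.cat(
            [
                self.cls.expand(clin.size(0), -1, -1),
                self.clin_proj(clin).unsqueeze(1),
                self.ts_proj(ts),
            ],
            dim=1,
        )  # self-attention mixes clinical and digital tokens in one sequence
        h = self.encoder(tokens)
        return self.head(h[:, 0])  # read the prediction off the [CLS] token


# Usage: a baseline clinical vector plus 12 weeks (84 days) of daily embeddings.
model = MultimodalFusionTransformer()
logit = model(torch.randn(8, 16), torch.randn(8, 84, 64))
print(logit.shape)  # torch.Size([8, 1])
```

Treating the clinical vector as one extra token is one simple fusion strategy; cross-attention between modality-specific encoders would be an equally plausible alternative.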