Abstract:Alzheimer's disease and related dementias (ADRD) affect one in five adults over 60, yet more than half of individuals with cognitive decline remain undiagnosed. Speech-based assessments show promise for early detection, as phonetic motor planning deficits alter acoustic features (e.g., pitch, tone), while memory and language impairments lead to syntactic and semantic errors. However, conventional speech-processing pipelines with hand-crafted features or general-purpose audio classifiers often exhibit limited performance and generalizability. To address these limitations, we introduce SpeechCARE, a multimodal speech processing pipeline that leverages pretrained, multilingual acoustic and linguistic transformer models to capture subtle speech-related cues associated with cognitive impairment. Inspired by the Mixture of Experts (MoE) paradigm, SpeechCARE employs a dynamic fusion architecture that weights transformer-based acoustic, linguistic, and demographic inputs, allowing integration of additional modalities (e.g., social factors, imaging) and enhancing robustness across diverse tasks. Its robust preprocessing includes automatic transcription, large language model (LLM)-based anomaly detection, and task identification. A SHAP-based explainability module and LLM reasoning highlight each modality's contribution to decision-making. SpeechCARE achieved AUC = 0.88 and F1 = 0.72 for classifying cognitively healthy, MCI, and AD individuals, with AUC = 0.90 and F1 = 0.62 for MCI detection. Bias analysis showed minimal disparities, except for adults over 80. Mitigation techniques included oversampling and weighted loss. Future work includes deployment in real-world care settings (e.g., VNS Health, Columbia ADRC) and EHR-integrated explainability for underrepresented populations in New York City.
Abstract:One of the growing trends in machine learning is the use of data generation techniques, since the performance of machine learning models is dependent on the quantity of the training dataset. However, in many medical applications, collecting large datasets is challenging due to resource constraints, which leads to overfitting and poor generalization. This paper introduces a novel method, Artificial Data Point Generation in Clustered Latent Space (AGCL), designed to enhance classification performance on small medical datasets through synthetic data generation. The AGCL framework involves feature extraction, K-means clustering, cluster evaluation based on a class separation metric, and the generation of synthetic data points from clusters with distinct class representations. This method was applied to Parkinson's disease screening, utilizing facial expression data, and evaluated across multiple machine learning classifiers. Experimental results demonstrate that AGCL significantly improves classification accuracy compared to baseline, GN and kNNMTD. AGCL achieved the highest overall test accuracy of 83.33% and cross-validation accuracy of 90.90% in majority voting over different emotions, confirming its effectiveness in augmenting small datasets.