As machine learning becomes increasingly prevalent in critical fields such as healthcare, ensuring the safety and reliability of machine learning systems becomes paramount. A key component of reliability is the ability to estimate uncertainty, which enables the identification of areas of high and low confidence and helps to minimize the risk of error. In this study, we propose a machine learning pipeline called U-PASS tailored for clinical applications that incorporates uncertainty estimation at every stage of the process, including data acquisition, training, and model deployment. The training process is divided into a supervised pre-training step and a semi-supervised finetuning step. We apply our uncertainty-guided deep learning pipeline to the challenging problem of sleep staging and demonstrate that it systematically improves performance at every stage. By optimizing the training dataset, actively seeking informative samples, and deferring the most uncertain samples to an expert, we achieve an expert-level accuracy of 85% on a challenging clinical dataset of elderly sleep apnea patients, representing a significant improvement over the baseline accuracy of 75%. U-PASS represents a promising approach to incorporating uncertainty estimation into machine learning pipelines, thereby improving their reliability and unlocking their potential in clinical settings.
Objective: With the rapid rise of wearable sleep monitoring devices with non-conventional electrode configurations, there is a need for automated algorithms that can perform sleep staging on configurations with small amounts of labeled data. Transfer learning has the ability to adapt neural network weights from a source modality (e.g. standard electrode configuration) to a new target modality (e.g. non-conventional electrode configuration). Methods: We propose feature matching, a new transfer learning strategy as an alternative to the commonly used finetuning approach. This method consists of training a model with larger amounts of data from the source modality and few paired samples of source and target modality. For those paired samples, the model extracts features of the target modality, matching these to the features from the corresponding samples of the source modality. Results: We compare feature matching to finetuning for three different target domains, with two different neural network architectures, and with varying amounts of training data. Particularly on small cohorts (i.e. 2 - 5 labeled recordings in the non-conventional recording setting), feature matching systematically outperforms finetuning with mean relative differences in accuracy ranging from 0.4% to 4.7% for the different scenarios and datasets. Conclusion: Our findings suggest that feature matching outperforms finetuning as a transfer learning approach, especially in very low data regimes. Significance: As such, we conclude that feature matching is a promising new method for wearable sleep staging with novel devices.