Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Ricardo C. Berrios

An empirical evaluation of the risks of AI model updates using clinical data: stability, arbitrariness, and fairness

Apr 27, 2026

Ioannis Bilionis, Ricardo C. Berrios, Luis Fernandez-Luque, Carlos Castillo

Abstract:Artificial Intelligence and Machine Learning (AI/ML) models used in clinical settings are increasingly deployed to support clinical decision-making. However, when training data become stale due to changes in demographics, environment, or patient behaviors, model performance can degrade substantially. While updating models with new training data is necessary, such updates may also introduce new risks. We evaluated the proposed monitoring framework on four publicly available U.S.-based Type 1 Diabetes datasets containing high-resolution continuous glucose monitoring (CGM) data, comprising approximately 11,300 weekly observations from 496 participants under 20 years of age. All datasets included structured sociodemographic information. Using the prediction of severe hyperglycemia events in children with type 1 diabetes as a case study, we examine how different model update strategies can adversely affect model stability (e.g., by causing predictions to "flip" for a large number of cases after an update), increase arbitrariness in predictions, or worsen accuracy equity and the balance of error rates across subpopulations. We propose multiple dimensions for continuous monitoring to detect these issues and argue that such monitoring is essential for the development of trustworthy clinical decision support systems.

* Accepted to iEEE EMBC 2026. 4 pages, 3 figures

Via

Access Paper or Ask Questions

Disparate Model Performance and Stability in Machine Learning Clinical Support for Diabetes and Heart Diseases

Dec 27, 2024

Ioannis Bilionis, Ricardo C. Berrios, Luis Fernandez-Luque, Carlos Castillo

Figure 1 for Disparate Model Performance and Stability in Machine Learning Clinical Support for Diabetes and Heart Diseases

Figure 2 for Disparate Model Performance and Stability in Machine Learning Clinical Support for Diabetes and Heart Diseases

Figure 3 for Disparate Model Performance and Stability in Machine Learning Clinical Support for Diabetes and Heart Diseases

Figure 4 for Disparate Model Performance and Stability in Machine Learning Clinical Support for Diabetes and Heart Diseases

Abstract:Machine Learning (ML) algorithms are vital for supporting clinical decision-making in biomedical informatics. However, their predictive performance can vary across demographic groups, often due to the underrepresentation of historically marginalized populations in training datasets. The investigation reveals widespread sex- and age-related inequities in chronic disease datasets and their derived ML models. Thus, a novel analytical framework is introduced, combining systematic arbitrariness with traditional metrics like accuracy and data complexity. The analysis of data from over 25,000 individuals with chronic diseases revealed mild sex-related disparities, favoring predictive accuracy for males, and significant age-related differences, with better accuracy for younger patients. Notably, older patients showed inconsistent predictive accuracy across seven datasets, linked to higher data complexity and lower model performance. This highlights that representativeness in training data alone does not guarantee equitable outcomes, and model arbitrariness must be addressed before deploying models in clinical settings.

* This paper will be presented in American Medical Informatics Association (AMIA) Informatics Summit Conference 2025 (Pittsburgh, PA). 10 pages, 2 figures, 5 tables

Via

Access Paper or Ask Questions