Ghadi S. Al Hajj

Scalability and Maintainability Challenges and Solutions in Machine Learning: Systematic Literature Review

Apr 15, 2025

Karthik Shivashankar, Ghadi S. Al Hajj, Antonio Martini

Abstract: This systematic literature review examines the critical challenges and solutions related to scalability and maintainability in Machine Learning (ML) systems. As ML applications become increasingly complex and widespread across industries, the need to balance system scalability with long-term maintainability has emerged as a significant concern. This review synthesizes current research and practices addressing these dual challenges across the entire ML life-cycle, from data engineering to model deployment in production. We analyzed 124 papers to identify and categorize 41 maintainability challenges and 13 scalability challenges, along with their corresponding solutions. Our findings reveal intricate interdependencies between scalability and maintainability, where improvements in one often impact the other. The review is structured around six primary research questions, examining maintainability and scalability challenges in data engineering, model engineering, and ML system development. We explore how these challenges manifest differently across the stages of the ML life-cycle. This overview offers insights for both researchers and practitioners in the field of ML systems. It aims to guide future research directions, inform best practices, and contribute to the development of more robust, efficient, and sustainable ML applications across domains.

* Minor revision, ACM Computing Surveys

Improving generalization of machine learning-identified biomarkers with causal modeling: an investigation into immune receptor diagnostics

Apr 20, 2022

Milena Pavlović, Ghadi S. Al Hajj, Johan Pensar, Mollie Wood, Ludvig M. Sollid, Victor Greiff, Geir Kjetil Sandve

Abstract: Machine learning is increasingly used to discover diagnostic and prognostic biomarkers from high-dimensional molecular data. However, a variety of factors related to experimental design may affect the ability to learn generalizable and clinically applicable diagnostics. Here, we argue that a causal perspective improves the identification of these challenges and formalizes their relation to the robustness and generalization of machine learning-based diagnostics. To make the discussion concrete, we focus on a specific, recently established high-dimensional biomarker: adaptive immune receptor repertoires (AIRRs). We discuss how the main biological and experimental factors of the AIRR domain may influence the learned biomarkers and provide easily adjustable simulations of such effects. In conclusion, we find that causal modeling improves machine learning-based biomarker robustness by identifying stable relations between variables and by guiding the adjustment of the relations and variables that vary between populations.
