Abstract:Code review is a key practice in software engineering, ensuring quality and collaboration. However, industrial Merge Request (MR) workflows often deviate from standardized review processes, with many MRs serving non-review purposes (e.g., drafts, rebases, or dependency updates). We term these cases deviations and hypothesize that ignoring them biases analytics and undermines ML models for review analysis. We identify seven deviation categories, occurring in 37.02% of MRs, and propose a few-shot learning detection method (91% accuracy). By excluding deviations, ML models predicting review completion time improve performance in 53.33% of cases (up to 2.25x) and exhibit significant shifts in feature importance (47% overall, 60% top-*k*). Our contributions include: (1) a taxonomy of MR deviations, (2) an AI-driven detection approach, and (3) empirical evidence of their impact on ML-based review analytics. This work aids practitioners in optimizing review efforts and ensuring reliable insights.
Abstract:Despite the indisputable benefits of Continuous Integration (CI) pipelines (or builds), CI still presents significant challenges regarding long durations, failures, and flakiness. Prior studies addressed CI challenges in isolation, yet these issues are interrelated and require a holistic approach for effective optimization. To bridge this gap, this paper proposes a novel idea of developing Digital Twins (DTs) of build processes to enable global and continuous improvement. To support such an idea, we introduce the CI Build process Digital Twin (CBDT) framework as a minimum viable product. This framework offers digital shadowing functionalities, including real-time build data acquisition and continuous monitoring of build process performance metrics. Furthermore, we discuss guidelines and challenges in the practical implementation of CBDTs, including (1) modeling different aspects of the build process using Machine Learning, (2) exploring what-if scenarios based on historical patterns, and (3) implementing prescriptive services such as automated failure and performance repair to continuously improve build processes.