Abstract: Reliable anomaly detection in distributed power plant monitoring systems is essential for ensuring operational continuity and reducing maintenance costs, particularly in regions where telecom operators rely heavily on diesel generators. However, this task is challenged by extreme class imbalance, lack of interpretability, and potential fairness issues across regional clusters. In this work, we propose a supervised machine learning (ML) framework that integrates ensemble methods (LightGBM, XGBoost, Random Forest, CatBoost, GBDT, AdaBoost) and baseline models (Support Vector Machine, K-Nearest Neighbors, Multilayer Perceptron, and Logistic Regression) with advanced resampling techniques (SMOTE with Tomek Links and ENN) to address imbalance in a dataset of diesel generator operations in Cameroon. Interpretability is achieved through SHAP (SHapley Additive exPlanations), while fairness is quantified using the Disparate Impact Ratio (DIR) across operational clusters. We further evaluate model generalization using Maximum Mean Discrepancy (MMD) to capture domain shifts between regions. Experimental results show that ensemble models consistently outperform baselines, with LightGBM achieving an F1-score of 0.99 and minimal bias across clusters (DIR $\approx 0.95$). SHAP analysis highlights fuel consumption rate and runtime per day as dominant predictors, providing actionable insights for operators. Our findings demonstrate that it is possible to balance performance, interpretability, and fairness in anomaly detection, paving the way for more equitable and explainable AI systems in industrial power management. Finally, beyond offline evaluation, we discuss how the trained models can be deployed in practice for real-time monitoring. We show how containerized services can process incoming data in real time, deliver low-latency predictions, and provide interpretable outputs for operators.
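As a rough illustration of the fairness metric mentioned above, the following sketch computes a Disparate Impact Ratio over operational clusters. The abstract does not state the exact formulation used, so this assumes a common one: the lowest per-cluster rate of positive (anomaly) predictions divided by the highest. The function name `disparate_impact_ratio` and the toy inputs are hypothetical.

```python
from collections import defaultdict


def disparate_impact_ratio(predictions, clusters):
    """Return (DIR, per-cluster rates).

    Assumed definition: lowest per-cluster rate of positive (anomaly)
    predictions divided by the highest. Values near 1 suggest similar
    treatment across clusters.
    """
    counts = defaultdict(lambda: [0, 0])  # cluster -> [positives, total]
    for pred, cluster in zip(predictions, clusters):
        counts[cluster][0] += int(pred == 1)
        counts[cluster][1] += 1

    rates = {c: pos / total for c, (pos, total) in counts.items()}
    highest = max(rates.values())
    if highest == 0:
        raise ValueError("no positive predictions in any cluster")
    return min(rates.values()) / highest, rates


# Toy example: cluster A flags 2 of 4 observations, cluster B flags 1 of 4.
preds = [1, 0, 1, 0, 1, 0, 0, 0]
groups = ["A"] * 4 + ["B"] * 4
dir_value, per_cluster = disparate_impact_ratio(preds, groups)
```

Under this assumed definition, a value close to 1, such as the reported DIR $\approx 0.95$, would mean the clusters are flagged at nearly the same rate.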
Abstract: One of the critical factors that drive a country's economic development and guarantee the sustainability of its industries is the constant availability of electricity, which is usually provided by the national electric grid. However, in developing countries, where new companies, including telecommunication companies, are constantly emerging, the electricity supply remains unstable. These companies therefore have to rely on generators to remain fully operational. The generators run on fuel, and consumption tends to be high if it is not monitored properly. Monitoring is usually carried out by a (non-expert) human. In some cases this is a tedious process, and some companies have reported excessively high consumption rates. This work proposes a label-assisted autoencoder for detecting anomalies in the fuel consumed by power generating plants. In addition to the autoencoder model, we add a labelling assistance module: when an observation is labelled, the label is used to check the veracity of the corresponding anomaly classification under a given threshold. A consensus is then reached on whether training should stop, the threshold should be updated, or training should continue with the search for hyper-parameters. Results show that the proposed model is highly effective at detecting anomalies, with a detection accuracy of $97.20\%$, which outperforms the existing model ($96.1\%$ accuracy) trained on the same dataset. In addition, the proposed model is able to classify the anomalies according to their degree of severity.
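The label-assisted check described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' procedure: it assumes anomalies are flagged when an autoencoder's reconstruction error exceeds a threshold, that unlabelled observations carry `None`, and that the three-way decision uses a simple agreement target. The names `label_agreement`, `next_action`, `candidates`, and `target` are all hypothetical.

```python
def label_agreement(errors, labels, threshold):
    """Fraction of labelled observations whose threshold decision matches the label.

    `labels` holds 1 (anomaly), 0 (normal), or None (unlabelled).
    Returns None when no observation is labelled.
    """
    checked = [
        (err > threshold) == bool(lab)
        for err, lab in zip(errors, labels)
        if lab is not None
    ]
    return sum(checked) / len(checked) if checked else None


def next_action(errors, labels, threshold, candidates, target=0.95):
    """Decide whether to stop, update the threshold, or keep training (assumed rule)."""
    current = label_agreement(errors, labels, threshold)
    if current is None:
        # Nothing to verify against, so keep training.
        return "continue_training", threshold
    if current >= target:
        return "stop", threshold
    # Try the candidate thresholds and keep the one agreeing best with the labels.
    best = max(candidates, key=lambda t: label_agreement(errors, labels, t))
    if label_agreement(errors, labels, best) > current:
        return "update_threshold", best
    return "continue_training", threshold


# Toy reconstruction errors; the second observation is unlabelled.
errors = [0.1, 0.2, 0.9, 0.8, 0.5]
labels = [0, None, 1, 1, 0]
action, new_threshold = next_action(errors, labels, 0.4, [0.3, 0.6, 0.7])
```

In this toy case the threshold 0.4 misclassifies the labelled normal observation with error 0.5, so the sketch moves the threshold rather than stopping.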