Abstract:Large language models in regulated financial workflows are governed by natural-language policies that the same model interprets, creating a principal--agent failure: outputs can appear compliant without being compliant. Existing evaluation measures task accuracy but not whether governance constrains behaviour at the decision rationale level -- where regulated decisions must be auditable. We introduce five governance metrics that quantify policy compliance at the rationale level and apply them in a synthetic banking domain to compare text-only governance against mechanical enforcement: four primitives operating outside the model's interpretive loop. Under text-only governance, 27% of deferrals carry no decision-relevant information. Mechanical enforcement reduces this rate by 73%, more than doubles deferral information content, and raises task accuracy from MCC~$0.43$ to $0.88$. The improvement is driven by architectural separation: LLM-generated rationales under mechanical enforcement show comparable CDL to text-only governance -- the gain comes from removing clear-cut decisions from the model's control. A causal ablation confirms that each primitive is individually necessary. Our central finding is a governance-task decoupling: under structural stress, text-only governance degrades on both dimensions simultaneously, whereas mechanical enforcement preserves governance quality even as task performance drops. This implies that governance and task evaluation are distinct axes: accuracy is not a sufficient proxy for governance in regulated AI systems.




Abstract:The opaque nature of transformer-based models, particularly in applications susceptible to unethical practices such as dark-patterns in user interfaces, requires models that integrate uncertainty quantification to enhance trust in predictions. This study focuses on dark-pattern detection, deceptive design choices that manipulate user decisions, undermining autonomy and consent. We propose a differential fine-tuning approach implemented at the final classification head via uncertainty quantification with transformer-based pre-trained models. Employing a dense neural network (DNN) head architecture as a baseline, we examine two methods capable of quantifying uncertainty: Spectral-normalized Neural Gaussian Processes (SNGPs) and Bayesian Neural Networks (BNNs). These methods are evaluated on a set of open-source foundational models across multiple dimensions: model performance, variance in certainty of predictions and environmental impact during training and inference phases. Results demonstrate that integrating uncertainty quantification maintains performance while providing insights into challenging instances within the models. Moreover, the study reveals that the environmental impact does not uniformly increase with the incorporation of uncertainty quantification techniques. The study's findings demonstrate that uncertainty quantification enhances transparency and provides measurable confidence in predictions, improving the explainability and clarity of black-box models. This facilitates informed decision-making and mitigates the influence of dark-patterns on user interfaces. These results highlight the importance of incorporating uncertainty quantification techniques in developing machine learning models, particularly in domains where interpretability and trustworthiness are critical.




Abstract:In the context of Industry 4.0, the use of artificial intelligence (AI) and machine learning for anomaly detection is being hampered by high computational requirements and associated environmental effects. This study seeks to address the demands of high-performance machine learning models with environmental sustainability, contributing to the emerging discourse on 'Green AI.' An extensive variety of machine learning algorithms, coupled with various Multilayer Perceptron (MLP) configurations, were meticulously evaluated. Our investigation encapsulated a comprehensive suite of evaluation metrics, comprising Accuracy, Area Under the Curve (AUC), Recall, Precision, F1 Score, Kappa Statistic, Matthews Correlation Coefficient (MCC), and F1 Macro. Simultaneously, the environmental footprint of these models was gauged through considerations of time duration, CO2 equivalent, and energy consumption during the training, cross-validation, and inference phases. Traditional machine learning algorithms, such as Decision Trees and Random Forests, demonstrate robust efficiency and performance. However, superior outcomes were obtained with optimised MLP configurations, albeit with a commensurate increase in resource consumption. The study incorporated a multi-objective optimisation approach, invoking Pareto optimality principles, to highlight the trade-offs between a model's performance and its environmental impact. The insights derived underscore the imperative of striking a balance between model performance, complexity, and environmental implications, thus offering valuable directions for future work in the development of environmentally conscious machine learning models for industrial applications.