Abstract:With spotforecast2-safe we present an integrated Compliance-by-Design approach to Python-based point forecasting of time series in safety-critical environments. A review of the relevant open-source tooling shows that existing compliance solutions operate consistently outside of the library to be used - e.g. as scanners, templates, or runtime layers. spotforecast2-safe takes the inverse approach and anchors the requirements of Regulation (EU) 2024/1689 (the EU AI Act, in German: KI-VO), of IEC 61508, of the ISA/IEC 62443 standards series, and of the Cyber Resilience Act within the library: in application-programming-interface contracts, persistence formats, and continuous-integration gates. The approach is operationalised by four non-negotiable code-development rules (zero dead code, deterministic processing, fail-safe handling, minimal dependencies) together with the corresponding process rules (model card, executable docstrings, CI workflows, Common-Platform-Enumeration (CPE) identifier, REUSE-conformant licensing, release pipeline). Interactive visualisation, hyperparameter tuning and automated machine learning (AutoML), as well as deep-learning and large-language-model backends are deliberately excluded, because each of these components either enlarges the attack surface, introduces non-determinism, or impairs reproducibility. A bidirectional traceability matrix maps every regulatory provision onto the corresponding mechanism in the code; an end-to-end example of European-market electricity generation, transmission, and consumption forecasting demonstrates the application. The package is open-source and available under Affero General Public License (AGPL) 3.0-or-later.
Abstract:Large Language Models (LLMs) have achieved impressive results on public benchmarks, often leading to claims of advanced reasoning and understanding. However, recent research in cognitive science reveals that these models sometimes rely on shallow heuristics and memorization, taking shortcuts rather than demonstrating genuine cognitive abilities. This paper investigates LLM behavior in automated test generation for software, contrasting performance on an open-source system (LevelDB) with SAP HANA, one of the most widely deployed commercial database systems worldwide, whose proprietary codebase is guaranteed to be absent from training data. We combine cognitive evaluation principles, drawing on Mitchell's mechanism-focused assessment methodology, with empirical software testing, employing mutation score and iterative compiler-feedback repair loops to assess both accuracy and underlying reasoning strategies. Results show that LLMs excel on familiar, open-source benchmarks but struggle with unseen, complex domains, often prioritizing compilability over semantic effectiveness. These findings provide independent software engineering evidence for the broader claim that current LLMs lack robust reasoning, and highlight the need for evaluation frameworks that penalize trivial shortcuts and reward true generalization.
Abstract:The `spotoptim` package implements surrogate-model-based optimization of expensive black-box functions in Python. Building on two decades of Sequential Parameter Optimization (SPO) methodology, it provides a Kriging-based optimization loop with Expected Improvement, support for continuous, integer, and categorical variables, noise-aware evaluation via Optimal Computing Budget Allocation (OCBA), and multi-objective extensions. A steady-state parallelization strategy overlaps surrogate search with objective evaluation on multi-core hardware, and a success-rate-based restart mechanism detects stagnation while preserving the best solution found. The package returns scipy-compatible `OptimizeResult` objects and accepts any scikit-learn-compatible surrogate model. Built-in TensorBoard logging provides real-time monitoring of convergence and surrogate quality. This report describes the architecture and module structure of spotoptim, provides worked examples including neural network hyperparameter tuning, and compares the framework with BoTorch, Optuna, Ray Tune, BOHB, SMAC, and Hyperopt. The package is open-source.
Abstract:Benchmarking has driven scientific progress in Evolutionary Computation, yet current practices fall short of real-world needs. Widely used synthetic suites such as BBOB and CEC isolate algorithmic phenomena but poorly reflect the structure, constraints, and information limitations of continuous and mixed-integer optimization problems in practice. This disconnect leads to the misuse of benchmarking suites for competitions, automated algorithm selection, and industrial decision-making, despite these suites being designed for different purposes. We identify key gaps in current benchmarking practices and tooling, including limited availability of real-world-inspired problems, missing high-level features, and challenges in multi-objective and noisy settings. We propose a vision centered on curated real-world-inspired benchmarks, practitioner-accessible feature spaces and community-maintained performance databases. Real progress requires coordinated effort: A living benchmarking ecosystem that evolves with real-world insights and supports both scientific understanding and industrial use.




Abstract:Despite the growing interest in Explainable Artificial Intelligence (XAI), explainability is rarely considered during hyperparameter tuning or neural architecture optimization, where the focus remains primarily on minimizing predictive loss. In this work, we introduce the novel concept of XAI consistency, defined as the agreement among different feature attribution methods, and propose new metrics to quantify it. For the first time, we integrate XAI consistency directly into the hyperparameter tuning objective, creating a multi-objective optimization framework that balances predictive performance with explanation robustness. Implemented within the Sequential Parameter Optimization Toolbox (SPOT), our approach uses both weighted aggregation and desirability-based strategies to guide model selection. Through our proposed framework and supporting tools, we explore the impact of incorporating XAI consistency into the optimization process. This enables us to characterize distinct regions in the architecture configuration space: one region with poor performance and comparatively low interpretability, another with strong predictive performance but weak interpretability due to low \gls{xai} consistency, and a trade-off region that balances both objectives by offering high interpretability alongside competitive performance. Beyond introducing this novel approach, our research provides a foundation for future investigations into whether models from the trade-off zone-balancing performance loss and XAI consistency-exhibit greater robustness by avoiding overfitting to training performance, thereby leading to more reliable predictions on out-of-distribution data.




Abstract:The goal of this article is to provide an introduction to the desirability function approach to multi-objective optimization (direct and surrogate model-based), and multi-objective hyperparameter tuning. This work is based on the paper by Kuhn (2016). It presents a `Python` implementation of Kuhn's `R` package `desirability`. The `Python` package `spotdesirability` is available as part of the `sequential parameter optimization` framework. After a brief introduction to the desirability function approach is presented, three examples are given that demonstrate how to use the desirability functions for classical optimization, surrogate-model based optimization, and hyperparameter tuning.




Abstract:The increasing shortage of nursing staff and the acute risk of falls in nursing homes pose significant challenges for the healthcare system. This study presents the development of an automated fall detection system integrated into care beds, aimed at enhancing patient safety without compromising privacy through wearables or video monitoring. Mechanical vibrations transmitted through the bed frame are processed using a short-time Fourier transform, enabling robust classification of distinct human fall patterns with a convolutional neural network. Challenges pertaining to the quantity and diversity of the data are addressed, proposing the generation of additional data with a specific emphasis on enhancing variation. While the model shows promising results in distinguishing fall events from noise using lab data, further testing in real-world environments is recommended for validation and improvement. Despite limited available data, the proposed system shows the potential for an accurate and rapid response to falls, mitigating health implications, and addressing the needs of an aging population. This case study was performed as part of the ZIM Project. Further research on sensors enhanced by artificial intelligence will be continued in the ShapeFuture Project.
Abstract:Research in Explainable Artificial Intelligence (XAI) is increasing, aiming to make deep learning models more transparent. Most XAI methods focus on justifying the decisions made by Artificial Intelligence (AI) systems in security-relevant applications. However, relatively little attention has been given to using these methods to improve the performance and robustness of deep learning algorithms. Additionally, much of the existing XAI work primarily addresses classification problems. In this study, we investigate the potential of feature attribution methods to filter out uninformative features in input data for regression problems, thereby improving the accuracy and stability of predictions. We introduce a feature selection pipeline that combines Integrated Gradients with k-means clustering to select an optimal set of variables from the initial data space. To validate the effectiveness of this approach, we apply it to a real-world industrial problem - blade vibration analysis in the development process of turbo machinery.




Abstract:Stochastic optimization algorithms have been successfully applied in several domains to find optimal solutions. Because of the ever-growing complexity of the integrated systems, novel stochastic algorithms are being proposed, which makes the task of the performance analysis of the algorithms extremely important. In this paper, we provide a novel ranking scheme to rank the algorithms over multiple single-objective optimization problems. The results of the algorithms are compared using a robust bootstrapping-based hypothesis testing procedure that is based on the principles of severity. Analogous to the football league scoring scheme, we propose pairwise comparison of algorithms as in league competition. Each algorithm accumulates points and a performance metric of how good or bad it performed against other algorithms analogous to goal differences metric in football league scoring system. The goal differences performance metric can not only be used as a tie-breaker but also be used to obtain a quantitative performance of each algorithm. The key novelty of the proposed ranking scheme is that it takes into account the performance of each algorithm considering the magnitude of the achieved performance improvement along with its practical relevance and does not have any distributional assumptions. The proposed ranking scheme is compared to classical hypothesis testing and the analysis of the results shows that the results are comparable and our proposed ranking showcases many additional benefits.




Abstract:Batch Machine Learning (BML) reaches its limits when dealing with very large amounts of streaming data. This is especially true for available memory, handling drift in data streams, and processing new, unknown data. Online Machine Learning (OML) is an alternative to BML that overcomes the limitations of BML. OML is able to process data in a sequential manner, which is especially useful for data streams. The `river` package is a Python OML-library, which provides a variety of online learning algorithms for classification, regression, clustering, anomaly detection, and more. The `spotRiver` package provides a framework for hyperparameter tuning of OML models. The `spotRiverGUI` is a graphical user interface for the `spotRiver` package. The `spotRiverGUI` releases the user from the burden of manually searching for the optimal hyperparameter setting. After the data is provided, users can compare different OML algorithms from the powerful `river` package in a convenient way and tune the selected algorithms very efficiently.