Abstract:Engineering reliable autonomous systems is an important and growing topic in computer science. As autonomous systems become more prevalent, easy-to-use techniques for building them reliably are increasingly important. This workshop report captures and expands on the discussions at the Lorentz Center Workshop "Engineering Reliable Autonomous Systems" (ERAS), held from 10 to 14 June 2024. The workshop was co-organised by the organisers of the Workshop on Formal Methods for Autonomous Systems (FMAS) and the Workshop on Agents and Robots for reliable Engineered Autonomy (AREA). It brought together members of the FMAS and AREA communities, industry practitioners, and representatives from sectors where autonomous systems pose distinctive engineering challenges. The workshop focused on three main research topics: techniques for verification and validation of autonomous systems; engineering real-world autonomous systems; and software architectures for safe autonomous systems. Its main outcome is a catalogue of challenges in these areas and, most importantly, a pathway to solutions. Some challenges can already be tackled by techniques that are well known in academia but have not yet become regularly used in practice. Other challenges remain unresolved and require further research. This roadmap is intended to support future research and industrial collaboration.
Abstract:While deep ensembles are widely considered to be the default method for uncertainty quantification in deep learning, their effectiveness for graph-structured data is often simply assumed based on successes in domains like computer vision. We investigate standard deep ensembles specifically for message-passing graph neural networks. Benchmarking across seven datasets representing varied tasks and complexities, we reveal that ensembles provide surprisingly little improvement over a single model. Instead, the observed marginal gains stem primarily from stabilizing optimization noise in point predictions rather than yielding meaningfully better uncertainty estimates. Through an aleatoric-epistemic decomposition, we identify epistemic collapse: independently trained networks consistently converge to overly similar predictions. Because disagreement is the fundamental mechanism through which ensembles capture epistemic uncertainty, this lack of diversity neutralizes their key advantage. Analyzing this phenomenon further, we suggest this collapse is driven by functional rather than weight-space convexity, where distinct parameter solutions induce almost identical behavior. Our results suggest that deep ensemble success does not seamlessly transfer to graph machine learning.
Abstract:Unmanned Aerial Vehicle (UAV)-assisted networks are increasingly foreseen as a promising approach for emergency response, providing rapid, flexible, and resilient communications in environments where terrestrial infrastructure is degraded or unavailable. In such scenarios, voice radio communications remain essential for first responders due to their robustness; however, their unstructured nature prevents direct integration with automated UAV-assisted network management. This paper proposes SIREN, an AI-driven framework that enables voice-driven perception for UAV-assisted networks. By integrating Automatic Speech Recognition (ASR) with Large Language Model (LLM)-based semantic extraction and Natural Language Processing (NLP) validation, SIREN converts emergency voice traffic into structured, machine-readable information, including responding units, location references, emergency severity, and Quality-of-Service (QoS) requirements. SIREN is evaluated using synthetic emergency scenarios with controlled variations in language, speaker count, background noise, and message complexity. The results demonstrate robust transcription and reliable semantic extraction across diverse operating conditions, while highlighting speaker diarization and geographic ambiguity as the main limiting factors. These findings establish the feasibility of voice-driven situational awareness for UAV-assisted networks and show a practical foundation for human-in-the-loop decision support and adaptive network management in emergency response operations.
Abstract:Near an optimal learning point of a neural network, the learning performance of gradient descent dynamics is dictated by the Hessian matrix of the loss function with respect to the network parameters. We characterize the Hessian eigenspectrum for some classes of teacher-student problems, when the teacher and student networks have matching weights, showing that the smaller eigenvalues of the Hessian determine long-time learning performance. For linear networks, we analytically establish that for large networks the spectrum asymptotically follows a convolution of a scaled chi-square distribution with a scaled Marchenko-Pastur distribution. We numerically analyse the Hessian spectrum for polynomial and other non-linear networks. Furthermore, we show that the rank of the Hessian matrix can be seen as an effective number of parameters for networks using polynomial activation functions. For a generic non-linear activation function, such as the error function, we empirically observe that the Hessian matrix is always full rank.
Abstract:Autonomous robots deployed in shared human environments, such as agricultural settings, require rigorous safety assurance to meet both functional reliability and regulatory compliance. These systems must operate in dynamic, unstructured environments, interact safely with humans, and respond effectively to a wide range of potential hazards. This paper presents a verification workflow for the safety assurance of an autonomous agricultural robot, covering the entire development life-cycle, from concept study and design to runtime verification. The outlined methodology begins with a systematic hazard analysis and risk assessment to identify potential risks and derive corresponding safety requirements. A formal model of the safety controller is then developed to capture its behaviour and verify that the controller satisfies the specified safety properties with respect to these requirements. The proposed approach is demonstrated on a field robot operating in an agricultural setting. The results show that the methodology can be effectively used to verify safety-critical properties and facilitate the early identification of design issues, contributing to the development of safer robots and autonomous systems.




Abstract:This paper proposes FLUC, a modular framework that integrates open-source Large Language Models (LLMs) with Unmanned Aerial Vehicle (UAV) autopilot systems to enable autonomous control in Flying Networks (FNs). FLUC translates high-level natural language commands into executable UAV mission code, bridging the gap between operator intent and UAV behaviour. FLUC is evaluated using three open-source LLMs - Qwen 2.5, Gemma 2, and LLaMA 3.2 - across scenarios involving code generation and mission planning. Results show that Qwen 2.5 excels in multi-step reasoning, Gemma 2 balances accuracy and latency, and LLaMA 3.2 offers faster responses with lower logical coherence. A case study on energy-aware UAV positioning confirms FLUC's ability to interpret structured prompts and autonomously execute domain-specific logic, showing its effectiveness in real-time, mission-driven control.




Abstract:This paper describes use of model checking to verify synchronisation properties of an industrial welding system consisting of a cobot arm and an external turntable. The robots must move synchronously, but sometimes get out of synchronisation, giving rise to unsatisfactory weld qualities in problem areas, such as around corners. These mistakes are costly, since time is lost both in the robotic welding and in manual repairs needed to improve the weld. Verification of the synchronisation properties has shown that they are fulfilled as long as assumptions of correctness made about parts outside the scope of the model hold, indicating limitations in the hardware. These results have indicated the source of the problem, and motivated a re-calibration of the real-life system. This has drastically improved the welding results, and is a demonstration of how formal methods can be useful in an industrial setting.




Abstract:Continuous-time dynamic graphs (CTDGs) are essential for modeling interconnected, evolving systems. Traditional methods for extracting knowledge from these graphs often depend on feature engineering or deep learning. Feature engineering is limited by the manual and time-intensive nature of crafting features, while deep learning approaches suffer from high inference latency, making them impractical for real-time applications. This paper introduces Deep-Graph-Sprints (DGS), a novel deep learning architecture designed for efficient representation learning on CTDGs with low-latency inference requirements. We benchmark DGS against state-of-the-art feature engineering and graph neural network methods using five diverse datasets. The results indicate that DGS achieves competitive performance while improving inference speed up to 12x compared to other deep learning approaches on our tested benchmarks. Our method effectively bridges the gap between deep representation learning and low-latency application requirements for CTDGs.




Abstract:Unmanned Aerial Vehicles (UAVs) are increasingly used to enable wireless communications. Due to their characteristics, such as the ability to hover and carry cargo, UAVs can serve as communications nodes, including Wi-Fi Access Points and Cellular Base Stations. In previous work, we proposed the Sustainable multi-UAV Performance-aware Placement (SUPPLY) algorithm, which focuses on the energy-efficient placement of multiple UAVs acting as Flying Access Points (FAPs). Additionally, we developed the Multi-UAV Energy Consumption (MUAVE) simulator to evaluate the UAV energy consumption, specifically when using the SUPPLY algorithm. However, MUAVE was initially designed to compute the energy consumption for rotary-wing UAVs only. In this paper, we propose eMUAVE, an enhanced version of the MUAVE simulator that allows the evaluation of the energy consumption for both rotary-wing and fixed-wing UAVs. Our energy consumption evaluation using eMUAVE considers reference and random networking scenarios. The results show that fixed-wing UAVs can be employed in the majority of networking scenarios. However, rotary-wing UAVs are typically more energy-efficient than fixed-wing UAVs when following the trajectories defined by SUPPLY.




Abstract:Unmanned Aerial Vehicles (UAVs) are used for a wide range of applications. Due to characteristics such as the ability to hover and carry cargo on-board, rotary-wing UAVs have been considered suitable platforms for carrying communications nodes, including Wi-Fi Access Points and cellular Base Stations. This gave rise to the concept of Flying Networks (FNs), now making part of the so-called Non-Terrestrial Networks (NTNs) defined in 3GPP. In scenarios where the deployment of terrestrial networks is not feasible, the use of FNs has emerged as a solution to provide wireless connectivity. However, the management of the communications resources in FNs imposes significant challenges, especially regarding the positioning of the UAVs so that the Quality of Service (QoS) offered to the Ground Users (GUs) and devices is maximized. Moreover, unlike terrestrial networks that are directly connected to the power grid, UAVs typically rely on on-board batteries that need to be recharged. In order to maximize the UAVs' flying time, the energy consumed by the UAVs needs to be minimized. When it comes to multi-UAV placement, most state-of-the-art solutions focus on maximizing the coverage area and assume that the UAVs keep hovering in a fixed position while serving GUs. Also, they do not address the energy-aware multi-UAV placement problem in networking scenarios where the GUs may have different QoS requirements and may not be uniformly distributed across the area of interest. In this work, we propose the Sustainable multi-UAV Performance-aware Placement (SUPPLY) algorithm. SUPPLY defines the energy and performance-aware positioning of multiple UAVs in an FN. To accomplish this, SUPPLY defines trajectories that minimize UAVs' energy consumption, while ensuring the targeted QoS levels. The obtained results show up to 25% energy consumption reduction with minimal impact on throughput and delay.