Reinforcement Learning or optimal control can provide effective reasoning for sequential decision-making problems with variable dynamics. Such reasoning in practical implementation, however, poses a persistent challenge in interpreting the reward function and corresponding optimal policy. Consequently, formalizing the sequential decision-making problems as inference has a considerable value, as probabilistic inference in principle offers diverse and powerful mathematical tools to infer the stochastic dynamics whilst suggesting a probabilistic interpretation of the reward design and policy convergence. In this study, we propose a novel Adaptive Wasserstein Variational Optimization (AWaVO) to tackle these challenges in sequential decision-making. Our approach utilizes formal methods to provide interpretations of reward design, transparency of training convergence, and probabilistic interpretation of sequential decisions. To demonstrate practicality, we show convergent training with guaranteed global convergence rates not only in simulation but also in real robot tasks, and empirically verify a reasonable tradeoff between high performance and conservative interpretability.
Unmanned Aerial Vehicles (UAVs) are increasingly used as aerial base stations to provide ad hoc communications infrastructure. Building upon prior research efforts which consider either static nodes, 2D trajectories or single UAV systems, this paper focuses on the use of multiple UAVs for providing wireless communication to mobile users in the absence of terrestrial communications infrastructure. In particular, we jointly optimize UAV 3D trajectory and NOMA power allocation to maximize system throughput. Firstly, a weighted K-means-based clustering algorithm establishes UAV-user associations at regular intervals. The efficacy of training a novel Shared Deep Q-Network (SDQN) with action masking is then explored. Unlike training each UAV separately using DQN, the SDQN reduces training time by using the experiences of multiple UAVs instead of a single agent. We also show that SDQN can be used to train a multi-agent system with differing action spaces. Simulation results confirm that: 1) training a shared DQN outperforms a conventional DQN in terms of maximum system throughput (+20%) and training time (-10%); 2) it can converge for agents with different action spaces, yielding a 9% increase in throughput compared to mutual learning algorithms; and 3) combining NOMA with an SDQN architecture enables the network to achieve a better sum rate compared with existing baseline schemes.
Simultaneously accurate and reliable tracking control for quadrotors in complex dynamic environments is challenging. As aerodynamics derived from drag forces and moment variations are chaotic and difficult to precisely identify, most current quadrotor tracking systems treat them as simple `disturbances' in conventional control approaches. We propose a novel, interpretable trajectory tracker integrating a Distributional Reinforcement Learning disturbance estimator for unknown aerodynamic effects with a Stochastic Model Predictive Controller (SMPC). The proposed estimator `Constrained Distributional Reinforced disturbance estimator' (ConsDRED) accurately identifies uncertainties between true and estimated values of aerodynamic effects. Simplified Affine Disturbance Feedback is used for control parameterization to guarantee convexity, which we then integrate with a SMPC. We theoretically guarantee that ConsDRED achieves at least an optimal global convergence rate and a certain sublinear rate if constraints are violated with an error decreases as the width and the layer of neural network increase. To demonstrate practicality, we show convergent training in simulation and real-world experiments, and empirically verify that ConsDRED is less sensitive to hyperparameter settings compared with canonical constrained RL approaches. We demonstrate our system improves accumulative tracking errors by at least 62% compared with the recent art. Importantly, the proposed framework, ConsDRED-SMPC, balances the tradeoff between pursuing high performance and obeying conservative constraints for practical implementations
Accuracy and stability are common requirements for Quadrotor trajectory tracking systems. Designing an accurate and stable tracking controller remains challenging, particularly in unknown and dynamic environments with complex aerodynamic disturbances. We propose a Quantile-approximation-based Distributional-reinforced Uncertainty Estimator (QuaDUE) to accurately identify the effects of aerodynamic disturbances, i.e., the uncertainties between the true and estimated Control Contraction Metrics (CCMs). Taking inspiration from contraction theory and integrating the QuaDUE for uncertainties, our novel CCM-based trajectory tracking framework tracks any feasible reference trajectory precisely whilst guaranteeing exponential convergence. More importantly, the convergence and training acceleration of the distributional RL are guaranteed and analyzed, respectively, from theoretical perspectives. We also demonstrate our system under unknown and diverse aerodynamic forces. Under large aerodynamic forces (>2m/s^2), compared with the classic data-driven approach, our QuaDUE-CCM achieves at least a 56.6% improvement in tracking error. Compared with QuaDRED-MPC, a distributional RL-based approach, QuaDUE-CCM achieves at least a 3 times improvement in contraction rate.
This paper presents a novel trajectory tracker for autonomous quadrotor navigation in dynamic and complex environments. The proposed framework integrates a distributional Reinforcement Learning (RL) estimator for unknown aerodynamic effects into a Stochastic Model Predictive Controller (SMPC) for trajectory tracking. Aerodynamic effects derived from drag forces and moment variations are difficult to model directly and accurately. Most current quadrotor tracking systems therefore treat them as simple `disturbances' in conventional control approaches. We propose Quantile-approximation-based Distributional Reinforced-disturbance-estimator, an aerodynamic disturbance estimator, to accurately identify disturbances, i.e., uncertainties between the true and estimated values of aerodynamic effects. Simplified Affine Disturbance Feedback is employed for control parameterization to guarantee convexity, which we then integrate with a SMPC to achieve sufficient and non-conservative control signals. We demonstrate our system to improve the cumulative tracking errors by at least 66% with unknown and diverse aerodynamic forces compared with recent state-of-the-art. Concerning traditional Reinforcement Learning's non-interpretability, we provide convergence and stability guarantees of Distributional RL and SMPC, respectively, with non-zero mean disturbances.
The common layer-by-layer deposition of regular, 3-axis 3D printing simplifies both the fabrication process and the 3D printer's mechanical design. However, the resulting 3D printed objects have some unfavourable characteristics including visible layers, uneven structural strength and support material. To overcome these, researchers have employed robotic arms and multi-axis CNCs to deposit materials in conformal layers. Conformal deposition improves the quality of the 3D printed parts through support-less printing and curved layer deposition. However, such multi-axis 3D printing is inaccessible to many individuals due to high costs and technical complexities. Furthermore, the limited GUI support for conformal slicers creates an additional barrier for users. To open multi-axis 3D printing up to more makers and researchers, we present a cheap and accessible way to upgrade a regular 3D printer to 5 axes. We have also developed a GUI-based conformal slicer, integrated within a popular CAD package. Together, these deliver an accessible workflow for designing, simulating and creating conformally-printed 3D models.
Unmapped areas and aerodynamic disturbances render autonomous navigation with quadrotors extremely challenging. To fly safely and efficiently, trajectory planners and trackers must be able to navigate unknown environments with unpredictable aerodynamic effects in real-time. When encountering aerodynamic effects such as strong winds, most current approaches to quadrotor trajectory planning and tracking will not attempt to deviate from a determined plan, even if it is risky, in the hope that any aerodynamic disturbances can be resisted by a robust controller. This paper presents a novel systematic trajectory planning and tracking framework for autonomous quadrotors. We propose a Kinodynamic Jump Space Search (Kino-JSS) to generate a safe and efficient route in unknown environments with aerodynamic disturbances. A real-time Gaussian Process is employed to model the effects of aerodynamic disturbances, which we then integrate with a Model Predictive Controller to achieve efficient and accurate trajectory optimization and tracking. We demonstrate our system to improve the efficiency of trajectory generation in unknown environments by up to 75\% in the cases tested, compared with recent state-of-the-art. We also demonstrate that our system improves the accuracy of tracking in selected environments with unpredictable aerodynamic effects.
Recharging Internet of Things devices using autonomous robots is an attractive maintenance solution. Ensuring efficient and reliable performance of autonomous power delivery vehicles is very challenging in dynamic environments. Our work considers a hybrid Travelling Salesman Problem and Orienteering Problem scenario where the optimization objective is to jointly minimize discharged energy of the power delivery vehicle and maximize recharged energy of devices. This is decomposed as an NP-hard nonconvex optimization and nonlinear integer programming problem. Many studies have demonstrated satisfactory performance of heuristic algorithms' ability to solve specific routing problems, however very few studies explore online updating (i.e., mission re-planning `on the fly') for such hybrid scenarios. In this paper, we present a novel lightweight and reliable mission planner that solves the problem by combining offline search and online reevaluation. We propose Rapid Online Metaheuristic-based Planner, ROMP, a multi-objective offline and online mission planner that can incorporate real-time state information from the power delivery vehicle and its local environment to deliver reliable, up-to-date and near-optimal mission planning. We supplement Guided Local Search (via Google OR-Tools) with a Black Hole-inspired algorithm. Our results show that the proposed solver improves the solution quality offered by Guided Local Search in most of the cases tested. We also demonstrate latency performance improvements by applying a parallelization strategy.