This paper introduces a hybrid algorithm of deep reinforcement learning (RL) and Force-based motion planning (FMP) to solve distributed motion planning problem in dense and dynamic environments. Individually, RL and FMP algorithms each have their own limitations. FMP is not able to produce time-optimal paths and existing RL solutions are not able to produce collision-free paths in dense environments. Therefore, we first tried improving the performance of recent RL approaches by introducing a new reward function that not only eliminates the requirement of a pre supervised learning (SL) step but also decreases the chance of collision in crowded environments. That improved things, but there were still a lot of failure cases. So, we developed a hybrid approach to leverage the simpler FMP approach in stuck, simple and high-risk cases, and continue using RL for normal cases in which FMP can't produce optimal path. Also, we extend GA3C-CADRL algorithm to 3D environment. Simulation results show that the proposed algorithm outperforms both deep RL and FMP algorithms and produces up to 50% more successful scenarios than deep RL and up to 75% less extra time to reach goal than FMP.
Planning high-speed trajectories for UAVs in unknown environments requires algorithmic techniques that enable fast reaction times to guarantee safety as more information about the environment becomes available. The standard approach to ensure safety is to enforce a "stop" condition in the free-known space. However, this can severely limit the speed of the vehicle, especially in situations where much of the world is unknown. Moreover, the ad-hoc time and interval allocation scheme usually imposed on the trajectory also leads to conservative and slower trajectories. This work proposes FASTER (Fast and Safe Trajectory Planner) to ensure safety without sacrificing speed. FASTER obtains high-speed trajectories by enabling the local planner to optimize in both the free-known and unknown spaces. Safety guarantees are ensured by always having a feasible, safe back-up trajectory in the free-known space at the start of each replanning step. The Mixed Integer Quadratic Program formulation proposed allows the solver to choose the trajectory interval allocation, and the time allocation is found by a line search algorithm initialized with a heuristic computed from the previous replanning iteration. This proposed algorithm is tested extensively both in simulation and in real hardware, showing agile flights in unknown cluttered environments with velocities up to 7.8 m/s. To demonstrate the generality of the proposed framework, FASTER is also applied to a skid-steer robot, and the maximum speed specified for the robot (2 m/s) is achieved in real hardware experiments.
This paper presents the first certifiably correct solver for distributed rotation and pose synchronization, the backbone of modern collaborative simultaneous localization and mapping (CSLAM) and camera network localization (CNL) systems. By pursuing a sparse semidefinite relaxation, our approach provides formal performance guarantees that match the state of the art in the centralized setting. In particular, we prove that under "low" noise, the solution to the semidefinite relaxation is guaranteed to provide a globally optimal solution to the original non-convex problem. To solve the resulting large-scale semidefinite programs, we adopt the state-of-the-art Riemannian Staircase framework and develop Riemannian block-coordinate descent (RBCD) as the core distributed local search algorithm. RBCD is well-suited to distributed synchronization problems as it only requires local communication, provides privacy protection, and is easily parallelizable. Furthermore, we prove that RBCD converges to first-order critical points for general Riemannian optimization problems over product of matrix submanifolds, with a global sublinear convergence rate. Extensive evaluations on real and synthetic datasets demonstrate that the proposed solver correctly recovers globally optimal solutions under low-to-moderate noise, and outperforms alternative distributed techniques in terms of solution precision and convergence speed.
This paper presents a novel incremental learning algorithm for pedestrian motion prediction, with the ability to improve the learned model over time when data is incrementally available. In this setup, trajectories are modeled as simple segments called motion primitives. Transitions between motion primitives are modeled as Gaussian Processes. When new data is available, the motion primitives learned from the new data are compared with the previous ones by measuring the inner product of the motion primitive vectors. Similar motion primitives and transitions are fused and novel motion primitives are added to capture newly observed behaviors. The proposed approach is tested and compared with other baselines in intersection scenarios where the data is incrementally available either from a single intersection or from multiple intersections with different geometries. In both cases, our method incrementally learns motion patterns and outperforms the offline learning approach in terms of prediction errors. The results also show that the model size in our algorithm grows at a much lower rate than standard incremental learning, where newly learned motion primitives and transitions are simply accumulated over time.
Deep Neural Network-based systems are now the state-of-the-art in many robotics tasks, but their application in safety-critical domains remains dangerous without formal guarantees on network robustness. Small perturbations to sensor inputs (from noise or adversarial examples) are often enough to change network-based decisions, which was already shown to cause an autonomous vehicle to swerve into oncoming traffic. In light of these dangers, numerous algorithms have been developed as defensive mechanisms from these adversarial inputs, some of which provide formal robustness guarantees or certificates. This work leverages research on certified adversarial robustness to develop an online certified defense for deep reinforcement learning algorithms. The proposed defense computes guaranteed lower bounds on state-action values during execution to identify and choose the optimal action under a worst-case deviation in input space due to possible adversaries or noise. The approach is demonstrated on a Deep Q-Network policy and is shown to increase robustness to noise and adversaries in pedestrian collision avoidance scenarios and a classic control task.
This paper presents an algorithmic framework for learning robust policies in asymmetric imperfect-information games, where the joint reward could depend on the uncertain opponent type (a private information known only to the opponent itself and its ally). In order to maximize the reward, the protagonist agent has to infer the opponent type through agent modeling. We use multiagent reinforcement learning (MARL) to learn opponent models through self-play, which captures the full strategy interaction and reasoning between agents. However, agent policies learned from self-play can suffer from mutual overfitting. Ensemble training methods can be used to improve the robustness of agent policy against different opponents, but it also significantly increases the computational overhead. In order to achieve a good trade-off between the robustness of the learned policy and the computation complexity, we propose to train a separate opponent policy against the protagonist agent for evaluation purposes. The reward achieved by this opponent is a noisy measure of the robustness of the protagonist agent policy due to the intrinsic stochastic nature of a reinforcement learner. To handle this stochasticity, we apply a stochastic optimization scheme to dynamically update the opponent ensemble to optimize an objective function that strikes a balance between robustness and computation complexity. We empirically show that, under the same limited computational budget, the proposed method results in more robust policy learning than standard ensemble training.
Collision avoidance algorithms are essential for safe and efficient robot operation among pedestrians. This work proposes using deep reinforcement (RL) learning as a framework to model the complex interactions and cooperation with nearby, decision-making agents (e.g., pedestrians, other robots). Existing RL-based works assume homogeneity of agent policies, use specific motion models over short timescales, or lack a mechanism to consider measurements taken with a large number (possibly varying) of nearby agents. Therefore, this work develops an algorithm that learns collision avoidance among a variety of types of non-communicating, dynamic agents without assuming they follow any particular behavior rules. It extends our previous work by introducing a strategy using Long Short-Term Memory (LSTM) that enables the algorithm to use observations of an arbitrary number of other agents, instead of a small, fixed number of neighbors. The proposed algorithm is shown to outperform a classical collision avoidance algorithm, another deep RL-based algorithm, and scales with the number of agents better (fewer collisions, shorter time to goal) than our previously published learning-based approach. Analysis of the LSTM provides insights into how observations of nearby agents affect the hidden state and quantifies the performance impact of various agent ordering heuristics. The learned policy generalizes to several applications beyond the training scenarios: formation control (arrangement into letters), an implementation on a fleet of four multirotors, and an implementation on a fully autonomous robotic vehicle capable of traveling at human walking speed among pedestrians.
A common approach for defining a reward function for Multi-objective Reinforcement Learning (MORL) problems is the weighted sum of the multiple objectives. The weights are then treated as design parameters dependent on the expertise (and preference) of the person performing the learning, with the typical result that a new solution is required for any change in these settings. This paper investigates the relationship between the reward function and the optimal value function for MORL; specifically addressing the question of how to approximate the optimal value function well beyond the set of weights for which the optimization problem was actually solved, thereby avoiding the need to recompute for any particular choice. We prove that the value function transforms smoothly given a transformation of weights of the reward function (and thus a smooth interpolation in the policy space). A Gaussian process is used to obtain a smooth interpolation over the reward function weights of the optimal value function for three well-known examples: GridWorld, Objectworld and Pendulum. The results show that the interpolation can provide very robust values for sample states and action space in discrete and continuous domain problems. Significant advantages arise from utilizing this interpolation technique in the domain of autonomous vehicles: easy, instant adaptation of user preferences while driving and true randomization of obstacle vehicle behavior preferences during training.