This paper proposes a data-driven method for learning convergent control policies from offline data using contraction theory. Contraction theory enables constructing a policy that makes the closed-loop system trajectories inherently convergent towards a unique trajectory. At the technical level, identifying the contraction metric, i.e., the distance metric with respect to which a robot's trajectories exhibit contraction, is often non-trivial. We propose to jointly learn the control policy and its corresponding contraction metric while enforcing contraction. To achieve this, we learn an implicit dynamics model of the robotic system from an offline data set consisting of the robot's state and input trajectories. Using this learned dynamics model, we propose a data augmentation algorithm for learning contraction policies. We randomly generate samples in the state space and propagate them forward in time through the learned dynamics model to generate auxiliary sample trajectories. We then learn both the control policy and the contraction metric such that the distance between the trajectories from the offline data set and the generated auxiliary trajectories decreases over time. We evaluate the proposed framework on simulated robotic goal-reaching tasks and demonstrate that enforcing contraction results in faster convergence and greater robustness of the learned policy.
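The following is a minimal PyTorch sketch of the joint policy/metric learning idea with a contraction-enforcing loss. The network shapes, the metric parametrization via a Cholesky-like factor, the contraction rate `beta`, and the one-step discrete-time rollout are all illustrative assumptions, not the paper's implementation; the imitation loss on the offline data and the pre-training of the dynamics model are omitted for brevity.

```python
# Illustrative sketch only: all module names, dimensions, and loss weights
# are assumptions, not the authors' implementation.
import torch
import torch.nn as nn

STATE_DIM, INPUT_DIM, HIDDEN = 4, 2, 64

class MLP(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, HIDDEN), nn.Tanh(),
            nn.Linear(HIDDEN, out_dim))
    def forward(self, x):
        return self.net(x)

dynamics = MLP(STATE_DIM + INPUT_DIM, STATE_DIM)  # learned model x_{t+1} = f(x_t, u_t)
policy   = MLP(STATE_DIM, INPUT_DIM)              # control policy u = pi(x)
metric_L = MLP(STATE_DIM, STATE_DIM * STATE_DIM)  # Cholesky-like factor of M(x)

def metric(x):
    """Positive-definite contraction metric M(x) = L(x) L(x)^T + eps*I."""
    L = metric_L(x).view(-1, STATE_DIM, STATE_DIM)
    return L @ L.transpose(1, 2) + 1e-3 * torch.eye(STATE_DIM)

def riemann_dist(xa, xb):
    """Squared distance between states under the metric evaluated at xa."""
    d = (xa - xb).unsqueeze(1)                    # (B, 1, n)
    return (d @ metric(xa) @ d.transpose(1, 2)).squeeze()

def contraction_loss(x_demo, beta=0.9, noise=0.1):
    """Augment each demo state with a random neighbor, roll both one step
    through the learned dynamics under the policy, and penalize any pair
    whose metric distance fails to shrink by the factor beta."""
    x_aux = x_demo + noise * torch.randn_like(x_demo)
    x_demo_next = dynamics(torch.cat([x_demo, policy(x_demo)], dim=-1))
    x_aux_next  = dynamics(torch.cat([x_aux,  policy(x_aux)],  dim=-1))
    d_now  = riemann_dist(x_demo, x_aux)
    d_next = riemann_dist(x_demo_next, x_aux_next)
    return torch.relu(d_next - beta * d_now).mean()

# Usage: one optimization step on a batch of demonstration states.
opt = torch.optim.Adam(
    list(policy.parameters()) + list(metric_L.parameters()), lr=1e-3)
x_batch = torch.randn(32, STATE_DIM)              # stand-in for offline data
loss = contraction_loss(x_batch)
opt.zero_grad(); loss.backward(); opt.step()
```

Jointly optimizing the policy and the metric, as in this sketch, avoids the need to hand-derive a contraction metric: the metric network only has to certify contraction for the policy being learned, not for all possible controllers.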
We study the problem of persistent monitoring of a finite number of interconnected geographical nodes for event detection via a group of heterogeneous mobile agents. We use a Poisson process model to capture the probability of events occurring at the geographical nodes. We then tie a utility function to the probability of detecting an event at each point of interest and use it in our policy design to incentivize the agents to visit the geographical nodes with a higher probability of event occurrence. We show that designing an optimal monitoring policy to maximize the utility of event detection over a mission horizon is an NP-hard problem. By showing that the reward function is monotone increasing and submodular, we then propose a suboptimal dispatch policy design with a known optimality gap. To reduce the time complexity of constructing the feasible search set and to induce robustness to changes in event occurrence and other operational factors, we perform our suboptimal policy design in a receding horizon setting. Our next contribution is to add a new term to our optimization problem to compensate for the shortsightedness of the receding horizon approach. This added term measures the importance of nodes beyond the receding horizon's sight and is meant to steer the agents towards areas of higher importance on the global map. Finally, we discuss how our proposed algorithm can be implemented in a decentralized manner. We demonstrate our results through a simulation study.
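As an illustration of the greedy, receding-horizon dispatch idea described above, here is a minimal Python sketch. The Poisson detection utility 1 - exp(-lambda * idle_time), the node rates, the horizon length, and the per-step agent loop are illustrative assumptions rather than the paper's exact formulation; the terminal importance term for nodes beyond the horizon and the decentralized implementation are omitted.

```python
# Illustrative sketch only: the reward model, horizon depth, and data
# structures are assumptions, not the paper's exact formulation.
import math

RATES = {0: 0.2, 1: 0.5, 2: 0.1, 3: 0.8}   # Poisson event rates per node
N_AGENTS, HORIZON = 2, 3                    # receding-horizon depth (steps)

def detection_utility(node, idle_time):
    """Probability that at least one Poisson event occurred at `node`
    since it was last visited: 1 - exp(-lambda * idle_time)."""
    return 1.0 - math.exp(-RATES[node] * idle_time)

def marginal_gain(schedule, node, last_visit, t):
    """Marginal reward of appending `node` at step `t`, given the visits
    already planned in `schedule`. Revisiting a node sooner yields a
    smaller gain, reflecting the monotone submodular reward structure."""
    last = max([last_visit[node]] + [s for n, s in schedule if n == node])
    return detection_utility(node, t - last)

def greedy_dispatch(last_visit, t0):
    """Greedy schedule over the receding horizon. Because the reward is
    monotone increasing and submodular, the greedy choice carries a known
    optimality gap relative to the NP-hard optimum."""
    schedule = []
    for t in range(t0, t0 + HORIZON):
        for _ in range(N_AGENTS):
            best = max(RATES,
                       key=lambda n: marginal_gain(schedule, n, last_visit, t))
            schedule.append((best, t))
    return schedule

# Usage: plan over the horizon, execute only the first step, then replan
# at the next time step (the receding horizon loop).
last_visit = {n: 0 for n in RATES}
plan = greedy_dispatch(last_visit, t0=5)
first_step = [v for v in plan if v[1] == 5]
print("executed visits:", first_step)
```

Replanning at every step, as in the usage example, is what provides the robustness to changes in event occurrence mentioned above: the schedule beyond the first step is discarded and recomputed from the latest visit history.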