Abstract:In this paper, we study the Multi-Start Team Orienteering Problem (MSTOP), a mission re-planning problem where vehicles are initially located away from the depot and have different amounts of fuel. We consider/assume the goal of multiple vehicles is to travel to maximize the sum of collected profits under resource (e.g., time, fuel) consumption constraints. Such re-planning problems occur in a wide range of intelligent UAS applications where changes in the mission environment force the operation of multiple vehicles to change from the original plan. To solve this problem with deep reinforcement learning (RL), we develop a policy network with self-attention on each partial tour and encoder-decoder attention between the partial tour and the remaining nodes. We propose a modified REINFORCE algorithm where the greedy rollout baseline is replaced by a local mini-batch baseline based on multiple, possibly non-duplicate sample rollouts. By drawing multiple samples per training instance, we can learn faster and obtain a stable policy gradient estimator with significantly fewer instances. The proposed training algorithm outperforms the conventional greedy rollout baseline, even when combined with the maximum entropy objective.




Abstract:This paper introduces a new routing problem referred to as the vehicle routing problem with vector profits. Given a network composed of nodes (depot/sites) and arcs connecting the nodes, the problem determines routes that depart from the depot, visit sites to collect profits, and return to the depot. There are multiple stakeholders interested in the mission and each site is associated with a vector whose k-th element represents the profit value for the k-th stakeholder. The objective of the problem is to maximize the profit sum for the least satisfied stakeholder, i.e., the stakeholder with the smallest total profit value. An approach based on the linear programming relaxation and column-generation to solve this max-min type routing problem was developed. Two cases studies - the planetary surface exploration and the Rome tour cases - were presented to demonstrate the effectiveness of the proposed problem formulation and solution methodology.