Abstract: Dynamic task assignment involves assigning arriving tasks to a limited number of resources in order to minimize the overall cost of the assignments. To achieve optimal task assignment, it is necessary to model the assignment problem first. While there exist separate formalisms, specifically Markov Decision Processes and (Colored) Petri Nets, to model, execute, and solve different aspects of the problem, there is no integrated modeling technique. To address this gap, this paper proposes Action-Evolution Petri Nets (A-E PN) as a framework for modeling and solving dynamic task assignment problems. A-E PN provides a unified modeling technique that can represent all elements of dynamic task assignment problems. Moreover, A-E PN models are executable, which means they can be used to learn close-to-optimal assignment policies through Reinforcement Learning (RL) without additional modeling effort. To evaluate the framework, we define a taxonomy of archetypical assignment problems. We show for three cases that A-E PN can be used to learn close-to-optimal assignment policies. Our results suggest that A-E PN can be used to model and solve a broad range of dynamic task assignment problems.
Abstract: Companies require modern capital assets such as wind turbines, trains and hospital equipment to experience minimal downtime. Ideally, assets are maintained right before failure to ensure maximum availability at minimum maintenance costs. Two challenges stand in the way: failure times of assets are unknown a priori, and assets can be part of a larger asset network. Nowadays, it is common for assets to be equipped with real-time monitoring that emits alerts, typically triggered by the first signs of degradation. Thus, it becomes crucial to plan maintenance considering information received via alerts, asset locations and maintenance costs. This problem is referred to as the Dynamic Traveling Maintainer Problem with Alerts (DTMPA). We propose a modeling framework for the DTMPA, where the alerts are early and imperfect indicators of failures. The objective is to minimize discounted maintenance costs accrued over an infinite time horizon. We propose three methods to solve this problem, leveraging different information levels from the alert signals. The proposed methods comprise various greedy heuristics that rank assets based on proximity, urgency and economic risk; a Traveling Maintainer Heuristic employing combinatorial optimization to optimize near-future costs; and a Deep Reinforcement Learning (DRL) method trained to minimize the long-term costs using exclusively the history of alerts. In a simulated environment with small asset networks, all methods can approximate the optimal policies that have access to perfect condition information. For larger networks, where computing the optimal policy is intractable, the proposed methods yield competitive maintenance policies, with DRL consistently achieving the lowest costs.
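To make the idea of greedy ranking concrete, the following is a minimal sketch of how assets with active alerts could be ordered by proximity, urgency or economic risk. The abstract does not define these criteria, so every field name (`travel_time`, `time_since_alert`, `failure_prob`, `failure_cost`) and each scoring rule is a hypothetical assumption for illustration, not the authors' formulation.

```python
from dataclasses import dataclass


@dataclass
class Asset:
    # All fields are hypothetical; the abstract does not specify them.
    name: str
    travel_time: float       # time for the maintainer to reach the asset
    time_since_alert: float  # assumed proxy for urgency
    failure_prob: float      # assumed estimate of imminent failure
    failure_cost: float      # assumed cost incurred if the asset fails


def rank_assets(assets, criterion):
    """Return assets sorted so the greedy choice comes first."""
    keys = {
        # Nearest asset first.
        "proximity": lambda a: a.travel_time,
        # Longest-standing alert first.
        "urgency": lambda a: -a.time_since_alert,
        # Largest expected failure cost first.
        "economic_risk": lambda a: -a.failure_prob * a.failure_cost,
    }
    return sorted(assets, key=keys[criterion])


assets = [
    Asset("turbine", travel_time=5.0, time_since_alert=2.0,
          failure_prob=0.2, failure_cost=100.0),
    Asset("train", travel_time=1.0, time_since_alert=1.0,
          failure_prob=0.5, failure_cost=10.0),
    Asset("scanner", travel_time=3.0, time_since_alert=4.0,
          failure_prob=0.1, failure_cost=50.0),
]
next_by_proximity = rank_assets(assets, "proximity")[0].name
next_by_urgency = rank_assets(assets, "urgency")[0].name
next_by_risk = rank_assets(assets, "economic_risk")[0].name
```

Under these assumed scores the three criteria pick different assets, which illustrates why the choice of ranking rule matters for a greedy policy.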
Abstract: Recent literature established that neural networks can represent good MDP policies across a range of stochastic dynamic models in supply chain and logistics. To overcome limitations of the model-free algorithms typically employed to find such neural network policies, a model-based algorithm is proposed that incorporates variance reduction techniques. For the classical lost sales inventory model, the algorithm learns neural network policies that are superior to those learned using model-free algorithms, while also outperforming heuristic benchmarks. The algorithm may be an interesting candidate for application to other stochastic dynamic problems in supply chain and logistics.
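For readers unfamiliar with the lost sales inventory model, the sketch below simulates one common textbook formulation: orders arrive after a fixed lead time, unmet demand is lost rather than backordered, and each period incurs a holding cost on leftover stock plus a penalty on lost sales. The policy shown is a simple base-stock rule, used here only as an illustrative heuristic; it is not the algorithm proposed in the abstract and not necessarily one of its benchmarks. The function name, the cost parameters and the order of events within a period are assumptions.

```python
from collections import deque


def simulate_lost_sales(base_stock, lead_time, demands,
                        holding_cost=1.0, penalty_cost=4.0):
    """Average per-period cost of a base-stock policy with lost sales."""
    on_hand = base_stock
    # Orders in transit, oldest first (assumed empty pipeline at start).
    pipeline = deque([0] * lead_time)
    total_cost = 0.0
    for demand in demands:
        # Order up to the base-stock level using the inventory position.
        position = on_hand + sum(pipeline)
        pipeline.append(max(0, base_stock - position))
        # Receive the order that was placed lead_time periods ago.
        on_hand += pipeline.popleft()
        # Serve demand from stock; any shortfall is lost, not backordered.
        sales = min(on_hand, demand)
        lost = demand - sales
        on_hand -= sales
        total_cost += holding_cost * on_hand + penalty_cost * lost
    return total_cost / len(demands)


avg_cost = simulate_lost_sales(base_stock=2, lead_time=1, demands=[1, 1, 1])
no_stock_cost = simulate_lost_sales(base_stock=0, lead_time=1, demands=[2, 3])
```

A learned policy would replace the order-up-to rule with a mapping from the full state (on-hand stock plus the pipeline of outstanding orders) to an order quantity, which is where the lost sales model becomes hard for large lead times.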