Abstract:In recent years, Deep Reinforcement Learning (DRL) has achieved substantial progress on Vehicle Routing Problems (VRPs). However, existing DRL-based methods are typically trained on instances generated from a uniform distribution, which limits their performance under real-world distribution shifts. In this paper, we aim to develop a generalization-oriented model that partitions the policy network into multiple modules and adaptively recombines modules to form specific policies during inference. Specifically, we propose Residual Refined Experts with Instance-level Gating (R2E-IG) to improve cross-distribution generalization. Our contributions are threefold: (1) We introduce a Residual Refined Expert (R2E) architecture that enhance expert expressiveness via residual refinement; (2) We design an instance-level gating mechanism that learns distribution-aware instance representations and routes inputs to suitable modules; (3) We propose a mixed-distribution training mechanism equipped with Dynamic Weight Adaption (DWA), which dynamically reweights training data from different distributions to emphasize more informative ones. Extensive experiments show that R2E-IG achieves competitive performance against state-of-the-art baselines on both in-distribution and out-of-distribution instances across synthetic and benchmark datasets. Moreover, R2E-IG is generic and can be easily integrated into existing DRL-based methods to further improve performance.
Abstract:The Maximal Covering Location-Interdiction Problem (MCLIP) is a classic bi-level optimization problem, which is fundamental to resilient infrastructure planning yet remains computationally intractable. Specifically, the upper level determines facility locations to maximize coverage, while the lower level executes worst-case interdiction to minimize the coverage. The strong coupling between the upper and lower levels, combined with their respective high combinatorial complexity, renders traditional methods ineffective. To bridge this gap, we propose a Dual-Agent Deep Reinforcement Learning (DADRL) framework based on adversarial learning, comprising a location agent corresponding to the upper level and an interdiction agent corresponding to the lower level. Our contributions are threefold: (1) The location agent is trained simultaneously against an evolving interdiction agent, making it effectively capture the dynamic competitive interplay between the upper and lower levels; (2) To fully exploit the learned capabilities of the interdiction agent, we propose a Surrogate-based Ensemble Inference Strategy that utilizes the trained interdiction agent as a high-fidelity surrogate to guide the decisions of location agent; (3) Extensive experiments on synthetic and real-world datasets demonstrate that our approach achieves superior computational efficiency while maintaining highly competitive solution quality compared to other baselines. Furthermore, our DADRL framework is model-agnostic to network structures, while its underlying adversarial learning paradigm demonstrates strong potential for solving other bi-level optimization problems.




Abstract:Coverage optimization generally involves deploying a set of facilities (e.g., sensors) to best satisfy the demands of specified points, with wide applications in fields such as location science and sensor networks. In practical applications, coverage optimization focuses on target coverage, which is typically formulated as Mixed-Variable Optimization Problems (MVOPs) due to complex real-world constraints. Meanwhile, high-fidelity discretization and visibility analysis may bring additional calculations, which significantly increases the computational cost. These factors pose significant challenges for fitness evaluations (FEs) in canonical Evolutionary Algorithms (EAs), and evolve the coverage problem into an Expensive Mixed-Variable Optimization Problem (EMVOP). To address these issues, we propose the RankNet-Inspired Surrogate-assisted Hybrid Metaheuristic (RI-SHM), an extension of our previous work. RI-SHM integrates three key components: (1) a RankNet-based pairwise global surrogate that innovatively predicts rankings between pairs of individuals, bypassing the challenges of fitness estimation in discontinuous solution space; (2) a surrogate-assisted local Estimation of Distribution Algorithm (EDA) that enhances local exploitation and helps escape from local optima; and (3) a fitness diversity-driven switching strategy that dynamically balances exploration and exploitation. Experiments demonstrate that our algorithm can effectively handle large-scale coverage optimization tasks of up to 300 dimensions and more than 1,800 targets within desirable runtime. Compared to state-of-the-art algorithms for EMVOPs, RI-SHM consistently outperforms them by up to 56.5$\%$ across all tested instances.