Graph neural networks (GNNs) have been widely used to predict properties and heuristics of mixed-integer linear programs (MILPs) and hence accelerate MILP solvers. This paper investigates the capacity of GNNs to represent strong branching (SB) scores that provide an efficient strategy in the branch-and-bound algorithm. Although message-passing GNN (MP-GNN), as the simplest GNN structure, is frequently employed in the existing literature to learn SB scores, we prove a fundamental limitation in its expressive power -- there exist two MILP instances with different SB scores that cannot be distinguished by any MP-GNN, regardless of the number of parameters. In addition, we establish a universal approximation theorem for another GNN structure called the second-order folklore GNN (2-FGNN). We show that for any data distribution over MILPs, there always exists a 2-FGNN that can approximate the SB score with arbitrarily high accuracy and arbitrarily high probability. A small-scale numerical experiment is conducted to directly validate our theoretical findings.
n clinical, if a patient presents with nonmechanical obstructive dysphagia, esophageal chest pain, and gastro esophageal reflux symptoms, the physician will usually assess the esophageal dynamic function. High-resolution manometry (HRM) is a clinically commonly used technique for detection of esophageal dynamic function comprehensively and objectively. However, after the results of HRM are obtained, doctors still need to evaluate by a variety of parameters. This work is burdensome, and the process is complex. We conducted image processing of HRM to predict the esophageal contraction vigor for assisting the evaluation of esophageal dynamic function. Firstly, we used Feature-Extraction and Histogram of Gradients (FE-HOG) to analyses feature of proposal of swallow (PoS) to further extract higher-order features. Then we determine the classification of esophageal contraction vigor normal, weak and failed by using linear-SVM according to these features. Our data set includes 3000 training sets, 500 validation sets and 411 test sets. After verification our accuracy reaches 86.83%, which is higher than other common machine learning methods.
Rewards serve as a measure of user satisfaction and act as a limiting factor in interactive recommender systems. In this research, we focus on the problem of learning to reward (LTR), which is fundamental to reinforcement learning. Previous approaches either introduce additional procedures for learning to reward, thereby increasing the complexity of optimization, or assume that user-agent interactions provide perfect demonstrations, which is not feasible in practice. Ideally, we aim to employ a unified approach that optimizes both the reward and policy using compositional demonstrations. However, this requirement presents a challenge since rewards inherently quantify user feedback on-policy, while recommender agents approximate off-policy future cumulative valuation. To tackle this challenge, we propose a novel batch inverse reinforcement learning paradigm that achieves the desired properties. Our method utilizes discounted stationary distribution correction to combine LTR and recommender agent evaluation. To fulfill the compositional requirement, we incorporate the concept of pessimism through conservation. Specifically, we modify the vanilla correction using Bellman transformation and enforce KL regularization to constrain consecutive policy updates. We use two real-world datasets which represent two compositional coverage to conduct empirical studies, the results also show that the proposed method relatively improves both effectiveness (2.3\%) and efficiency (11.53\%)
Survivor bias in observational data leads the optimization of recommender systems towards local optima. Currently most solutions re-mines existing human-system collaboration patterns to maximize longer-term satisfaction by reinforcement learning. However, from the causal perspective, mitigating survivor effects requires answering a counterfactual problem, which is generally unidentifiable and inestimable. In this work, we propose a neural causal model to achieve counterfactual inference. Specifically, we first build a learnable structural causal model based on its available graphical representations which qualitatively characterizes the preference transitions. Mitigation of the survivor bias is achieved though counterfactual consistency. To identify the consistency, we use the Gumbel-max function as structural constrains. To estimate the consistency, we apply reinforcement optimizations, and use Gumbel-Softmax as a trade-off to get a differentiable function. Both theoretical and empirical studies demonstrate the effectiveness of our solution.
Mixed-integer linear programming (MILP) stands as a notable NP-hard problem pivotal to numerous crucial industrial applications. The development of effective algorithms, the tuning of solvers, and the training of machine learning models for MILP resolution all hinge on access to extensive, diverse, and representative data. Yet compared to the abundant naturally occurring data in image and text realms, MILP is markedly data deficient, underscoring the vital role of synthetic MILP generation. We present DIG-MILP, a deep generative framework based on variational auto-encoder (VAE), adept at extracting deep-level structural features from highly limited MILP data and producing instances that closely mirror the target data. Notably, by leveraging the MILP duality, DIG-MILP guarantees a correct and complete generation space as well as ensures the boundedness and feasibility of the generated instances. Our empirical study highlights the novelty and quality of the instances generated by DIG-MILP through two distinct downstream tasks: (S1) Data sharing, where solver solution times correlate highly positive between original and DIG-MILP-generated instances, allowing data sharing for solver tuning without publishing the original data; (S2) Data Augmentation, wherein the DIG-MILP-generated instances bolster the generalization performance of machine learning models tasked with resolving MILP problems.
Procedurally generating cities in Minecraft provides players more diverse scenarios and could help understand and improve the design of cities in other digital worlds and the real world. This paper presents a city generator that was submitted as an entry to the 2023 Edition of Minecraft Settlement Generation Competition for Minecraft. The generation procedure is composed of six main steps, namely vegetation clearing, terrain reshaping, building layout generation, route planning, streetlight placement, and wall construction. Three algorithms, including a heuristic-based algorithm, an evolving layout algorithm, and a random one are applied to generate the building layout, thus determining where to place different redstone style buildings, and tested by generating cities on random maps in limited time. Experimental results show that the heuristic-based algorithm is capable of finding an acceptable building layout faster for flat maps, while the evolving layout algorithm performs better in evolving layout for rugged maps. A user study is conducted to compare our generator with outstanding entries of the competition's 2022 edition using the competition's evaluation criteria and shows that our generator performs well in the adaptation and functionality criteria
Learning to Optimize (L2O), a technique that utilizes machine learning to learn an optimization algorithm automatically from data, has gained arising attention in recent years. A generic L2O approach parameterizes the iterative update rule and learns the update direction as a black-box network. While the generic approach is widely applicable, the learned model can overfit and may not generalize well to out-of-distribution test sets. In this paper, we derive the basic mathematical conditions that successful update rules commonly satisfy. Consequently, we propose a novel L2O model with a mathematics-inspired structure that is broadly applicable and generalized well to out-of-distribution problems. Numerical simulations validate our theoretical findings and demonstrate the superior empirical performance of the proposed L2O model.
As one of the core parts of flexible manufacturing systems, material handling involves storage and transportation of materials between workstations with automated vehicles. The improvement in material handling can impulse the overall efficiency of the manufacturing system. However, the occurrence of dynamic events during the optimisation of task arrangements poses a challenge that requires adaptability and effectiveness. In this paper, we aim at the scheduling of automated guided vehicles for dynamic material handling. Motivated by some real-world scenarios, unknown new tasks and unexpected vehicle breakdowns are regarded as dynamic events in our problem. We formulate the problem as a constrained Markov decision process which takes into account tardiness and available vehicles as cumulative and instantaneous constraints, respectively. An adaptive constrained reinforcement learning algorithm that combines Lagrangian relaxation and invalid action masking, named RCPOM, is proposed to address the problem with two hybrid constraints. Moreover, a gym-like dynamic material handling simulator, named DMH-GYM, is developed and equipped with diverse problem instances, which can be used as benchmarks for dynamic material handling. Experimental results on the problem instances demonstrate the outstanding performance of our proposed approach compared with eight state-of-the-art constrained and non-constrained reinforcement learning algorithms, and widely used dispatching rules for material handling.
For solving combinatorial optimisation problems with metaheuristics, different search operators are applied for sampling new solutions in the neighbourhood of a given solution. It is important to understand the relationship between operators for various purposes, e.g., adaptively deciding when to use which operator to find optimal solutions efficiently. However, it is difficult to theoretically analyse this relationship, especially in the complex solution space of combinatorial optimisation problems. In this paper, we propose to empirically analyse the relationship between operators in terms of the correlation between their local optima and develop a measure for quantifying their relationship. The comprehensive analyses on a wide range of capacitated vehicle routing problem benchmark instances show that there is a consistent pattern in the correlation between commonly used operators. Based on this newly proposed local optima correlation metric, we propose a novel approach for adaptively selecting among the operators during the search process. The core intention is to improve search efficiency by preventing wasting computational resources on exploring neighbourhoods where the local optima have already been reached. Experiments on randomly generated instances and commonly used benchmark datasets are conducted. Results show that the proposed approach outperforms commonly used adaptive operator selection methods.
Games have been the perfect test-beds for artificial intelligence research for the characteristics that widely exist in real-world scenarios. Learning and optimisation, decision making in dynamic and uncertain environments, game theory, planning and scheduling, design and education are common research areas shared between games and real-world problems. Numerous open-sourced games or game-based environments have been implemented for studying artificial intelligence. In addition to single- or multi-player, collaborative or adversarial games, there has also been growing interest in implementing platforms for creative design in recent years. Those platforms provide ideal benchmarks for exploring and comparing artificial intelligence ideas and techniques. This paper reviews the game-based platforms for artificial intelligence research, discusses the research trend induced by the evolution of those platforms, and gives an outlook.