Yexiang Xue

Hypothesis Network Planned Exploration for Rapid Meta-Reinforcement Learning Adaptation

Nov 07, 2023
Maxwell Joseph Jacobson, Yexiang Xue

Meta Reinforcement Learning (Meta RL) trains agents that adapt to fast-changing environments and tasks. Current strategies often lose adaptation efficiency due to the passive nature of model exploration, which delays the agent's understanding of new transition dynamics; as a result, particularly fast-evolving tasks become impossible to solve. We propose a novel approach, Hypothesis Network Planned Exploration (HyPE), that integrates an active and planned exploration process via a hypothesis network to optimize adaptation speed. HyPE uses a generative hypothesis network to form potential models of state transition dynamics, then eliminates incorrect models through strategically devised experiments. Evaluated on a symbolic version of the Alchemy game, HyPE outpaces baseline methods in adaptation speed and model accuracy, validating its potential for enhancing reinforcement learning adaptation in rapidly evolving settings.
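
As a rough illustration of the planned-exploration idea (not the paper's architecture: HyPE uses a generative hypothesis network, while this toy sketch keeps an explicit list of candidate dynamics models), the snippet below picks the action on which the candidates disagree most and then discards the candidates that the observed transition contradicts. All names and the linear toy dynamics are hypothetical.

```python
import numpy as np

def disagreement(hypotheses, state, actions):
    """Score each action by how much the candidate models disagree on its outcome."""
    scores = []
    for a in actions:
        preds = np.array([h(state, a) for h in hypotheses])
        scores.append(preds.var(axis=0).sum())    # spread of predicted next states
    return np.array(scores)

def planned_exploration_step(hypotheses, env_step, state, actions, tol=1e-3):
    """Pick the most informative action, observe, and drop inconsistent hypotheses."""
    a = actions[int(np.argmax(disagreement(hypotheses, state, actions)))]
    next_state = env_step(state, a)               # real environment transition
    survivors = [h for h in hypotheses
                 if np.linalg.norm(h(state, a) - next_state) < tol]
    return survivors, next_state

# toy usage: three candidate linear dynamics models, only one matches the environment
true_A = np.array([[1.0, 0.1], [0.0, 1.0]])
candidates = [lambda s, a, A=A: A @ s + a
              for A in (true_A, np.eye(2), np.array([[0.9, 0.0], [0.0, 1.1]]))]
env_step = lambda s, a: true_A @ s + a
actions = [np.zeros(2), np.array([1.0, 0.0]), np.array([0.0, 1.0])]

survivors, _ = planned_exploration_step(candidates, env_step, np.ones(2), actions)
print(len(survivors))   # 1: only the correct dynamics model remains
```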

Integrating Symbolic Reasoning into Neural Generative Models for Design Generation

Oct 13, 2023
Maxwell Joseph Jacobson, Yexiang Xue

Design generation requires tight integration of neural and symbolic reasoning, as good design must meet explicit user needs and honor implicit rules for aesthetics, utility, and convenience. Current automated design tools driven by neural networks produce appealing designs, but cannot satisfy user specifications and utility requirements. Symbolic reasoning tools, such as constraint programming, cannot perceive low-level visual information in images or capture subtle aspects such as aesthetics. We introduce the Spatial Reasoning Integrated Generator (SPRING) for design generation. SPRING embeds a neural and symbolic integrated spatial reasoning module inside the deep generative network. The spatial reasoning module decides the locations of objects to be generated in the form of bounding boxes, which are predicted by a recurrent neural network and filtered by symbolic constraint satisfaction. Embedding symbolic reasoning into neural generation guarantees that the output of SPRING satisfies user requirements. Furthermore, SPRING offers interpretability, allowing users to visualize and diagnose the generation process through the bounding boxes. SPRING is also adept at managing novel user specifications not encountered during its training, thanks to its proficiency in zero-shot constraint transfer. Quantitative evaluations and a human study reveal that SPRING outperforms baseline generative models, excelling in delivering high design quality and better meeting user specifications.
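
A minimal sketch of the "propose boxes, filter by symbolic constraints" loop described above, with a random sampler standing in for SPRING's recurrent proposal network and two illustrative constraints (stay on the canvas, do not overlap); all function names are hypothetical, not SPRING's API.

```python
import random

def overlaps(a, b):
    """Axis-aligned overlap test for boxes given as (x, y, w, h)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

def propose_box(rng):
    """Stand-in for the recurrent proposal network: sample a random box."""
    return (rng.uniform(0, 80), rng.uniform(0, 80), rng.uniform(5, 20), rng.uniform(5, 20))

def place_objects(n_objects, constraints, max_tries=1000, seed=0):
    """Accept only proposals that satisfy every symbolic constraint."""
    rng = random.Random(seed)
    placed = []
    for _ in range(n_objects):
        for _ in range(max_tries):
            box = propose_box(rng)
            if all(c(box, placed) for c in constraints):
                placed.append(box)
                break
        else:
            raise RuntimeError("no feasible placement found")
    return placed

# example symbolic constraints: stay inside a 100x100 canvas, never overlap
inside = lambda box, placed: box[0] + box[2] <= 100 and box[1] + box[3] <= 100
disjoint = lambda box, placed: not any(overlaps(box, p) for p in placed)

print(place_objects(3, [inside, disjoint]))
```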

Solving Satisfiability Modulo Counting for Symbolic and Statistical AI Integration With Provable Guarantees

Sep 16, 2023
Jinzhao Li, Nan Jiang, Yexiang Xue

Satisfiability Modulo Counting (SMC) encompasses problems that require both symbolic decision-making and statistical reasoning. Its general formulation captures many real-world problems at the intersection of symbolic and statistical Artificial Intelligence. SMC searches for policy interventions to control probabilistic outcomes. Solving SMC is challenging because of its highly intractable nature ($\text{NP}^{\text{PP}}$-complete), coupling statistical inference with symbolic reasoning. Previous research on SMC solving lacks provable guarantees and/or suffers from sub-optimal empirical performance, especially when combinatorial constraints are present. We propose XOR-SMC, a polynomial algorithm with access to NP oracles, to solve highly intractable SMC problems with constant approximation guarantees. XOR-SMC transforms the highly intractable SMC into satisfiability problems by replacing the model counting in SMC with SAT formulae subject to randomized XOR constraints. Experiments on solving important SMC problems in AI for social good demonstrate that XOR-SMC finds solutions close to the true optimum, outperforming several baselines which struggle to find good approximations for the intractable model counting in SMC.
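
The sketch below illustrates the generic hashing idea behind replacing model counting with SAT queries under random XOR (parity) constraints: if satisfying assignments still survive after k random parity constraints, the formula likely has on the order of 2^k models. It enumerates a toy formula by brute force and is not the paper's XOR-SMC algorithm; all names are hypothetical.

```python
import itertools, random

def random_xor(n_vars, rng):
    """A random parity constraint: XOR of a random subset of variables equals a random bit."""
    subset = [i for i in range(n_vars) if rng.random() < 0.5]
    return subset, rng.randrange(2)

def satisfies_xors(assignment, xors):
    return all(sum(assignment[i] for i in subset) % 2 == parity for subset, parity in xors)

def survives_k_xors(models, n_vars, k, rng):
    """Does at least one satisfying assignment survive k random XOR constraints?"""
    xors = [random_xor(n_vars, rng) for _ in range(k)]
    return any(satisfies_xors(m, xors) for m in models)

def estimate_log2_count(models, n_vars, trials=30, seed=0):
    """The largest k at which models typically survive approximates log2 of the count."""
    rng = random.Random(seed)
    for k in range(n_vars + 1):
        hits = sum(survives_k_xors(models, n_vars, k, rng) for _ in range(trials))
        if hits < trials / 2:
            return k - 1
    return n_vars

# toy formula over 4 variables: "at least two variables are true" (11 models)
n_vars = 4
models = [m for m in itertools.product([0, 1], repeat=n_vars) if sum(m) >= 2]
print(len(models), estimate_log2_count(models, n_vars))  # 11 models, estimate near log2(11)
```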

Efficient Learning of PDEs via Taylor Expansion and Sparse Decomposition into Value and Fourier Domains

Sep 13, 2023
Md Nasim, Yexiang Xue

Accelerating the learning of Partial Differential Equations (PDEs) from experimental data will speed up the pace of scientific discovery. Previous randomized algorithms exploit sparsity in PDE updates for acceleration. However, such methods are applicable only to a limited class of decomposable PDEs, which have sparse features in the value domain. We propose Reel, which accelerates the learning of PDEs via random projection and has much broader applicability. Reel exploits sparsity by decomposing dense updates into sparse ones in both the value and frequency domains. This decomposition enables efficient learning when the source of the updates consists of gradually changing terms across large areas (sparse in the frequency domain) in addition to a few rapid updates concentrated in a small set of "interfacial" regions (sparse in the value domain). Random projection is then applied to compress the sparse signals for learning. To expand the model's applicability, Taylor series expansion is used in Reel to approximate nonlinear PDE updates with polynomials in the decomposable form. Theoretically, we derive a constant-factor approximation between the projected loss function and the original one with a poly-logarithmic number of projected dimensions. Experimentally, we provide empirical evidence that our proposed Reel can lead to faster learning of PDE models (70-98% reduction in training time when the data is compressed to 1% of its original size) with comparable quality to the non-compressed models.
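
A toy sketch of the decomposition-plus-projection idea on a 1-D signal: split a dense update into a part that is sparse in the Fourier domain and a residual concentrated in a small "interfacial" region, then compress both with a random projection. It is an illustration under simplifying assumptions (real Fourier coefficients only), not Reel's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# synthetic PDE update on a 1-D grid: a smooth global term (sparse in the Fourier
# domain) plus a sharp "interfacial" spike (sparse in the value domain)
n = 256
x = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
update = np.sin(x) + 0.5 * np.cos(3 * x)
update[100:104] += 5.0

# split: keep the few largest Fourier modes; the residual stays in the value domain
F = np.fft.fft(update)
keep = np.argsort(np.abs(F))[-8:]
F_sparse = np.zeros_like(F)
F_sparse[keep] = F[keep]
value_part = update - np.fft.ifft(F_sparse).real   # concentrated near the spike

# random projection compresses the two sparse representations to m << n numbers
m = 32
P = rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, 2 * n))
compressed = P @ np.concatenate([F_sparse.real, value_part])
print(compressed.shape)   # (32,): the learner now works with the projected signal
```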

Racing Control Variable Genetic Programming for Symbolic Regression

Sep 13, 2023
Nan Jiang, Yexiang Xue

Symbolic regression, one of the most crucial tasks in AI for science, discovers governing equations from experimental data. Popular approaches based on genetic programming, Monte Carlo tree search, or deep reinforcement learning learn symbolic regression from a fixed dataset. They require massive datasets and long training times, especially when learning complex equations involving many variables. Recently, Control Variable Genetic Programming (CVGP) was introduced, which accelerates the regression process by discovering equations from designed control variable experiments. However, the set of experiments is fixed a priori in CVGP, and we observe that sub-optimal selection of experiment schedules delays the discovery process significantly. To overcome this limitation, we propose Racing Control Variable Genetic Programming (Racing-CVGP), which carries out multiple experiment schedules simultaneously. A selection scheme similar to the one used for selecting good symbolic equations in the genetic programming process is implemented to ensure that promising experiment schedules eventually win over average ones. Unfavorable schedules are terminated early to save time for the promising ones. We evaluate Racing-CVGP on several synthetic and real-world datasets corresponding to true physics laws. We demonstrate that Racing-CVGP outperforms CVGP and a series of symbolic regressors which discover equations from fixed datasets.
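
A minimal sketch of the racing scheme described above: several experiment schedules are scored round by round and the weakest ones are terminated early, so promising schedules get the remaining budget. The schedule objects and fitness function here are hypothetical placeholders.

```python
import random

def race_schedules(schedules, evaluate, rounds=5, keep_frac=0.5, seed=0):
    """Run all schedules in parallel rounds; after each round, drop the worst performers.

    `schedules` - list of experiment schedules (opaque objects).
    `evaluate`  - evaluate(schedule, round) -> fitness of the best equation found so far.
    """
    rng = random.Random(seed)
    pool = list(schedules)
    for r in range(rounds):
        ranked = sorted(pool, key=lambda s: evaluate(s, r), reverse=True)
        keep = max(1, int(len(ranked) * keep_frac))
        pool = ranked[:keep]                # unfavorable schedules are terminated early
        rng.shuffle(pool)                   # avoid order bias in later rounds
    return pool

# toy usage: schedules are just identifiers; the fitness function favors schedule 3
fitness = lambda s, r: -abs(s - 3) + 0.01 * r
print(race_schedules(list(range(8)), fitness))   # schedule 3 survives
```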

Adversarial Style Transfer for Robust Policy Optimization in Deep Reinforcement Learning

Aug 29, 2023
Md Masudur Rahman, Yexiang Xue

This paper proposes an algorithm that aims to improve generalization for reinforcement learning agents by removing overfitting to confounding features. Our approach consists of a max-min game-theoretic objective. A generator transfers the style of observations during reinforcement learning. An additional goal of the generator is to perturb the observation so as to maximize the agent's probability of taking a different action. In contrast, a policy network updates its parameters to minimize the effect of such perturbations, thus staying robust while maximizing the expected future reward. Based on this setup, we propose a practical deep reinforcement learning algorithm, Adversarial Robust Policy Optimization (ARPO), to find a robust policy that generalizes to unseen environments. We evaluate our approach on Procgen and Distracting Control Suite for generalization and sample efficiency. Empirically, ARPO shows improved performance compared to a few baseline algorithms, including data augmentation.
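
The sketch below illustrates the adversarial step of the max-min objective with a tiny linear softmax policy: the "generator" is approximated by random search over small observation perturbations and scored by how much it shifts the action distribution (KL to the clean distribution), which the policy would then be trained to resist. This is a toy illustration, not ARPO's training loop; every name here is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def policy_probs(theta, obs):
    """Tiny linear softmax policy over discrete actions."""
    return softmax(theta @ obs)

def adversarial_perturbation(theta, obs, eps=0.3, n_candidates=64):
    """Generator step: among small perturbations, pick the one that changes the
    action distribution the most (maximizes KL to the clean distribution)."""
    clean = policy_probs(theta, obs)
    best, best_kl = obs, 0.0
    for _ in range(n_candidates):
        delta = rng.uniform(-eps, eps, size=obs.shape)
        p = policy_probs(theta, obs + delta)
        kl = np.sum(clean * np.log(clean / p))
        if kl > best_kl:
            best, best_kl = obs + delta, kl
    return best, best_kl

theta = rng.normal(size=(4, 8))        # 4 actions, 8-dim observation
obs = rng.normal(size=8)
perturbed, kl = adversarial_perturbation(theta, obs)
robustness_penalty = kl                # a weighted term the policy loss would minimize
print(round(float(kl), 4))
```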

Adversarial Policy Optimization in Deep Reinforcement Learning

Apr 27, 2023
Md Masudur Rahman, Yexiang Xue

A policy represented by a deep neural network can overfit spurious features in observations, which hampers a reinforcement learning agent from learning an effective policy. This issue becomes severe in high-dimensional state spaces, where the agent struggles to learn a useful policy. Data augmentation can provide a performance boost to RL agents by mitigating the effect of overfitting. However, such data augmentation is a form of prior knowledge, and naively applying it in environments might worsen an agent's performance. In this paper, we propose a novel RL algorithm to mitigate the above issue and improve the efficiency of the learned policy. Our approach consists of a max-min game-theoretic objective in which a perturber network modifies the state to maximize the agent's probability of taking a different action while minimizing the distortion in the state. In contrast, the policy network updates its parameters to minimize the effect of perturbation while maximizing the expected future reward. Based on this objective, we propose a practical deep reinforcement learning algorithm, Adversarial Policy Optimization (APO). Our method is agnostic to the type of policy optimization, and thus data augmentation can be incorporated to harness the benefit. We evaluated our approach on several DeepMind Control robotic environments with high-dimensional and noisy state settings. Empirical results demonstrate that our method, APO, consistently outperforms the state-of-the-art on-policy PPO agent. We further compare our method with the state-of-the-art data augmentation method RAD and the regularization-based approach DRAC. Our agent APO shows better performance compared to these baselines.
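
As a small illustration of the perturber's trade-off described above (change the agent's action distribution as much as possible while distorting the state as little as possible), one can score candidate perturbations by KL gain minus a distortion penalty; the numbers and the weighting constant below are purely hypothetical.

```python
def perturber_objective(kl_divergence, distortion, lam=1.0):
    """Perturber score: reward a change in the action distribution (high KL)
    while penalizing how far the perturbed state drifts from the original."""
    return kl_divergence - lam * distortion

# the perturber picks, among candidate state perturbations, the one with the
# highest score; the policy is then updated to minimize the KL term it induced
candidates = [(0.8, 0.9), (0.5, 0.1), (0.2, 0.05)]   # (KL to clean policy, ||delta||^2)
best = max(candidates, key=lambda c: perturber_objective(*c))
print(best)   # (0.5, 0.1): large action change at small distortion wins
```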

Accelerating Policy Gradient by Estimating Value Function from Prior Computation in Deep Reinforcement Learning

Feb 02, 2023
Md Masudur Rahman, Yexiang Xue

This paper investigates the use of prior computation to estimate the value function to improve sample efficiency in on-policy policy gradient methods in reinforcement learning. Our approach is to estimate the value function from prior computations, such as from the Q-network learned in DQN or the value function trained for different but related environments. In particular, we learn a new value function for the target task while combining it with a value estimate from the prior computation. Finally, the resulting value function is used as a baseline in the policy gradient method. This use of a baseline has the theoretical property of reducing variance in gradient computation and thus improving sample efficiency. The experiments show the successful use of prior value estimates in various settings and improved sample efficiency in several tasks.
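
A minimal sketch, under toy linear assumptions, of reusing a prior value estimate as part of the baseline: only a residual correction is fitted on the target task, and the combined estimate is subtracted from returns to reduce the variance of the gradient signal. The prior value function and data here are hypothetical stand-ins, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# stand-ins: a frozen value estimate carried over from prior computation (e.g. the
# value implied by a DQN's Q-network, or a value net from a related task)
v_prior = lambda s: s @ np.array([1.0, -0.5, 0.2])      # hypothetical prior estimate
states = rng.normal(size=(256, 3))
returns = states @ np.array([1.2, -0.4, 0.5]) + rng.normal(0.0, 0.1, size=256)

# learn only a residual correction on top of the prior (here: linear least squares)
residual_target = returns - np.array([v_prior(s) for s in states])
w_res, *_ = np.linalg.lstsq(states, residual_target, rcond=None)

baseline = lambda s: v_prior(s) + s @ w_res             # combined value baseline
advantages = returns - np.array([baseline(s) for s in states])
print(bool(advantages.var() < returns.var()))           # True: the baseline cuts variance
```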

Robust Policy Optimization in Deep Reinforcement Learning

Dec 14, 2022
Md Masudur Rahman, Yexiang Xue

The policy gradient method enjoys the simplicity of an objective in which the agent optimizes the cumulative reward directly. Moreover, in the continuous action domain, a parameterized action distribution allows easy control of exploration through the variance of the representing distribution. Entropy can play an essential role in policy optimization by favoring stochastic policies, which eventually helps the agent explore the environment better in reinforcement learning (RL). However, stochasticity often decreases as training progresses; thus, the policy becomes less exploratory. Additionally, certain parametric distributions might only work for some environments and require extensive hyperparameter tuning. This paper aims to mitigate these issues. In particular, we propose an algorithm called Robust Policy Optimization (RPO), which leverages a perturbed distribution. We hypothesize that our method encourages high-entropy actions and provides a way to represent the action space better. We further provide empirical evidence to verify our hypothesis. We evaluated our method on various continuous control tasks from DeepMind Control, OpenAI Gym, Pybullet, and IsaacGym. We observed that in many settings, RPO increases the policy entropy early in training and then maintains a certain level of entropy throughout the training period. Eventually, our agent RPO shows consistently improved performance compared to PPO and other techniques: entropy regularization, different distributions, and data augmentation. Furthermore, in several settings, our method stays robust in performance, while other baseline mechanisms fail to improve and even worsen the performance.
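
A toy sketch of sampling from a "perturbed distribution" for continuous actions: uniform noise shifts the Gaussian mean before sampling, keeping the effective action spread (and hence entropy) from collapsing. The noise range alpha and the dimensions are hypothetical, and this is an illustration of the idea rather than RPO's exact parameterization.

```python
import numpy as np

rng = np.random.default_rng(0)

def perturbed_gaussian_sample(mean, std, alpha=0.5):
    """Sample an action from a perturbed Gaussian: the mean is shifted by uniform
    noise before sampling, so the policy keeps a floor on its action spread."""
    shifted_mean = mean + rng.uniform(-alpha, alpha, size=mean.shape)
    return rng.normal(shifted_mean, std)

mean = np.zeros(6)                 # e.g. a 6-dim continuous control action
std = 0.2 * np.ones(6)
actions = np.stack([perturbed_gaussian_sample(mean, std) for _ in range(1000)])
print(actions.std(axis=0).round(2))   # effective spread stays above the raw std of 0.2
```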

Learning Combinatorial Structures via Markov Random Fields with Sampling through Lovász Local Lemma

Dec 02, 2022
Nan Jiang, Yi Gu, Yexiang Xue

Generative models for learning combinatorial structures have transformative impacts in many applications. However, existing approaches fail to offer efficient and accurate learning, because gradient estimation of the learning objective subject to combinatorial constraints is highly intractable: existing gradient estimation methods easily run into exponential time/memory costs or incur huge estimation errors due to improper approximation. We develop the NEural Lovász Sampler (Nelson), a neural network based on the Lovász Local Lemma (LLL). We show that it is guaranteed to generate samples satisfying combinatorial constraints from the distribution of the constrained Markov Random Field (MRF) model under certain conditions. We further present a fully differentiable contrastive-divergence-based learning framework on constrained MRFs (Nelson-CD). Being fully differentiable, Nelson-CD also allows us to take advantage of the parallel computing power of GPUs, resulting in great efficiency. Experimental results on three real-world combinatorial problems reveal that Nelson learns to generate 100% valid structures. In comparison, baselines either time out on large-size datasets or fail to generate valid structures, whereas Nelson scales much better with problem size. In addition, Nelson outperforms baselines in various learning metrics, such as log-likelihood and MAP scores.
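
The sketch below shows the classical Moser-Tardos resampling scheme that underlies Lovász-Local-Lemma-based samplers: draw variables from their marginals, and whenever a constraint is violated, resample only the variables in its scope until every constraint holds. The toy "no two adjacent ones" constraints are hypothetical; Nelson itself realizes this idea as a neural sampler inside a constrained MRF.

```python
import random

def moser_tardos_sample(n_vars, p_true, constraints, rng, max_rounds=10_000):
    """Sample x in {0,1}^n from independent Bernoulli(p_true) marginals, conditioned
    on all constraints holding, by resampling the variables of a violated constraint."""
    x = [int(rng.random() < p_true[i]) for i in range(n_vars)]
    for _ in range(max_rounds):
        violated = [c for c in constraints if not c[1](x)]
        if not violated:
            return x
        scope, _ = rng.choice(violated)            # pick a violated constraint
        for i in scope:                            # resample only its variables
            x[i] = int(rng.random() < p_true[i])
    raise RuntimeError("did not converge")

# toy constrained model: 6 variables, no two adjacent variables both equal to 1
rng = random.Random(0)
p = [0.4] * 6
constraints = [((i, i + 1), (lambda x, i=i: not (x[i] and x[i + 1]))) for i in range(5)]
samples = [moser_tardos_sample(6, p, constraints, rng) for _ in range(5)]
print(samples)   # every sample satisfies all constraints
```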

* Accepted by AAAI 2023. The first two authors contributed equally.