Jianye Hao

GFlowNets with Human Feedback

May 11, 2023

Yinchuan Li, Shuang Luo, Yunfeng Shao, Jianye Hao

Abstract: We propose the GFlowNets with Human Feedback (GFlowHF) framework to improve the exploration ability when training AI models. For tasks where the reward is unknown, we fit the reward function through human evaluations on different trajectories. The goal of GFlowHF is to learn a policy that is strictly proportional to human ratings, instead of focusing only on the highest human ratings as RLHF does. Experiments show that GFlowHF can achieve better exploration ability than RLHF.
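
The contrast drawn in this abstract, a policy proportional to ratings versus one concentrated on the top rating, can be illustrated with a toy sketch. This is not the GFlowHF algorithm: the trajectory names, the rating values, and the helper functions below are hypothetical, and the ratings are assumed to be already-fitted positive rewards.

```python
import random

def proportional_policy(ratings):
    """Probability of each trajectory is its rating divided by the total rating."""
    total = sum(ratings.values())
    return {name: r / total for name, r in ratings.items()}

def greedy_policy(ratings):
    """All probability mass goes to the single highest-rated trajectory."""
    best = max(ratings, key=ratings.get)
    return {name: 1.0 if name == best else 0.0 for name in ratings}

def sample(policy, k, seed=0):
    """Draw k trajectories from a policy given as a name -> probability dict."""
    rng = random.Random(seed)
    names = list(policy)
    weights = [policy[n] for n in names]
    return rng.choices(names, weights=weights, k=k)

# Hypothetical human ratings for three candidate trajectories.
ratings = {"traj_a": 5.0, "traj_b": 3.0, "traj_c": 2.0}

prop = proportional_policy(ratings)   # spreads mass over all rated trajectories
greedy = greedy_policy(ratings)       # collapses onto traj_a only
draws = sample(prop, k=10)
```

Under the proportional policy, lower-rated trajectories are still sampled in proportion to their ratings, which is the sense in which such a policy explores more than a purely maximizing one.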

Learnable Behavior Control: Breaking Atari Human World Records via Sample-Efficient Behavior Selection

May 09, 2023

Jiajun Fan, Yuzheng Zhuang, Yuecheng Liu, Jianye Hao, Bin Wang, Jiangcheng Zhu, Hao Wang, Shu-Tao Xia

Abstract: The exploration problem is one of the main challenges in deep reinforcement learning (RL). Recent promising works tried to handle the problem with population-based methods, which collect samples with diverse behaviors derived from a population of different exploratory policies. Adaptive policy selection has been adopted for behavior control. However, the behavior selection space is largely limited by the predefined policy population, which further limits behavior diversity. In this paper, we propose a general framework called Learnable Behavioral Control (LBC) to address the limitation, which a) enables a significantly enlarged behavior selection space via formulating a hybrid behavior mapping from all policies; b) constructs a unified learnable process for behavior selection. We introduce LBC into distributed off-policy actor-critic methods and achieve behavior control via optimizing the selection of the behavior mappings with bandit-based meta-controllers. Our agents have achieved 10077.52% mean human normalized score and surpassed 24 human world records within 1B training frames in the Arcade Learning Environment, which demonstrates our significant state-of-the-art (SOTA) performance without degrading the sample efficiency.
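
The abstract reports a mean human normalized score without defining it. A minimal sketch of the conventional Atari definition follows, assuming the usual form 100 * (agent - random) / (human - random) averaged over games; the abstract itself does not state the formula, and the game names and scores below are made up for illustration.

```python
def human_normalized_score(agent, random_score, human):
    """Percentage score: 0% matches a random policy, 100% matches the human baseline."""
    return 100.0 * (agent - random_score) / (human - random_score)

def mean_hns(scores):
    """Average the per-game normalized scores.

    scores maps a game name to a tuple (agent, random_score, human).
    """
    values = [human_normalized_score(*s) for s in scores.values()]
    return sum(values) / len(values)

# Hypothetical raw scores: (agent, random baseline, human baseline).
games = {
    "game_a": (1100.0, 100.0, 600.0),  # agent well above the human baseline
    "game_b": (50.0, 0.0, 100.0),      # agent halfway to the human baseline
}

per_game = {name: human_normalized_score(*s) for name, s in games.items()}
overall = mean_hns(games)
```

Because the mean is taken over per-game percentages, a few games with very large normalized scores can dominate it, which is worth keeping in mind when reading figures in the thousands of percent.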

Generalized Universal Domain Adaptation with Generative Flow Networks

May 08, 2023

Didi Zhu, Yinchuan Li, Yunfeng Shao, Jianye Hao, Fei Wu, Kun Kuang, Jun Xiao, Chao Wu

Abstract: We introduce a new problem in unsupervised domain adaptation, termed Generalized Universal Domain Adaptation (GUDA), which aims to achieve precise prediction of all target labels including unknown categories. GUDA bridges the gap between label distribution shift-based and label space mismatch-based variants, essentially categorizing them as a unified problem and leading to a comprehensive framework for thoroughly solving all the variants. The key challenge of GUDA is developing and identifying novel target categories while estimating the target label distribution. To address this problem, we take advantage of the powerful exploration capability of generative flow networks and propose an active domain adaptation algorithm named GFlowDA, which selects diverse samples with probabilities proportional to a reward function. To enhance the exploration capability and effectively perceive the target label distribution, we tailor the states and rewards, and introduce an efficient solution for parent exploration and state transition. We also propose a training paradigm for GUDA called Generalized Universal Adversarial Network (GUAN), which involves collaborative optimization between GUAN and GFlowNet. Theoretical analysis highlights the importance of exploration, and extensive experiments on benchmark datasets demonstrate the superiority of GFlowDA.

Structure Aware Incremental Learning with Personalized Imitation Weights for Recommender Systems

May 02, 2023

Yuening Wang, Yingxue Zhang, Antonios Valkanas, Ruiming Tang, Chen Ma, Jianye Hao, Mark Coates

Abstract: Recommender systems now consume large-scale data and play a significant role in improving user experience. Graph Neural Networks (GNNs) have emerged as one of the most effective recommender system models because they model the rich relational information. The ever-growing volume of data can make training GNNs prohibitively expensive. To address this, previous attempts propose to train the GNN models incrementally as new data blocks arrive. Feature and structure knowledge distillation techniques have been explored to allow the GNN model to train in a fast incremental fashion while alleviating the catastrophic forgetting problem. However, preserving the same amount of historical information for all users is sub-optimal since it fails to take into account the dynamics of each user's change of preferences. For users whose interests shift substantially, retaining too much of the old knowledge can overly constrain the model, preventing it from quickly adapting to the users' novel interests. In contrast, for users who have static preferences, model performance can benefit greatly from preserving as much of the user's long-term preferences as possible. In this work, we propose a novel training strategy that adaptively learns personalized imitation weights for each user to balance the contribution from the recent data and the amount of knowledge to be distilled from previous time periods. We demonstrate the effectiveness of learning imitation weights via a comparison on five diverse datasets for three state-of-the-art structure distillation based recommender systems. The performance shows consistent improvement over competitive incremental learning techniques.

Generative Flow Networks for Precise Reward-Oriented Active Learning on Graphs

Apr 24, 2023

Yinchuan Li, Zhigang Li, Wenqian Li, Yunfeng Shao, Yan Zheng, Jianye Hao

Abstract: Many score-based active learning methods have been successfully applied to graph-structured data, aiming to reduce the number of labels and achieve better performance of graph neural networks based on predefined score functions. However, these algorithms struggle to learn policy distributions that are proportional to rewards and have limited exploration capabilities. In this paper, we innovatively formulate the graph active learning problem as a generative process, named GFlowGNN, which generates various samples through sequential actions with probabilities precisely proportional to a predefined reward function. Furthermore, we propose the concept of flow nodes and flow features to efficiently model graphs as flows based on generative flow networks, where the policy network is trained with specially designed rewards. Extensive experiments on real datasets show that the proposed approach has good exploration capability and transferability, outperforming various state-of-the-art methods.

Multi-agent Policy Reciprocity with Theoretical Guarantee

Apr 12, 2023

Haozhi Wang, Yinchuan Li, Qing Wang, Yunfeng Shao, Jianye Hao

Abstract: Modern multi-agent reinforcement learning (RL) algorithms hold great potential for solving a variety of real-world problems. However, they do not fully exploit cross-agent knowledge to reduce sample complexity and improve performance. Although transfer RL supports knowledge sharing, it is hyperparameter sensitive and complex. To solve this problem, we propose a novel multi-agent policy reciprocity (PR) framework, where each agent can fully exploit cross-agent policies even in mismatched states. We then define an adjacency space for mismatched states and design a plug-and-play module for value iteration, which enables agents to infer more precise returns. To improve the scalability of PR, deep PR is proposed for continuous control tasks. Moreover, theoretical analysis shows that agents can asymptotically reach consensus through individual perceived rewards and converge to an optimal value function, which implies the stability and effectiveness of PR, respectively. Experimental results on discrete and continuous environments demonstrate that PR outperforms various existing RL and transfer RL methods.

Traj-MAE: Masked Autoencoders for Trajectory Prediction

Mar 12, 2023

Hao Chen, Jiaze Wang, Kun Shao, Furui Liu, Jianye Hao, Chenyong Guan, Guangyong Chen, Pheng-Ann Heng

Abstract: Trajectory prediction has been a crucial task in building a reliable autonomous driving system by anticipating possible dangers. One key issue is to generate consistent trajectory predictions without colliding. To overcome the challenge, we propose an efficient masked autoencoder for trajectory prediction (Traj-MAE) that better represents the complicated behaviors of agents in the driving environment. Specifically, our Traj-MAE employs diverse masking strategies to pre-train the trajectory encoder and map encoder, allowing for the capture of social and temporal information among agents while leveraging the effect of environment from multiple granularities. To address the catastrophic forgetting problem that arises when pre-training the network with multiple masking strategies, we introduce a continual pre-training framework, which can help Traj-MAE learn valuable and diverse information from various strategies efficiently. Our experimental results in both multi-agent and single-agent settings demonstrate that Traj-MAE achieves competitive results with state-of-the-art methods and significantly outperforms our baseline model.

Out-of-distribution Detection with Implicit Outlier Transformation

Mar 09, 2023

Qizhou Wang, Junjie Ye, Feng Liu, Quanyu Dai, Marcus Kalander, Tongliang Liu, Jianye Hao, Bo Han

Abstract: Outlier exposure (OE) is powerful in out-of-distribution (OOD) detection, enhancing detection capability via model fine-tuning with surrogate OOD data. However, surrogate data typically deviate from test OOD data. Thus, the performance of OE, when facing unseen OOD data, can be weakened. To address this issue, we propose a novel OE-based approach that makes the model perform well in OOD situations, even for unseen OOD cases. It leads to a min-max learning scheme -- searching to synthesize OOD data that lead to the worst judgments and learning from such OOD data for uniform performance in OOD detection. In our realization, these worst OOD data are synthesized by transforming original surrogate ones. Specifically, the associated transform functions are learned implicitly based on our novel insight that model perturbation leads to data transformation. Our methodology offers an efficient way of synthesizing OOD data, which can further benefit the detection model, besides the surrogate OOD data. We conduct extensive experiments under various OOD detection setups, demonstrating the effectiveness of our method against its advanced counterparts.

DR-Label: Improving GNN Models for Catalysis Systems by Label Deconstruction and Reconstruction

Mar 06, 2023

Bowen Wang, Chen Liang, Jiaze Wang, Furui Liu, Shaogang Hao, Dong Li, Jianye Hao, Guangyong Chen, Xiaolong Zou, Pheng-Ann Heng

Abstract: Attaining the equilibrium state of a catalyst-adsorbate system is key to fundamentally assessing its effective properties, such as adsorption energy. Machine learning methods with finer supervision strategies have been applied to boost and guide the relaxation process of an atomic system and better predict its properties at the equilibrium state. In this paper, we present a novel graph neural network (GNN) supervision and prediction strategy, DR-Label. The method enhances the supervision signal, reduces the multiplicity of solutions in edge representation, and encourages the model to provide node predictions that are robust to graph structural variation. DR-Label first Deconstructs finer-grained equilibrium state information to the model by projecting the node-level supervision signal to each edge. Conversely, the model Reconstructs a more robust equilibrium state prediction by transforming edge-level predictions to node-level with a sphere-fitting algorithm. The DR-Label strategy was applied to three radically distinct models, each of which displayed consistent performance enhancements. Based on the DR-Label strategy, we further proposed DRFormer, which achieved a new state-of-the-art performance on the Open Catalyst 2020 (OC20) dataset and the Cu-based single-atom-alloyed CO adsorption (SAA) dataset. We expect that our work will highlight crucial steps for the development of a more accurate model in equilibrium state property prediction of a catalysis system.

* 11 pages, 3 figures

DAG Matters! GFlowNets Enhanced Explainer For Graph Neural Networks

Mar 04, 2023

Wenqian Li, Yinchuan Li, Zhigang Li, Jianye Hao, Yan Pang

Abstract: Uncovering rationales behind predictions of graph neural networks (GNNs) has received increasing attention over the years. Existing literature mainly focuses on selecting a subgraph, through combinatorial optimization, to provide faithful explanations. However, the exponential size of candidate subgraphs limits the applicability of state-of-the-art methods to large-scale GNNs. We improve on this through a different approach: by proposing a generative structure -- GFlowNets-based GNN Explainer (GFlowExplainer), we turn the optimization problem into a step-by-step generative problem. Our GFlowExplainer aims to learn a policy that generates a distribution of subgraphs for which the probability of a subgraph is proportional to its reward. The proposed approach eliminates the influence of node sequence and thus does not need any pre-training strategies. We also propose a new cut vertex matrix to efficiently explore parent states for the GFlowNets structure, thus making our approach applicable in a large-scale setting. We conduct extensive experiments on both synthetic and real datasets, and both qualitative and quantitative results show the superiority of our GFlowExplainer.

* ICLR 2023
