Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Yaodong Yang

DPPMask: Masked Image Modeling with Determinantal Point Processes

Mar 25, 2023

Junde Xu, Zikai Lin, Donghao Zhou, Yaodong Yang, Xiangyun Liao, Bian Wu, Guangyong Chen, Pheng-Ann Heng

Figure 1 for DPPMask: Masked Image Modeling with Determinantal Point Processes

Figure 2 for DPPMask: Masked Image Modeling with Determinantal Point Processes

Figure 3 for DPPMask: Masked Image Modeling with Determinantal Point Processes

Figure 4 for DPPMask: Masked Image Modeling with Determinantal Point Processes

Abstract:Masked Image Modeling (MIM) has achieved impressive representative performance with the aim of reconstructing randomly masked images. Despite the empirical success, most previous works have neglected the important fact that it is unreasonable to force the model to reconstruct something beyond recovery, such as those masked objects. In this work, we show that uniformly random masking widely used in previous works unavoidably loses some key objects and changes original semantic information, resulting in a misalignment problem and hurting the representative learning eventually. To address this issue, we augment MIM with a new masking strategy namely the DPPMask by substituting the random process with Determinantal Point Process (DPPs) to reduce the semantic change of the image after masking. Our method is simple yet effective and requires no extra learnable parameters when implemented within various frameworks. In particular, we evaluate our method on two representative MIM frameworks, MAE and iBOT. We show that DPPMask surpassed random sampling under both lower and higher masking ratios, indicating that DPPMask makes the reconstruction task more reasonable. We further test our method on the background challenge and multi-class classification tasks, showing that our method is more robust at various tasks.

Via

Access Paper or Ask Questions

ASP: Learn a Universal Neural Solver!

Mar 01, 2023

Chenguang Wang, Zhouliang Yu, Stephen McAleer, Tianshu Yu, Yaodong Yang

Figure 1 for ASP: Learn a Universal Neural Solver!

Figure 2 for ASP: Learn a Universal Neural Solver!

Figure 3 for ASP: Learn a Universal Neural Solver!

Figure 4 for ASP: Learn a Universal Neural Solver!

Abstract:Applying machine learning to combinatorial optimization problems has the potential to improve both efficiency and accuracy. However, existing learning-based solvers often struggle with generalization when faced with changes in problem distributions and scales. In this paper, we propose a new approach called ASP: Adaptive Staircase Policy Space Response Oracle to address these generalization issues and learn a universal neural solver. ASP consists of two components: Distributional Exploration, which enhances the solver's ability to handle unknown distributions using Policy Space Response Oracles, and Persistent Scale Adaption, which improves scalability through curriculum learning. We have tested ASP on several challenging COPs, including the traveling salesman problem, the vehicle routing problem, and the prize collecting TSP, as well as the real-world instances from TSPLib and CVRPLib. Our results show that even with the same model size and weak training signal, ASP can help neural solvers explore and adapt to unseen distributions and varying scales, achieving superior performance. In particular, compared with the same neural solvers under a standard training pipeline, ASP produces a remarkable decrease in terms of the optimality gap with 90.9% and 47.43% on generated instances and real-world instances for TSP, and a decrease of 19% and 45.57% for CVRP.

Via

Access Paper or Ask Questions

ACE: Cooperative Multi-agent Q-learning with Bidirectional Action-Dependency

Dec 02, 2022

Chuming Li, Jie Liu, Yinmin Zhang, Yuhong Wei, Yazhe Niu, Yaodong Yang, Yu Liu, Wanli Ouyang

Figure 1 for ACE: Cooperative Multi-agent Q-learning with Bidirectional Action-Dependency

Figure 2 for ACE: Cooperative Multi-agent Q-learning with Bidirectional Action-Dependency

Figure 3 for ACE: Cooperative Multi-agent Q-learning with Bidirectional Action-Dependency

Figure 4 for ACE: Cooperative Multi-agent Q-learning with Bidirectional Action-Dependency

Abstract:Multi-agent reinforcement learning (MARL) suffers from the non-stationarity problem, which is the ever-changing targets at every iteration when multiple agents update their policies at the same time. Starting from first principle, in this paper, we manage to solve the non-stationarity problem by proposing bidirectional action-dependent Q-learning (ACE). Central to the development of ACE is the sequential decision-making process wherein only one agent is allowed to take action at one time. Within this process, each agent maximizes its value function given the actions taken by the preceding agents at the inference stage. In the learning phase, each agent minimizes the TD error that is dependent on how the subsequent agents have reacted to their chosen action. Given the design of bidirectional dependency, ACE effectively turns a multiagent MDP into a single-agent MDP. We implement the ACE framework by identifying the proper network representation to formulate the action dependency, so that the sequential decision process is computed implicitly in one forward pass. To validate ACE, we compare it with strong baselines on two MARL benchmarks. Empirical experiments demonstrate that ACE outperforms the state-of-the-art algorithms on Google Research Football and StarCraft Multi-Agent Challenge by a large margin. In particular, on SMAC tasks, ACE achieves 100% success rate on almost all the hard and super-hard maps. We further study extensive research problems regarding ACE, including extension, generalization, and practicability. Code is made available to facilitate further research.

* Accepted by the Thirty-Seventh AAAI Conference on Artificial Intelligence(AAAI2023)

Via

Access Paper or Ask Questions

Contextual Transformer for Offline Meta Reinforcement Learning

Nov 15, 2022

Runji Lin, Ye Li, Xidong Feng, Zhaowei Zhang, Xian Hong Wu Fung, Haifeng Zhang, Jun Wang, Yali Du, Yaodong Yang

Figure 1 for Contextual Transformer for Offline Meta Reinforcement Learning

Figure 2 for Contextual Transformer for Offline Meta Reinforcement Learning

Figure 3 for Contextual Transformer for Offline Meta Reinforcement Learning

Figure 4 for Contextual Transformer for Offline Meta Reinforcement Learning

Abstract:The pretrain-finetuning paradigm in large-scale sequence models has made significant progress in natural language processing and computer vision tasks. However, such a paradigm is still hindered by several challenges in Reinforcement Learning (RL), including the lack of self-supervised pretraining algorithms based on offline data and efficient fine-tuning/prompt-tuning over unseen downstream tasks. In this work, we explore how prompts can improve sequence modeling-based offline reinforcement learning (offline-RL) algorithms. Firstly, we propose prompt tuning for offline RL, where a context vector sequence is concatenated with the input to guide the conditional policy generation. As such, we can pretrain a model on the offline dataset with self-supervised loss and learn a prompt to guide the policy towards desired actions. Secondly, we extend our framework to Meta-RL settings and propose Contextual Meta Transformer (CMT); CMT leverages the context among different tasks as the prompt to improve generalization on unseen tasks. We conduct extensive experiments across three different offline-RL settings: offline single-agent RL on the D4RL dataset, offline Meta-RL on the MuJoCo benchmark, and offline MARL on the SMAC benchmark. Superior results validate the strong performance, and generality of our methods.

* Accepted by Foundation Models for Decision Making Workshop at Neural Information Processing Systems, 2022

Via

Access Paper or Ask Questions

TorchOpt: An Efficient Library for Differentiable Optimization

Nov 13, 2022

Jie Ren, Xidong Feng, Bo Liu, Xuehai Pan, Yao Fu, Luo Mai, Yaodong Yang

Figure 1 for TorchOpt: An Efficient Library for Differentiable Optimization

Figure 2 for TorchOpt: An Efficient Library for Differentiable Optimization

Figure 3 for TorchOpt: An Efficient Library for Differentiable Optimization

Figure 4 for TorchOpt: An Efficient Library for Differentiable Optimization

Abstract:Recent years have witnessed the booming of various differentiable optimization algorithms. These algorithms exhibit different execution patterns, and their execution needs massive computational resources that go beyond a single CPU and GPU. Existing differentiable optimization libraries, however, cannot support efficient algorithm development and multi-CPU/GPU execution, making the development of differentiable optimization algorithms often cumbersome and expensive. This paper introduces TorchOpt, a PyTorch-based efficient library for differentiable optimization. TorchOpt provides a unified and expressive differentiable optimization programming abstraction. This abstraction allows users to efficiently declare and analyze various differentiable optimization programs with explicit gradients, implicit gradients, and zero-order gradients. TorchOpt further provides a high-performance distributed execution runtime. This runtime can fully parallelize computation-intensive differentiation operations (e.g. tensor tree flattening) on CPUs / GPUs and automatically distribute computation to distributed devices. Experimental results show that TorchOpt achieves $5.2\times$ training time speedup on an 8-GPU server. TorchOpt is available at: https://github.com/metaopt/torchopt/.

* NeurIPS 2022 OPT Workshop

Via

Access Paper or Ask Questions

GenDexGrasp: Generalizable Dexterous Grasping

Oct 03, 2022

Puhao Li, Tengyu Liu, Yuyang Li, Yiran Geng, Yixin Zhu, Yaodong Yang, Siyuan Huang

Figure 1 for GenDexGrasp: Generalizable Dexterous Grasping

Figure 2 for GenDexGrasp: Generalizable Dexterous Grasping

Figure 3 for GenDexGrasp: Generalizable Dexterous Grasping

Figure 4 for GenDexGrasp: Generalizable Dexterous Grasping

Abstract:Generating dexterous grasping has been a long-standing and challenging robotic task. Despite recent progress, existing methods primarily suffer from two issues. First, most prior arts focus on a specific type of robot hand, lacking the generalizable capability of handling unseen ones. Second, prior arts oftentimes fail to rapidly generate diverse grasps with a high success rate. To jointly tackle these challenges with a unified solution, we propose GenDexGrasp, a novel hand-agnostic grasping algorithm for generalizable grasping. GenDexGrasp is trained on our proposed large-scale multi-hand grasping dataset MultiDex synthesized with force closure optimization. By leveraging the contact map as a hand-agnostic intermediate representation, GenDexGrasp efficiently generates diverse and plausible grasping poses with a high success rate and can transfer among diverse multi-fingered robotic hands. Compared with previous methods, GenDexGrasp achieves a three-way trade-off among success rate, inference speed, and diversity. Code is available at https://github.com/tengyu-liu/GenDexGrasp.

* 6 pages, 6 figures

Via

Access Paper or Ask Questions

End-to-End Affordance Learning for Robotic Manipulation

Sep 26, 2022

Yiran Geng, Boshi An, Haoran Geng, Yuanpei Chen, Yaodong Yang, Hao Dong

Figure 1 for End-to-End Affordance Learning for Robotic Manipulation

Figure 2 for End-to-End Affordance Learning for Robotic Manipulation

Figure 3 for End-to-End Affordance Learning for Robotic Manipulation

Figure 4 for End-to-End Affordance Learning for Robotic Manipulation

Abstract:Learning to manipulate 3D objects in an interactive environment has been a challenging problem in Reinforcement Learning (RL). In particular, it is hard to train a policy that can generalize over objects with different semantic categories, diverse shape geometry and versatile functionality. Recently, the technique of visual affordance has shown great prospects in providing object-centric information priors with effective actionable semantics. As such, an effective policy can be trained to open a door by knowing how to exert force on the handle. However, to learn the affordance, it often requires human-defined action primitives, which limits the range of applicable tasks. In this study, we take advantage of visual affordance by using the contact information generated during the RL training process to predict contact maps of interest. Such contact prediction process then leads to an end-to-end affordance learning framework that can generalize over different types of manipulation tasks. Surprisingly, the effectiveness of such framework holds even under the multi-stage and the multi-agent scenarios. We tested our method on eight types of manipulation tasks. Results showed that our methods outperform baseline algorithms, including visual-based affordance methods and RL methods, by a large margin on the success rate. The demonstration can be found at https://sites.google.com/view/rlafford/.

* 8 pages, 3 figures

Via

Access Paper or Ask Questions

Constrained Update Projection Approach to Safe Policy Optimization

Sep 15, 2022

Long Yang, Jiaming Ji, Juntao Dai, Linrui Zhang, Binbin Zhou, Pengfei Li, Yaodong Yang, Gang Pan

Figure 1 for Constrained Update Projection Approach to Safe Policy Optimization

Figure 2 for Constrained Update Projection Approach to Safe Policy Optimization

Figure 3 for Constrained Update Projection Approach to Safe Policy Optimization

Figure 4 for Constrained Update Projection Approach to Safe Policy Optimization

Abstract:Safe reinforcement learning (RL) studies problems where an intelligent agent has to not only maximize reward but also avoid exploring unsafe areas. In this study, we propose CUP, a novel policy optimization method based on Constrained Update Projection framework that enjoys rigorous safety guarantee. Central to our CUP development is the newly proposed surrogate functions along with the performance bound. Compared to previous safe RL methods, CUP enjoys the benefits of 1) CUP generalizes the surrogate functions to generalized advantage estimator (GAE), leading to strong empirical performance. 2) CUP unifies performance bounds, providing a better understanding and interpretability for some existing algorithms; 3) CUP provides a non-convex implementation via only first-order optimizers, which does not require any strong approximation on the convexity of the objectives. To validate our CUP method, we compared CUP against a comprehensive list of safe RL baselines on a wide range of tasks. Experiments show the effectiveness of CUP both in terms of reward and safety constraint satisfaction. We have opened the source code of CUP at https://github.com/RL-boxes/Safe-RL/tree/ main/CUP.

* Accepted by NeurIPS2022. arXiv admin note: substantial text overlap with arXiv:2202.07565; text overlap with arXiv:2002.06506 by other authors

Via

Access Paper or Ask Questions

Debias the Black-box: A Fair Ranking Framework via Knowledge Distillation

Aug 24, 2022

Zhitao Zhu, Shijing Si, Jianzong Wang, Yaodong Yang, Jing Xiao

Figure 1 for Debias the Black-box: A Fair Ranking Framework via Knowledge Distillation

Figure 2 for Debias the Black-box: A Fair Ranking Framework via Knowledge Distillation

Figure 3 for Debias the Black-box: A Fair Ranking Framework via Knowledge Distillation

Figure 4 for Debias the Black-box: A Fair Ranking Framework via Knowledge Distillation

Abstract:Deep neural networks can capture the intricate interaction history information between queries and documents, because of their many complicated nonlinear units, allowing them to provide correct search recommendations. However, service providers frequently face more complex obstacles in real-world circumstances, such as deployment cost constraints and fairness requirements. Knowledge distillation, which transfers the knowledge of a well-trained complex model (teacher) to a simple model (student), has been proposed to alleviate the former concern, but the best current distillation methods focus only on how to make the student model imitate the predictions of the teacher model. To better facilitate the application of deep models, we propose a fair information retrieval framework based on knowledge distillation. This framework can improve the exposure-based fairness of models while considerably decreasing model size. Our extensive experiments on three huge datasets show that our proposed framework can reduce the model size to a minimum of 1% of its original size while maintaining its black-box state. It also improves fairness performance by 15%~46% while keeping a high level of recommendation effectiveness.

* This paper has been accepted by the 23rd International Conference on Web Information Systems Engineering (WISE 2022)

Via

Access Paper or Ask Questions

Heterogeneous-Agent Mirror Learning: A Continuum of Solutions to Cooperative MARL

Aug 02, 2022

Jakub Grudzien Kuba, Xidong Feng, Shiyao Ding, Hao Dong, Jun Wang, Yaodong Yang

Figure 1 for Heterogeneous-Agent Mirror Learning: A Continuum of Solutions to Cooperative MARL

Figure 2 for Heterogeneous-Agent Mirror Learning: A Continuum of Solutions to Cooperative MARL

Figure 3 for Heterogeneous-Agent Mirror Learning: A Continuum of Solutions to Cooperative MARL

Figure 4 for Heterogeneous-Agent Mirror Learning: A Continuum of Solutions to Cooperative MARL

Abstract:The necessity for cooperation among intelligent machines has popularised cooperative multi-agent reinforcement learning (MARL) in the artificial intelligence (AI) research community. However, many research endeavors have been focused on developing practical MARL algorithms whose effectiveness has been studied only empirically, thereby lacking theoretical guarantees. As recent studies have revealed, MARL methods often achieve performance that is unstable in terms of reward monotonicity or suboptimal at convergence. To resolve these issues, in this paper, we introduce a novel framework named Heterogeneous-Agent Mirror Learning (HAML) that provides a general template for MARL algorithmic designs. We prove that algorithms derived from the HAML template satisfy the desired properties of the monotonic improvement of the joint reward and the convergence to Nash equilibrium. We verify the practicality of HAML by proving that the current state-of-the-art cooperative MARL algorithms, HATRPO and HAPPO, are in fact HAML instances. Next, as a natural outcome of our theory, we propose HAML extensions of two well-known RL algorithms, HAA2C (for A2C) and HADDPG (for DDPG), and demonstrate their effectiveness against strong baselines on StarCraftII and Multi-Agent MuJoCo tasks.

Via

Access Paper or Ask Questions