Kan Ren

Is Risk-Sensitive Reinforcement Learning Properly Resolved?

Jul 02, 2023
Ruiwen Zhou, Minghuan Liu, Kan Ren, Xufang Luo, Weinan Zhang, Dongsheng Li

Due to the nature of risk management in learning applicable policies, risk-sensitive reinforcement learning (RSRL) has been recognized as an important direction. RSRL is usually achieved by learning risk-sensitive objectives characterized by various risk measures, under the framework of distributional reinforcement learning. However, it remains unclear if the distributional Bellman operator properly optimizes the RSRL objective in the sense of risk measures. In this paper, we prove that the existing RSRL methods do not achieve unbiased optimization and cannot guarantee optimality or even improvements regarding risk measures over accumulated return distributions. To remedy this issue, we further propose a novel algorithm, namely Trajectory Q-Learning (TQL), for RSRL problems with provable convergence to the optimal policy. Based on our new learning architecture, we are free to introduce a general and practical implementation for different risk measures to learn disparate risk-sensitive policies. In the experiments, we verify the learnability of our algorithm and show how our method effectively achieves better performances toward risk-sensitive objectives.
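
To make "risk measures over accumulated return distributions" concrete, here is a minimal sketch of one common risk measure, conditional value at risk (CVaR), estimated from sampled returns. It is not the paper's TQL algorithm; the function name `cvar`, the sample data, and the choice of CVaR itself are illustrative assumptions.

```python
import math


def cvar(returns, alpha):
    """Empirical lower-tail CVaR: the mean of the worst ceil(alpha * n) returns.

    Assumption: higher return is better, so "worst" means smallest.
    """
    if not returns:
        raise ValueError("returns must be non-empty")
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    ordered = sorted(returns)
    k = max(1, math.ceil(alpha * len(ordered)))
    tail = ordered[:k]
    return sum(tail) / k


# Hypothetical accumulated returns of two policies over 8 episodes.
safe = [1.0] * 8                   # always returns 1.0
risky = [-10.0, -10.0] + [5.0] * 6  # higher mean, but a heavy bad tail

mean_safe = sum(safe) / len(safe)
mean_risky = sum(risky) / len(risky)
cvar_safe = cvar(safe, 0.25)
cvar_risky = cvar(risky, 0.25)
```

With these made-up samples, a risk-neutral criterion (the mean) prefers the risky policy, while CVaR at level 0.25 prefers the safe one, which is the kind of disagreement a risk-sensitive objective is meant to capture.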

MLCopilot: Unleashing the Power of Large Language Models in Solving Machine Learning Tasks

Apr 28, 2023
Lei Zhang, Yuge Zhang, Kan Ren, Dongsheng Li, Yuqing Yang

The field of machine learning (ML) has gained widespread adoption, leading to a significant demand for adapting ML to specific scenarios, which remains expensive and non-trivial. The predominant approaches towards the automation of solving ML tasks (e.g., AutoML) are often time-consuming and hard for human developers to understand. In contrast, though human engineers have the incredible ability to understand tasks and reason about solutions, their experience and knowledge are often sparse and difficult for quantitative approaches to utilize. In this paper, we aim to bridge the gap between machine intelligence and human knowledge by introducing a novel framework MLCopilot, which leverages the state-of-the-art LLMs to develop ML solutions for novel tasks. We showcase the possibility of extending the capability of LLMs to comprehend structured inputs and perform thorough reasoning for solving novel ML tasks. We also find that, after some dedicated design, the LLM can (i) observe from the existing experiences of ML tasks and (ii) reason effectively to deliver promising results for new tasks. The solution generated can be used directly to achieve high levels of competitiveness.

AutoTaskFormer: Searching Vision Transformers for Multi-task Learning

Apr 20, 2023
Yang Liu, Shen Yan, Yuge Zhang, Kan Ren, Quanlu Zhang, Zebin Ren, Deng Cai, Mi Zhang

Vision Transformers have shown great performance in single tasks such as classification and segmentation. However, real-world problems are not isolated, which calls for vision transformers that can perform multiple tasks concurrently. Existing multi-task vision transformers are handcrafted and heavily rely on human expertise. In this work, we propose a novel one-shot neural architecture search framework, dubbed AutoTaskFormer (Automated Multi-Task Vision TransFormer), to automate this process. AutoTaskFormer not only identifies the weights to share across multiple tasks automatically, but also provides thousands of well-trained vision transformers with a wide range of parameters (e.g., number of heads and network depth) for deployment under various resource constraints. Experiments on both small-scale (2-task Cityscapes and 3-task NYUv2) and large-scale (16-task Taskonomy) datasets show that AutoTaskFormer outperforms state-of-the-art handcrafted vision transformers in multi-task learning. The entire code and models will be open-sourced.

* 15 pages

Towards Inference Efficient Deep Ensemble Learning

Jan 29, 2023
Ziyue Li, Kan Ren, Yifan Yang, Xinyang Jiang, Yuqing Yang, Dongsheng Li

Ensemble methods can deliver surprising performance gains but also bring significantly higher computational costs, e.g., up to 2048X in large-scale ensemble tasks. However, we found that the majority of computations in ensemble methods are redundant. For instance, over 77% of samples in the CIFAR-100 dataset can be correctly classified with only a single ResNet-18 model, which indicates that only around 23% of the samples need an ensemble of extra models. To this end, we propose an inference-efficient ensemble learning method that simultaneously optimizes for effectiveness and efficiency in ensemble learning. More specifically, we regard an ensemble of models as a sequential inference process and learn the optimal halting event for inference on a specific sample. At each timestep of the inference process, a common selector judges whether the current ensemble has reached ensemble effectiveness and, if so, halts further inference; otherwise, it passes this challenging sample on to the subsequent models to form a more powerful ensemble. Both the base models and the common selector are jointly optimized to dynamically adjust ensemble inference for samples of varying hardness, through novel optimization goals including sequential ensemble boosting and computation saving. Experiments with different backbones on real-world datasets illustrate that our method can bring up to a 56% inference cost reduction while maintaining performance comparable to the full ensemble, achieving significantly better ensemble utility than other baselines. Code and supplemental materials are available at https://seqml.github.io/irene.

* 11 pages, accepted in AAAI 2023
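
The sequential inference process described in the abstract can be sketched roughly as follows. Note the substitution: the paper learns a common selector jointly with the base models, whereas this sketch uses a fixed confidence threshold as a stand-in halting rule. The function name, the `threshold` parameter, and the toy models are all hypothetical.

```python
def sequential_ensemble_predict(models, x, threshold=0.9):
    """Query models one at a time, averaging class probabilities,
    and halt as soon as the running ensemble is confident enough.

    Returns (predicted_label, number_of_models_used).
    Assumption: each model maps an input to a list of class probabilities.
    """
    if not models:
        raise ValueError("need at least one model")
    running = None
    avg = None
    used = 0
    for used, model in enumerate(models, start=1):
        probs = model(x)
        if running is None:
            running = list(probs)
        else:
            running = [r + p for r, p in zip(running, probs)]
        avg = [r / used for r in running]
        # Stand-in halting rule: stop when the top averaged probability is high.
        if max(avg) >= threshold:
            break
    label = max(range(len(avg)), key=avg.__getitem__)
    return label, used


# Toy models: the first is confident on this "easy" input, so inference halts.
easy_models = [lambda x: [0.95, 0.05], lambda x: [0.5, 0.5]]
# On a "hard" input no prefix of the ensemble is confident, so all models run.
hard_models = [lambda x: [0.6, 0.4], lambda x: [0.3, 0.7], lambda x: [0.2, 0.8]]

easy_result = sequential_ensemble_predict(easy_models, x=None)
hard_result = sequential_ensemble_predict(hard_models, x=None)
```

The cost saving comes from the easy case: only one model is evaluated, while hard samples still receive the full ensemble.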

Reinforcement Learning with Automated Auxiliary Loss Search

Oct 12, 2022
Tairan He, Yuge Zhang, Kan Ren, Minghuan Liu, Che Wang, Weinan Zhang, Yuqing Yang, Dongsheng Li

A good state representation is crucial to solving complicated reinforcement learning (RL) challenges. Many recent works focus on designing auxiliary losses for learning informative representations. Unfortunately, these handcrafted objectives rely heavily on expert knowledge and may be sub-optimal. In this paper, we propose a principled and universal method for learning better representations with auxiliary loss functions, named Automated Auxiliary Loss Search (A2LS), which automatically searches for top-performing auxiliary loss functions for RL. Specifically, based on the collected trajectory data, we define a general auxiliary loss space of size $7.5 \times 10^{20}$ and explore the space with an efficient evolutionary search strategy. Empirical results show that the discovered auxiliary loss (namely, A2-winner) significantly improves the performance on both high-dimensional (image) and low-dimensional (vector) unseen tasks with much higher efficiency, showing promising generalization ability to different settings and even different benchmark domains. We conduct a statistical analysis to reveal the relations between patterns of auxiliary losses and RL performance.

* NeurIPS 2022 accepted paper
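
As a rough illustration of what an evolutionary search over a discrete space looks like, the sketch below evolves bit-string genomes with truncation selection and point mutation. It is not A2LS: the genome encoding, the `evolve` function, its parameters, and the toy fitness (counting ones) are assumptions made for illustration. In the paper's setting the fitness of a candidate would instead be the RL performance obtained with that auxiliary loss.

```python
import random


def evolve(fitness, genome_len, pop_size=20, generations=30,
           mutation_rate=0.1, seed=0):
    """Simple evolutionary search over bit-string genomes.

    Each generation keeps the top half of the population unchanged (elitism)
    and refills the rest with mutated copies of surviving parents.
    Returns (best_genome, best_fitness_per_generation).
    """
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(genome_len)]
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            parent = rng.choice(parents)
            # Flip each bit independently with probability mutation_rate.
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in parent]
            children.append(child)
        population = parents + children
    best = max(population, key=fitness)
    return best, history


# Toy fitness: number of ones in the genome (a stand-in for RL performance).
best, history = evolve(sum, genome_len=16)
```

Because the top half always survives, the best fitness recorded per generation never decreases; real searches of this kind mostly differ in how candidates are encoded and how expensive each fitness evaluation is.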

Bootstrapped Transformer for Offline Reinforcement Learning

Jun 17, 2022
Kerong Wang, Hanye Zhao, Xufang Luo, Kan Ren, Weinan Zhang, Dongsheng Li

Offline reinforcement learning (RL) aims at learning policies from previously collected static trajectory data without interacting with the real environment. Recent works provide a novel perspective by viewing offline RL as a generic sequence generation problem, adopting sequence models such as Transformer architecture to model distributions over trajectories, and repurposing beam search as a planning algorithm. However, the training datasets utilized in general offline RL tasks are quite limited and often suffer from insufficient distribution coverage, which could be harmful to training sequence generation models yet has not drawn enough attention in the previous works. In this paper, we propose a novel algorithm named Bootstrapped Transformer, which incorporates the idea of bootstrapping and leverages the learned model to self-generate more offline data to further boost the sequence model training. We conduct extensive experiments on two offline RL benchmarks and demonstrate that our model can largely remedy the existing offline RL training limitations and beat other strong baseline methods. We also analyze the generated pseudo data; the revealed characteristics may shed some light on offline RL training. The code is available at https://seqml.github.io/bootorl.

* A complete manuscript under review

Towards Applicable Reinforcement Learning: Improving the Generalization and Sample Efficiency with Policy Ensemble

May 19, 2022
Zhengyu Yang, Kan Ren, Xufang Luo, Minghuan Liu, Weiqing Liu, Jiang Bian, Weinan Zhang, Dongsheng Li

It is challenging for reinforcement learning (RL) algorithms to succeed in real-world applications like financial trading and logistics systems due to noisy observations and environment shifts between training and evaluation. Thus, resolving real-world tasks requires both high sample efficiency and generalization. However, directly applying typical RL algorithms can lead to poor performance in such scenarios. Considering the great performance of ensemble methods on both accuracy and generalization in supervised learning (SL), we design a robust and applicable method named Ensemble Proximal Policy Optimization (EPPO), which learns ensemble policies in an end-to-end manner. Notably, EPPO combines each policy and the policy ensemble organically and optimizes both simultaneously. In addition, EPPO adopts a diversity enhancement regularization over the policy space which helps to generalize to unseen states and promotes exploration. We theoretically prove EPPO increases exploration efficacy, and through comprehensive experimental evaluations on various tasks, we demonstrate that EPPO achieves higher efficiency and is robust for real-world applications compared with vanilla policy optimization algorithms and other ensemble methods. Code and supplemental materials are available at https://seqml.github.io/eppo.

* Accepted in IJCAI 2022. The codes are available at https://seqml.github.io/eppo
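
A minimal sketch of the two ingredients the abstract mentions, a policy ensemble and a diversity term over sub-policies, for a single state with discrete actions. The abstract does not give EPPO's exact formulas, so the arithmetic-mean ensemble, the mean pairwise KL divergence used as the diversity score, and all function names here are assumptions.

```python
import math


def ensemble_policy(policies):
    """Average the action distributions of the sub-policies (assumed aggregation)."""
    n = len(policies)
    num_actions = len(policies[0])
    return [sum(p[a] for p in policies) / n for a in range(num_actions)]


def kl(p, q, eps=1e-12):
    """KL(p || q) for discrete distributions; eps guards against log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)


def diversity(policies):
    """Mean pairwise KL divergence between sub-policies (assumed diversity score).

    A regularizer would add a multiple of this to the objective to push
    sub-policies apart; zero means all sub-policies agree.
    """
    n = len(policies)
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    if not pairs:
        return 0.0
    return sum(kl(policies[i], policies[j]) for i, j in pairs) / len(pairs)


# Hypothetical action distributions of two sub-policies in one state.
sub_policies = [[0.9, 0.1], [0.1, 0.9]]
mixed = ensemble_policy(sub_policies)
spread = diversity(sub_policies)
```

Here the two sub-policies disagree strongly, so the diversity score is positive while the ensemble distribution is close to uniform.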

Domain Generalization using Pretrained Models without Fine-tuning

Mar 09, 2022
Ziyue Li, Kan Ren, Xinyang Jiang, Bo Li, Haipeng Zhang, Dongsheng Li

Fine-tuning pretrained models is a common practice in domain generalization (DG) tasks. However, fine-tuning is usually computationally expensive due to the ever-growing size of pretrained models. More importantly, it may cause over-fitting on the source domain and compromise the models' generalization ability, as shown in recent works. Generally, pretrained models possess some level of generalization ability and can achieve decent performance regarding specific domains and samples. However, the generalization performance of pretrained models could vary significantly over different test domains and even samples, which raises challenges for us to best leverage pretrained models in DG tasks. In this paper, we propose a novel domain generalization paradigm to better leverage various pretrained models, named specialized ensemble learning for domain generalization (SEDGE). It first trains a linear label space adapter upon fixed pretrained models, which transforms the outputs of the pretrained model to the label space of the target domain. Then, an ensemble network aware of model specialty is proposed to dynamically dispatch proper pretrained models to predict each test sample. Experimental studies on several benchmarks show that SEDGE achieves significant performance improvements compared to strong baselines, including state-of-the-art methods in DG tasks, and reduces the trainable parameters by ~99% and the training time by ~99.5%.

Towards Generating Real-World Time Series Data

Nov 16, 2021
Hengzhi Pei, Kan Ren, Yuqing Yang, Chang Liu, Tao Qin, Dongsheng Li

Time series data generation has drawn increasing attention in recent years. Several generative adversarial network (GAN) based methods have been proposed to tackle the problem, usually with the assumption that the targeted time series data are well-formatted and complete. However, real-world time series (RTS) data are far from this ideal, e.g., long sequences with variable lengths and informative missing data raise intractable challenges for designing powerful generation algorithms. In this paper, we propose a novel generative framework for RTS data, RTSGAN, to tackle the aforementioned challenges. RTSGAN first learns an encoder-decoder module which provides a mapping between a time series instance and a fixed-dimension latent vector and then learns a generation module to generate vectors in the same latent space. By combining the generator and the decoder, RTSGAN is able to generate RTS that respect the original feature distributions and the temporal dynamics. To generate time series with missing values, we further equip RTSGAN with an observation embedding layer and a decide-and-generate decoder to better utilize the informative missing patterns. Experiments on four RTS datasets show that the proposed framework outperforms the previous generation methods in terms of synthetic data utility for downstream classification and prediction tasks.

* Accepted in the 21st IEEE International Conference on Data Mining (ICDM 2021). Code is available at https://seqml.github.io/rtsgan

Infrared target tracking based on proximal robust principal component analysis method

Oct 11, 2020
Chao Ma, Guohua Gu, Xin Miao, Minjie Wan, Weixian Qian, Kan Ren, Qian Chen

Infrared target tracking plays an important role in both civil and military fields. The main challenges in designing a robust and high-precision tracker for infrared sequences include overlap, occlusion, and appearance change. To this end, this paper proposes an infrared target tracker based on the proximal robust principal component analysis method. First, the observation matrix is decomposed into a sparse occlusion matrix and a low-rank target matrix, and the constrained optimization is carried out with an approaching proximal norm which is better than the L1-norm. To solve this convex optimization problem, the Alternating Direction Method of Multipliers (ADMM) is employed to estimate the variables alternately. Finally, the framework of particle filter with model update strategy is exploited to locate the target. Through a series of experiments on real infrared target sequences, the effectiveness and robustness of our algorithm are demonstrated.
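
For context on the L1-norm baseline that the abstract compares against: in standard robust PCA solved with ADMM, the sparse component is updated by element-wise soft-thresholding, which is the proximal operator of the L1-norm. The sketch below shows only that standard operator, not the paper's proximal norm or its full ADMM loop; the function names and the sample residual are illustrative assumptions.

```python
def soft_threshold(x, tau):
    """Proximal operator of tau * |x|: shrink x toward zero by tau."""
    if x > tau:
        return x - tau
    if x < -tau:
        return x + tau
    return 0.0


def sparse_update(residual, tau):
    """Element-wise soft-thresholding of a matrix given as a list of rows.

    In an L1-based robust PCA iteration, `residual` would be the observation
    minus the current low-rank estimate (plus a scaled dual variable).
    """
    return [[soft_threshold(v, tau) for v in row] for row in residual]


# Hypothetical residual: large entries (e.g. occlusions) survive, small ones vanish.
residual = [[3.0, 0.2], [-0.5, -4.0]]
sparse = sparse_update(residual, tau=1.0)
```

Small entries are set exactly to zero, which is what makes the recovered occlusion matrix sparse; every surviving entry is also shrunk by `tau`, a known bias of the L1 penalty.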
