Yiqun Chen

Enhancing Your Trained DETRs with Box Refinement

Jul 21, 2023
Yiqun Chen, Qiang Chen, Peize Sun, Shoufa Chen, Jingdong Wang, Jian Cheng

We present a conceptually simple, efficient, and general framework for localization problems in DETR-like models. We add plugins to well-trained models instead of inefficiently designing new models and training them from scratch. The method, called RefineBox, refines the outputs of DETR-like detectors with lightweight refinement networks. RefineBox is easy to implement and train, as it only leverages the features and predicted boxes from the well-trained detection models. Our method is also efficient because we freeze the trained detectors during training. In addition, RefineBox generalizes to various trained detection models without any modification. We conduct experiments on COCO and LVIS 1.0. Experimental results indicate the effectiveness of RefineBox for DETR and its representative variants (Figure 1). For example, the performance gains for DETR, Conditional-DETR, DAB-DETR, and DN-DETR are 2.4 AP, 2.5 AP, 1.9 AP, and 1.6 AP, respectively. We hope our work draws the attention of the detection community to the localization bottleneck of current DETR-like models and highlights the potential of the RefineBox framework. Code and models will be publicly available at: https://github.com/YiqunChen1999/RefineBox.
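
The abstract describes a lightweight head that takes the frozen detector's features and predicted boxes and outputs corrected boxes. Below is a minimal PyTorch-style sketch of that idea; the module name BoxRefiner, the MLP design, and the tensor shapes are illustrative assumptions, not the released RefineBox code.

# Sketch only: a small residual head that refines boxes predicted by a frozen,
# already-trained DETR-like detector. Shapes and names are assumptions.
import torch
import torch.nn as nn

class BoxRefiner(nn.Module):
    def __init__(self, feat_dim=256, hidden_dim=256):
        super().__init__()
        # Combine per-query decoder features with the predicted box (cx, cy, w, h).
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim + 4, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 4),  # predicts a residual correction to the box
        )

    def forward(self, query_feats, pred_boxes):
        # query_feats: (batch, num_queries, feat_dim), taken from the frozen detector
        # pred_boxes:  (batch, num_queries, 4), normalized (cx, cy, w, h)
        delta = self.mlp(torch.cat([query_feats, pred_boxes], dim=-1))
        return (pred_boxes + delta).clamp(0.0, 1.0)

# Only the refiner would be updated during training; the detector stays frozen, e.g.
# for p in detector.parameters(): p.requires_grad_(False)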

Transformer in Transformer as Backbone for Deep Reinforcement Learning

Jan 03, 2023
Hangyu Mao, Rui Zhao, Hao Chen, Jianye Hao, Yiqun Chen, Dong Li, Junge Zhang, Zhen Xiao

Designing better deep networks and better reinforcement learning (RL) algorithms are both important for deep RL. This work focuses on the former. Previous methods build the network from several modules such as CNN, LSTM, and Attention. Recent methods combine the Transformer with these modules for better performance. However, training a network composed of mixed modules requires tedious optimization tricks, making these methods inconvenient to use in practice. In this paper, we propose to design pure Transformer-based networks for deep RL, aiming to provide off-the-shelf backbones for both the online and offline settings. Specifically, we propose the Transformer in Transformer (TIT) backbone, which cascades two Transformers in a very natural way: the inner one processes a single observation, while the outer one processes the observation history; combining both is expected to extract spatial-temporal representations for good decision-making. Experiments show that TIT consistently achieves satisfactory performance across different settings.

* As far as we know, TIT is the first pure Transformer-based backbone for deep online and offline RL, and it also extends the offline SL paradigm proposed by Decision Transformer.
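
As a rough illustration of the cascaded layout described above, the sketch below encodes each observation with an inner Transformer over its patches, then encodes the sequence of per-observation embeddings with an outer Transformer over the history. The dimensions, the patching scheme, and the class name TITBackbone are assumptions for illustration, not the paper's exact configuration.

# Sketch only: inner Transformer per observation, outer Transformer over history.
import torch
import torch.nn as nn

class TITBackbone(nn.Module):
    def __init__(self, obs_dim, n_patches=4, token_dim=64, n_heads=4, history_len=8):
        super().__init__()
        assert obs_dim % n_patches == 0
        self.n_patches = n_patches
        self.embed = nn.Linear(obs_dim // n_patches, token_dim)
        inner_layer = nn.TransformerEncoderLayer(token_dim, n_heads, batch_first=True)
        outer_layer = nn.TransformerEncoderLayer(token_dim, n_heads, batch_first=True)
        self.inner = nn.TransformerEncoder(inner_layer, num_layers=2)  # within one observation
        self.outer = nn.TransformerEncoder(outer_layer, num_layers=2)  # across the history
        self.pos = nn.Parameter(torch.zeros(1, history_len, token_dim))

    def forward(self, obs_history):
        # obs_history: (batch, history_len, obs_dim)
        b, t, d = obs_history.shape
        patches = obs_history.reshape(b * t, self.n_patches, d // self.n_patches)
        tokens = self.embed(patches)                                 # (b*t, n_patches, token_dim)
        per_obs = self.inner(tokens).mean(dim=1).reshape(b, t, -1)   # one vector per observation
        temporal = self.outer(per_obs + self.pos[:, :t])             # temporal encoding
        return temporal[:, -1]  # representation of the current step for the policy/value head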

DATE: Dual Assignment for End-to-End Fully Convolutional Object Detection

Nov 25, 2022
Yiqun Chen, Qiang Chen, Qinghao Hu, Jian Cheng

Fully convolutional detectors discard the one-to-many assignment and adopt a one-to-one assignment strategy to achieve end-to-end detection, but suffer from slow convergence. In this paper, we revisit these two assignment methods and find that bringing one-to-many assignment back to end-to-end fully convolutional detectors helps with model convergence. Based on this observation, we propose Dual Assignment for end-to-end fully convolutional deTEction (DATE). Our method constructs two branches with one-to-many and one-to-one assignment during training and speeds up the convergence of the one-to-one assignment branch by providing more supervision signals. DATE uses only the branch with the one-to-one matching strategy for model inference, which introduces no inference overhead. Experimental results show that Dual Assignment gives nontrivial improvements and speeds up model convergence upon OneNet and DeFCN. Code: https://github.com/YiqunChen1999/date.
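
A hedged sketch of the dual-assignment training described above: the same backbone feeds two heads, one supervised with one-to-many assignment and one with one-to-one assignment, and only the one-to-one head is kept at inference. The head modules and criteria below are hypothetical placeholders, not DATE's actual components.

# Sketch only: two training branches, one inference branch.
import torch.nn as nn

class DualAssignmentDetector(nn.Module):
    def __init__(self, backbone, o2o_head, o2m_head):
        super().__init__()
        self.backbone = backbone
        self.o2o_head = o2o_head   # one-to-one branch (used at inference)
        self.o2m_head = o2m_head   # one-to-many auxiliary branch (training only)

    def forward(self, images):
        feats = self.backbone(images)
        if self.training:
            return self.o2o_head(feats), self.o2m_head(feats)
        return self.o2o_head(feats)  # the auxiliary branch adds no inference cost

def training_loss(model, images, targets, o2o_criterion, o2m_criterion, aux_weight=1.0):
    # o2o_criterion uses one-to-one (Hungarian-style) matching; o2m_criterion
    # uses a one-to-many assignment, providing extra supervision signals.
    o2o_out, o2m_out = model(images)
    return o2o_criterion(o2o_out, targets) + aux_weight * o2m_criterion(o2m_out, targets)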

PTDE: Personalized Training with Distillated Execution for Multi-Agent Reinforcement Learning

Oct 17, 2022
Yiqun Chen, Hangyu Mao, Tianle Zhang, Shiguang Wu, Bin Zhang, Jianye Hao, Dong Li, Bin Wang, Hongxing Chang

Centralized Training with Decentralized Execution (CTDE) has been a very popular paradigm for multi-agent reinforcement learning. One of its main features is making full use of the global information to learn a better joint Q-function or centralized critic. In this paper, we in turn explore how to leverage the global information to directly learn a better individual Q-function or individual actor. We find that applying the same global information to all agents indiscriminately is not enough for good performance, and thus propose to tailor the global information to each agent, obtaining agent-specific global information for better performance. Furthermore, we distill such agent-specific global information into the agent's local information, so that it can be used during decentralized execution without much performance degradation. We call this new paradigm Personalized Training with Distillated Execution (PTDE). PTDE can be easily combined with many state-of-the-art algorithms to further improve their performance, which is verified in both SMAC and Google Research Football scenarios.
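
An illustrative sketch of the idea: during centralized training an agent-specific encoding of the global state conditions each agent's network, and a local "student" encoder is trained to mimic that encoding so execution can remain decentralized. The module names, shapes, and the plain MSE distillation objective are assumptions, not PTDE's exact design.

# Sketch only: agent-specific global encoder (teacher) and distilled local encoder (student).
import torch
import torch.nn as nn
import torch.nn.functional as F

class AgentSpecificGlobalEncoder(nn.Module):
    def __init__(self, state_dim, obs_dim, hidden=64):
        super().__init__()
        # Personalizes the global state for one agent by conditioning on its own observation.
        self.net = nn.Sequential(nn.Linear(state_dim + obs_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, hidden))

    def forward(self, global_state, local_obs):
        return self.net(torch.cat([global_state, local_obs], dim=-1))

class LocalDistilledEncoder(nn.Module):
    def __init__(self, obs_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, hidden))

    def forward(self, local_obs):
        return self.net(local_obs)

def distillation_loss(teacher, student, global_state, local_obs):
    # The student, which sees only local information, regresses the teacher's
    # agent-specific global encoding for use at decentralized execution time.
    with torch.no_grad():
        target = teacher(global_state, local_obs)
    return F.mse_loss(student(local_obs), target)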

Improving Fine-tuning of Self-supervised Models with Contrastive Initialization

Jul 30, 2022
Haolin Pan, Yong Guo, Qinyi Deng, Haomin Yang, Yiqun Chen, Jian Chen

Self-supervised learning (SSL) has achieved remarkable performance in pretraining models that can be further used in downstream tasks via fine-tuning. However, these self-supervised models may not capture meaningful semantic information, since images belonging to the same class are always regarded as negative pairs in the contrastive loss. Consequently, images of the same class are often located far away from each other in the learned feature space, which inevitably hampers the fine-tuning process. To address this issue, we seek to provide a better initialization for self-supervised models by enhancing the semantic information. To this end, we propose a Contrastive Initialization (COIN) method that breaks the standard fine-tuning pipeline by introducing an extra initialization stage before fine-tuning. Extensive experiments show that, with the enriched semantics, COIN significantly outperforms existing methods without introducing extra training cost and sets a new state of the art on multiple downstream tasks.

* 22 pages, 4 figures
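
A possible sketch of the extra initialization stage described above: before standard fine-tuning, the pretrained encoder is tuned with a class-aware contrastive objective so that same-class samples attract each other. The loss below is a generic supervised contrastive loss, offered as an assumption about the flavor of objective involved, not necessarily COIN's exact formulation.

# Sketch only: supervised contrastive loss where same-class samples are positives.
import torch
import torch.nn.functional as F

def supervised_contrastive_loss(features, labels, temperature=0.1):
    # features: (batch, dim) embeddings from the encoder; labels: (batch,) class ids
    z = F.normalize(features, dim=1)
    sim = z @ z.t() / temperature                        # pairwise similarities
    mask = labels.unsqueeze(0).eq(labels.unsqueeze(1))   # positives: same class
    mask.fill_diagonal_(False)                           # exclude self-pairs
    off_diag = ~torch.eye(len(z), dtype=torch.bool, device=z.device)
    log_prob = sim - torch.logsumexp(sim.masked_fill(~off_diag, float('-inf')),
                                     dim=1, keepdim=True)
    # Average log-likelihood of positives for each anchor that has at least one positive.
    pos_counts = mask.sum(1).clamp(min=1)
    loss = -(log_prob * mask).sum(1) / pos_counts
    return loss[mask.sum(1) > 0].mean()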