Jiaxin Chen

Benchmarking Robustness and Generalization in Multi-Agent Systems: A Case Study on Neural MMO

Aug 30, 2023
Yangkun Chen, Joseph Suarez, Junjie Zhang, Chenghui Yu, Bo Wu, Hanmo Chen, Hengman Zhu, Rui Du, Shanliang Qian, Shuai Liu, Weijun Hong, Jinke He, Yibing Zhang, Liang Zhao, Clare Zhu, Julian Togelius, Sharada Mohanty, Jiaxin Chen, Xiu Li, Xiaolong Zhu, Phillip Isola

We present the results of the second Neural MMO challenge, hosted at IJCAI 2022, which received 1600+ submissions. This competition targets robustness and generalization in multi-agent systems: participants train teams of agents to complete a multi-task objective against opponents not seen during training. The competition combines relatively complex environment design with a large number of agents in the environment. The top submissions demonstrate strong success on this task using mostly standard reinforcement learning (RL) methods combined with domain-specific engineering. We summarize the competition design and results, and suggest that, for the academic community, competitions may be a powerful approach to solving hard problems and establishing solid benchmarks for algorithms. We will open-source our benchmark, including the environment wrapper, baselines, a visualization tool, and selected policies, for further research.

DR-Tune: Improving Fine-tuning of Pretrained Visual Models by Distribution Regularization with Semantic Calibration

Aug 23, 2023
Nan Zhou, Jiaxin Chen, Di Huang

Visual models pretrained on large-scale benchmarks encode general knowledge and prove effective in building more powerful representations for downstream tasks. Most existing approaches follow the fine-tuning paradigm, either by initializing or regularizing the downstream model based on the pretrained one. The former fails to retain the pretrained knowledge during the subsequent fine-tuning phase and is therefore prone to over-fitting, while the latter imposes strong constraints on the weights or feature maps of the downstream model without accounting for semantic drift, often resulting in insufficient optimization. To deal with these issues, we propose a novel fine-tuning framework, namely distribution regularization with semantic calibration (DR-Tune). It employs distribution regularization by enforcing the downstream task head to decrease its classification error on the pretrained feature distribution, which prevents the head from over-fitting while enabling sufficient training of the downstream encoder. Furthermore, to alleviate the interference caused by semantic drift, we develop a semantic calibration (SC) module to align the global shape and class centers of the pretrained and downstream feature distributions. Extensive experiments on widely used image classification datasets show that DR-Tune consistently improves performance when combined with various backbones under different pretraining strategies. Code is available at: https://github.com/weeknan/DR-Tune.

* Accepted by ICCV'2023 
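
To make the distribution regularization idea above concrete, here is a minimal PyTorch sketch, not the official DR-Tune implementation from the linked repository: a frozen copy of the pretrained encoder supplies the pretrained feature distribution, and the task head is trained to classify features from both encoders. The ResNet-18 backbone, the 10-class head, and the weight `lambda_dr` are illustrative assumptions, and the semantic calibration (SC) module is omitted.

```python
import copy

import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision

# Backbone as a feature extractor; ResNet-18 and the 10-class task are assumptions.
pretrained = torchvision.models.resnet18(weights="IMAGENET1K_V1")
pretrained.fc = nn.Identity()
downstream = copy.deepcopy(pretrained)      # trainable downstream encoder, same init
for p in pretrained.parameters():           # the pretrained encoder stays frozen
    p.requires_grad_(False)

head = nn.Linear(512, 10)                   # shared downstream task head
lambda_dr = 1.0                             # regularization weight (assumed value)
optimizer = torch.optim.SGD(
    list(downstream.parameters()) + list(head.parameters()), lr=1e-2, momentum=0.9
)

def training_step(images, labels):
    # standard fine-tuning loss on features from the downstream encoder
    loss_cls = F.cross_entropy(head(downstream(images)), labels)
    # distribution regularization: the same head must also classify features
    # drawn from the frozen pretrained feature distribution
    with torch.no_grad():
        feat_pre = pretrained(images)
    loss_dr = F.cross_entropy(head(feat_pre), labels)
    loss = loss_cls + lambda_dr * loss_dr
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```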

CTP-Net: Character Texture Perception Network for Document Image Forgery Localization

Aug 15, 2023
Xin Liao, Siliang Chen, Jiaxin Chen, Tianyi Wang, Xiehua Li

Due to the progression of information technology in recent years, document images have been widely disseminated on social networks. With the help of powerful image editing tools, document images can be forged easily without leaving visible manipulation traces, which leads to severe consequences if critical information is falsified for malicious use. Document image forensics is therefore worth further exploration. In this paper, we propose a Character Texture Perception Network (CTP-Net) to localize the forged regions in document images. Specifically, since the semantically meaningful characters in a document image are highly vulnerable to tampering, capturing forgery traces around them is key to localizing forged regions. We design a Character Texture Stream (CTS), based on optical character recognition, to capture features of text areas, which are essential components of a document image. Meanwhile, texture features of the whole document image are exploited by an Image Texture Stream (ITS). By combining the features extracted from the CTS and the ITS, CTP-Net can reveal more subtle forgery traces in document images. Moreover, to overcome the challenge caused by the lack of fake document images, we design a data generation strategy and use it to construct a Fake Chinese Trademark dataset (FCTM). Experimental results on different datasets demonstrate that the proposed CTP-Net localizes multi-scale forged areas in document images and outperforms state-of-the-art forgery localization methods, even when post-processing operations are applied.
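
As a rough illustration of the two-stream design described above, not the paper's CTP-Net architecture, the sketch below fuses features from a character stream, which sees only text regions given by an externally supplied OCR mask, with features from a whole-image texture stream to predict a per-pixel forgery map. All layer sizes and the `ocr_mask` input are assumptions.

```python
import torch
import torch.nn as nn

def conv_block(cin, cout):
    return nn.Sequential(
        nn.Conv2d(cin, cout, 3, padding=1), nn.BatchNorm2d(cout), nn.ReLU(inplace=True)
    )

class TwoStreamForgeryLocalizer(nn.Module):
    def __init__(self):
        super().__init__()
        self.text_stream = nn.Sequential(conv_block(3, 32), conv_block(32, 64))
        self.image_stream = nn.Sequential(conv_block(3, 32), conv_block(32, 64))
        self.fuse = nn.Sequential(conv_block(128, 64), nn.Conv2d(64, 1, 1))

    def forward(self, image, ocr_mask):
        # ocr_mask: (B, 1, H, W) binary map of detected character regions,
        # assumed to come from an off-the-shelf OCR/text detector
        text_feat = self.text_stream(image * ocr_mask)    # character-texture stream
        img_feat = self.image_stream(image)               # image-texture stream
        logits = self.fuse(torch.cat([text_feat, img_feat], dim=1))
        return torch.sigmoid(logits)                      # per-pixel forgery probability

# usage: prob_map = TwoStreamForgeryLocalizer()(images, masks)
```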

Adaptive Sparse Convolutional Networks with Global Context Enhancement for Faster Object Detection on Drone Images

Mar 25, 2023
Bowei Du, Yecheng Huang, Jiaxin Chen, Di Huang

Low-latency object detection on drone images is an important but challenging task on resource-constrained unmanned aerial vehicle (UAV) platforms. This paper investigates optimizing the detection head based on sparse convolution, which proves effective in balancing accuracy and efficiency. Nevertheless, sparse convolution suffers from inadequate integration of contextual information for tiny objects, as well as clumsy control of the mask ratio when foreground objects vary in scale. To address these issues, we propose a novel global context-enhanced adaptive sparse convolutional network (CEASC). It first develops a context-enhanced group normalization (CE-GN) layer, which replaces the statistics computed from sparsely sampled features with global contextual ones, and then designs an adaptive multi-layer masking strategy that generates optimal mask ratios at distinct scales for compact foreground coverage, promoting both accuracy and efficiency. Extensive experimental results on two major benchmarks, i.e., VisDrone and UAVDT, demonstrate that CEASC remarkably reduces GFLOPs and accelerates inference when plugged into typical state-of-the-art detection frameworks (e.g., RetinaNet and GFL V1), while delivering competitive performance. Code is available at https://github.com/Cuogeihong/CEASC.

* Accepted by CVPR 2023 
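
The following minimal sketch illustrates the CE-GN idea described above: the sparsely computed features are normalized with group statistics taken from a global, dense feature map rather than from the sparse samples themselves. It is not the official CEASC code from the linked repository; the group count, tensor shapes, and mask handling are assumptions.

```python
import torch
import torch.nn as nn

class ContextEnhancedGroupNorm(nn.Module):
    def __init__(self, num_channels, num_groups=8, eps=1e-5):
        super().__init__()
        self.num_groups = num_groups
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(num_channels))
        self.bias = nn.Parameter(torch.zeros(num_channels))

    def forward(self, sparse_feat, global_feat, mask):
        # sparse_feat, global_feat: (B, C, H, W); mask: (B, 1, H, W) in {0, 1}
        B, C, H, W = sparse_feat.shape
        g = global_feat.reshape(B, self.num_groups, -1)
        mean = g.mean(dim=2, keepdim=True)               # statistics from the global context
        var = g.var(dim=2, keepdim=True, unbiased=False)
        x = sparse_feat.reshape(B, self.num_groups, -1)
        x = (x - mean) / torch.sqrt(var + self.eps)      # normalize with global statistics
        x = x.reshape(B, C, H, W)
        x = x * self.weight.view(1, C, 1, 1) + self.bias.view(1, C, 1, 1)
        return x * mask                                  # keep only the sparse foreground
```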

OcTr: Octree-based Transformer for 3D Object Detection

Mar 22, 2023
Chao Zhou, Yanan Zhang, Jiaxin Chen, Di Huang

A key challenge for LiDAR-based 3D object detection is to capture sufficient features from large-scale 3D scenes, especially for distant and/or occluded objects. Despite recent efforts leveraging the long-sequence modeling capability of Transformers, existing methods fail to properly balance accuracy and efficiency, suffering from either inadequate receptive fields or coarse-grained holistic correlations. In this paper, we propose an Octree-based Transformer, named OcTr, to address this issue. It first constructs a dynamic octree on the hierarchical feature pyramid by conducting self-attention at the top level and then recursively propagating to the level below, restricted to the selected octants, which captures rich global context in a coarse-to-fine manner while keeping the computational complexity under control. Furthermore, for enhanced foreground perception, we propose a hybrid positional embedding, composed of a semantic-aware positional embedding and an attention mask, to fully exploit semantic and geometric cues. Extensive experiments are conducted on the Waymo Open Dataset and the KITTI Dataset, and OcTr achieves new state-of-the-art results.

* Accepted by CVPR 2023 

Recon: Reducing Conflicting Gradients from the Root for Multi-Task Learning

Feb 22, 2023
Guangyuan Shi, Qimai Li, Wenlong Zhang, Jiaxin Chen, Xiao-Ming Wu

A fundamental challenge for multi-task learning is that different tasks may conflict with each other when they are solved jointly, and a cause of this phenomenon is conflicting gradients during optimization. Recent works attempt to mitigate the influence of conflicting gradients by directly altering the gradients based on some criteria. However, our empirical study shows that ``gradient surgery'' cannot effectively reduce the occurrence of conflicting gradients. In this paper, we take a different approach and reduce conflicting gradients from the root. In essence, we investigate the task gradients w.r.t. each shared network layer, select the layers with high conflict scores, and turn them into task-specific layers. Our experiments show that such a simple approach can greatly reduce the occurrence of conflicting gradients in the remaining shared layers and achieve better performance, with only a slight increase in model parameters in many cases. Our approach can be easily applied to improve various state-of-the-art methods, including gradient manipulation methods and branched architecture search methods. Given a network architecture (e.g., ResNet18), the conflict layers only need to be identified once, and the modified network can be used with different methods on the same or even different datasets to gain performance improvements. The source code is available at https://github.com/moukamisama/Recon.

* Accepted as a conference paper at ICLR 2023 
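
To illustrate the layer-wise conflict signal described above, here is a minimal PyTorch sketch that counts, for each shared parameter tensor, how many task pairs have conflicting (negatively aligned) gradients on a batch. It is not the released Recon code; the exact score definition and how the top-scoring layers are then branched into task-specific copies are assumptions left to the linked repository.

```python
from collections import defaultdict
from itertools import combinations

import torch

def layer_conflict_scores(model, task_losses):
    """Count, for each trainable parameter tensor (treated as a 'layer'),
    how many task pairs have conflicting gradients on this batch."""
    named = [(n, p) for n, p in model.named_parameters() if p.requires_grad]
    names = [n for n, _ in named]
    params = [p for _, p in named]
    per_task_grads = [
        torch.autograd.grad(loss, params, retain_graph=True, allow_unused=True)
        for loss in task_losses
    ]
    scores = defaultdict(int)
    for i, j in combinations(range(len(task_losses)), 2):
        for name, gi, gj in zip(names, per_task_grads[i], per_task_grads[j]):
            if gi is None or gj is None:
                continue
            cos = torch.cosine_similarity(gi.flatten(), gj.flatten(), dim=0)
            if cos < 0:          # negative alignment => conflicting gradients on this layer
                scores[name] += 1
    return scores

# Accumulate scores over many batches, then turn the highest-scoring shared
# layers into task-specific copies (the branching step itself is not shown).
```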

Emergent collective intelligence from massive-agent cooperation and competition

Jan 05, 2023
Hanmo Chen, Stone Tao, Jiaxin Chen, Weihan Shen, Xihui Li, Chenghui Yu, Sikai Cheng, Xiaolong Zhu, Xiu Li

Inspired by organisms evolving through cooperation and competition between different populations on Earth, we study the emergence of artificial collective intelligence through massive-agent reinforcement learning. To this end, we propose a new massive-agent reinforcement learning environment, Lux, where two teams of dynamically varying, large numbers of agents scramble for limited resources and fight off the darkness. In Lux, we train our agents with a standard reinforcement learning algorithm over curriculum learning phases and leverage centralized control via a pixel-to-pixel policy network. As agents co-evolve through self-play, we observe several stages of intelligence, from the acquisition of atomic skills to the development of group strategies. Since these learned group strategies arise from individual decisions without an explicit coordination mechanism, we claim that artificial collective intelligence emerges from massive-agent cooperation and competition. We further analyze the emergence of various learned strategies through metrics and ablation studies, aiming to provide insights for reinforcement learning implementations in massive-agent environments.

* Published at NeurIPS 2022 Deep RL workshop. Code available at https://github.com/hanmochen/lux-open 
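
As a rough sketch of a pixel-to-pixel policy with centralized control of the kind mentioned above, not the authors' actual network, the model below maps a map-shaped observation to per-cell action logits and a centralized value estimate; the observation channel count and action-space size are assumptions.

```python
import torch
import torch.nn as nn

class PixelToPixelPolicy(nn.Module):
    def __init__(self, obs_channels=20, num_actions=8):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(obs_channels, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
        )
        self.policy_head = nn.Conv2d(64, num_actions, 1)   # an action distribution per map cell
        self.value_head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 1)
        )

    def forward(self, obs):
        # obs: (B, C, H, W) map-shaped observation covering the whole team
        feat = self.trunk(obs)
        action_logits = self.policy_head(feat)   # (B, A, H, W): read off at unit-occupied cells
        value = self.value_head(feat)            # (B, 1): centralized state value
        return action_logits, value

# usage: logits, value = PixelToPixelPolicy()(torch.zeros(1, 20, 32, 32))
```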

Multi-Agent Path Finding via Tree LSTM

Oct 24, 2022
Yuhao Jiang, Kunjie Zhang, Qimai Li, Jiaxin Chen, Xiaolong Zhu

In recent years, Multi-Agent Path Finding (MAPF) has attracted attention from both Operations Research (OR) and Reinforcement Learning (RL). However, in the 2021 Flatland3 Challenge, a competition on MAPF, the best RL method scored only 27.9, far below the best OR method. This paper proposes a new RL solution to the Flatland3 Challenge, which scores 125.3, several times higher than the previous best RL solution. In our solution, we apply a novel network architecture, TreeLSTM, to MAPF. Together with several other RL techniques, including reward shaping, multiple-phase training, and centralized control, our solution is comparable to the top 2-3 OR methods.

* In submission to AAAI23-MAPF 
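
For reference, below is a minimal Child-Sum TreeLSTM cell (Tai et al., 2015), the kind of architecture named above; how the observation tree is constructed and how the cell is wired into the policy are not covered here, and the dimensions are assumptions.

```python
import torch
import torch.nn as nn

class ChildSumTreeLSTMCell(nn.Module):
    def __init__(self, input_dim, hidden_dim):
        super().__init__()
        self.W = nn.Linear(input_dim, 3 * hidden_dim)        # input, output, update gates from x
        self.U = nn.Linear(hidden_dim, 3 * hidden_dim, bias=False)
        self.W_f = nn.Linear(input_dim, hidden_dim)          # forget gate, applied per child
        self.U_f = nn.Linear(hidden_dim, hidden_dim, bias=False)

    def forward(self, x, child_h, child_c):
        # x: (input_dim,); child_h, child_c: (num_children, hidden_dim)
        # for a leaf node, pass empty (0, hidden_dim) tensors for child_h and child_c
        h_sum = child_h.sum(dim=0)                                       # child-sum aggregation
        i, o, u = torch.chunk(self.W(x) + self.U(h_sum), 3, dim=-1)
        i, o, u = torch.sigmoid(i), torch.sigmoid(o), torch.tanh(u)
        f = torch.sigmoid(self.W_f(x).unsqueeze(0) + self.U_f(child_h))  # one forget gate per child
        c = i * u + (f * child_c).sum(dim=0)
        h = o * torch.tanh(c)
        return h, c

# usage: evaluate the tree bottom-up, feeding each node its children's (h, c) pairs
```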

SHREC'22 Track: Sketch-Based 3D Shape Retrieval in the Wild

Jul 11, 2022
Jie Qin, Shuaihang Yuan, Jiaxin Chen, Boulbaba Ben Amor, Yi Fang, Nhat Hoang-Xuan, Chi-Bien Chu, Khoi-Nguyen Nguyen-Ngoc, Thien-Tri Cao, Nhat-Khang Ngo, Tuan-Luc Huynh, Hai-Dang Nguyen, Minh-Triet Tran, Haoyang Luo, Jianning Wang, Zheng Zhang, Zihao Xin, Yang Wang, Feng Wang, Ying Tang, Haiqin Chen, Yan Wang, Qunying Zhou, Ji Zhang, Hongyuan Wang

Sketch-based 3D shape retrieval (SBSR) is an important yet challenging task that has drawn increasing attention in recent years. Existing approaches address the problem in a restricted setting, without appropriately simulating real application scenarios. To mimic the realistic setting, in this track we adopt large-scale sketches drawn by amateurs with different levels of drawing skill, as well as a variety of 3D shapes including not only CAD models but also models scanned from real objects. We define two SBSR tasks and construct two benchmarks consisting of more than 46,000 CAD models, 1,700 realistic models, and 145,000 sketches in total. Four teams participated in this track and submitted 15 runs for the two tasks, evaluated by 7 commonly adopted metrics. We hope that the benchmarks, the comparative results, and the open-sourced evaluation code will foster future research in this direction within the 3D object retrieval community.
