Chen Gong

Filling Missing Values Matters for Range Image-Based Point Cloud Segmentation

May 16, 2024

Bike Chen, Chen Gong, Juha Röning

Abstract:Point cloud segmentation (PCS) plays an essential role in robot perception and navigation tasks. To efficiently understand large-scale outdoor point clouds, their range image representation is commonly adopted. This image-like representation is compact and structured, making range image-based PCS models practical. However, undesirable missing values in the range images damage the shapes and patterns of objects. This problem creates difficulty for the models in learning coherent and complete geometric information from the objects. Consequently, the PCS models only achieve inferior performance. Delving deeply into this issue, we find that the use of unreasonable projection approaches and deskewing scans mainly leads to unwanted missing values in the range images. Besides, almost all previous works fail to consider filling in the unexpected missing values in the PCS task. To alleviate this problem, we first propose a new projection method, namely scan unfolding++ (SU++), to avoid massive missing values in the generated range images. Then, we introduce a simple yet effective approach, namely range-dependent $K$-nearest neighbor interpolation ($K$NNI), to further fill in missing values. Finally, we introduce the Filling Missing Values Network (FMVNet) and Fast FMVNet. Extensive experimental results on SemanticKITTI, SemanticPOSS, and nuScenes datasets demonstrate that by employing the proposed SU++ and $K$NNI, existing range image-based PCS models consistently achieve better performance than the baseline models. Besides, both FMVNet and Fast FMVNet achieve state-of-the-art performance in terms of the speed-accuracy trade-off. The proposed methods can be applied to other range image-based tasks and practical applications.

* This paper has been submitted to a journal
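The idea of filling missing range-image pixels from nearby valid pixels can be sketched as follows. This is only a minimal illustration of plain K-nearest-neighbor interpolation over a local window, not the paper's range-dependent $K$NNI, whose details are not given in the abstract. The function name `fill_missing`, the `missing` marker value, the window size, and the column wrap-around are all assumptions made for this sketch.

```python
import math


def fill_missing(range_image, k=3, window=1, missing=0.0):
    """Fill missing pixels with the mean of up to k nearest valid neighbours.

    Assumptions (not from the paper): missing pixels are marked with
    `missing`, neighbours are searched in a (2*window+1)^2 window, and
    columns wrap around because a range image covers a full 360-degree scan.
    """
    rows, cols = len(range_image), len(range_image[0])
    filled = [row[:] for row in range_image]  # leave the input untouched
    for r in range(rows):
        for c in range(cols):
            if range_image[r][c] != missing:
                continue
            candidates = []
            for dr in range(-window, window + 1):
                for dc in range(-window, window + 1):
                    rr, cc = r + dr, (c + dc) % cols
                    if 0 <= rr < rows and range_image[rr][cc] != missing:
                        candidates.append((math.hypot(dr, dc), range_image[rr][cc]))
            # Keep the k spatially closest valid pixels and average them.
            candidates.sort(key=lambda item: item[0])
            nearest = [value for _, value in candidates[:k]]
            if nearest:
                filled[r][c] = sum(nearest) / len(nearest)
    return filled


image = [
    [5.0, 0.0, 7.0],
    [5.0, 6.0, 7.0],
]
print(fill_missing(image))  # the pixel at row 0, column 1 becomes 6.0
```

Pixels with no valid neighbour inside the window stay marked as missing, so a larger `window` trades speed for coverage.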

TrajDeleter: Enabling Trajectory Forgetting in Offline Reinforcement Learning Agents

Apr 18, 2024

Chen Gong, Kecen Li, Jin Yao, Tianhao Wang

Abstract:Reinforcement learning (RL) trains an agent from experiences interacting with the environment. In scenarios where online interactions are impractical, offline RL, which trains the agent using pre-collected datasets, has become popular. While this new paradigm presents remarkable effectiveness across various real-world domains, like healthcare and energy management, there is a growing demand to enable agents to rapidly and completely eliminate the influence of specific trajectories from both the training dataset and the trained agents. To meet this demand, this paper proposes TrajDeleter, the first practical approach to trajectory unlearning for offline RL agents. The key idea of TrajDeleter is to guide the agent to demonstrate deteriorating performance when it encounters states associated with unlearning trajectories. Simultaneously, it ensures the agent maintains its original performance level when facing other remaining trajectories. Additionally, we introduce TrajAuditor, a simple yet efficient method to evaluate whether TrajDeleter successfully eliminates the influence of the specific trajectories from the offline RL agent. Extensive experiments conducted on six offline RL algorithms and three tasks demonstrate that TrajDeleter requires only about 1.5% of the time needed for retraining from scratch. It effectively unlearns an average of 94.8% of the targeted trajectories yet still performs well in actual environment interactions after unlearning. The replication package and agent parameters are available online.

* 22 pages

Large Language Models as an Indirect Reasoner: Contrapositive and Contradiction for Automated Reasoning

Feb 06, 2024

Yanfang Zhang, Yiliu Sun, Yibing Zhan, Dapeng Tao, Dacheng Tao, Chen Gong

Abstract:Recently, increasing attention has been drawn to improving the ability of Large Language Models (LLMs) to perform complex reasoning. However, previous methods, such as Chain-of-Thought and Self-Consistency, mainly follow Direct Reasoning (DR) frameworks, so they struggle with the numerous real-world tasks that can hardly be solved via DR. Therefore, to strengthen the reasoning power of LLMs, this paper proposes a novel Indirect Reasoning (IR) method that employs the logic of contrapositives and contradictions to tackle IR tasks such as factual reasoning and mathematical proof. Specifically, our methodology comprises two steps. Firstly, we leverage the logical equivalence of the contrapositive to augment the data and rules to enhance the comprehensibility of LLMs. Secondly, we design a set of prompt templates to trigger LLMs to conduct IR based on proof by contradiction that is logically equivalent to the original DR process. Our IR method is simple yet effective and can be straightforwardly integrated with existing DR methods to further boost the reasoning abilities of LLMs. The experimental results on popular LLMs, such as GPT-3.5-turbo and Gemini-pro, show that our IR method enhances the overall accuracy of factual reasoning by 27.33% and mathematical proof by 31.43%, when compared with traditional DR methods. Moreover, the methods combining IR and DR significantly outperform the methods solely using IR or DR, further demonstrating the effectiveness of our strategy.

* 20 pages, 13 figures, 4 tables
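The first step described above, augmenting rules with their contrapositives, can be illustrated with a small sketch. It relies only on the standard equivalence between "if A then B" and "if not B then not A". The `Rule` class, the string-based `negate` helper, and the `augment` function are hypothetical names chosen for this example and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """An implication of the form: if `premise` then `conclusion`."""
    premise: str
    conclusion: str


def negate(statement):
    """Toy negation on plain strings: add or strip a leading 'not '."""
    prefix = "not "
    if statement.startswith(prefix):
        return statement[len(prefix):]
    return prefix + statement


def contrapositive(rule):
    """Return the logically equivalent rule: if not B then not A."""
    return Rule(premise=negate(rule.conclusion), conclusion=negate(rule.premise))


def augment(rules):
    """Append each rule's contrapositive, skipping ones already present."""
    augmented = list(rules)
    for rule in rules:
        candidate = contrapositive(rule)
        if candidate not in augmented:
            augmented.append(candidate)
    return augmented


rules = [Rule("it rains", "the ground is wet")]
for rule in augment(rules):
    print(f"If {rule.premise}, then {rule.conclusion}.")
```

A real pipeline would need proper natural-language negation; the string prefix here only keeps the example self-contained.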

Controllable Dense Captioner with Multimodal Embedding Bridging

Feb 01, 2024

Yuzhong Zhao, Yue Liu, Zonghao Guo, Weijia Wu, Chen Gong, Fang Wan, Qixiang Ye

Abstract:In this paper, we propose a controllable dense captioner (ControlCap), which accommodates users' intentions in dense captioning by introducing linguistic guidance. ControlCap is defined as a multimodal embedding bridging architecture, which comprises a multimodal embedding generation (MEG) module and a bi-directional embedding bridging (BEB) module. While the MEG module represents objects/regions by combining embeddings of detailed information with context-aware ones, it also endows ControlCap with adaptability to specialized controls by utilizing them as linguistic guidance. The BEB module aligns the linguistic guidance with visual embeddings through borrowing/returning features from/to the visual domain and gathering such features to predict text descriptions. Experiments on the Visual Genome and VG-COCO datasets show that ControlCap outperforms the state-of-the-art methods by 1.5% and 3.7% (mAP), respectively. Last but not least, with the capability of converting region-category pairs to region-text pairs, ControlCap is able to act as a powerful data engine for dense captioning. Code is available at https://github.com/callsys/ControlCap.

* https://github.com/callsys/ControlCap

Direct Distillation between Different Domains

Jan 12, 2024

Jialiang Tang, Shuo Chen, Gang Niu, Hongyuan Zhu, Joey Tianyi Zhou, Chen Gong, Masashi Sugiyama

Abstract:Knowledge Distillation (KD) aims to learn a compact student network using knowledge from a large pre-trained teacher network, where both networks are trained on data from the same distribution. However, in practical applications, the student network may be required to perform in a new scenario (i.e., the target domain), which usually exhibits significant differences from the known scenario of the teacher network (i.e., the source domain). The traditional domain adaptation techniques can be integrated with KD in a two-stage process to bridge the domain gap, but the ultimate reliability of two-stage approaches tends to be limited due to the high computational consumption and the additional errors accumulated from both stages. To solve this problem, we propose a new one-stage method dubbed "Direct Distillation between Different Domains" (4Ds). We first design a learnable adapter based on the Fourier transform to separate the domain-invariant knowledge from the domain-specific knowledge. Then, we build a fusion-activation mechanism to transfer the valuable domain-invariant knowledge to the student network, while simultaneously encouraging the adapter within the teacher network to learn the domain-specific knowledge of the target data. As a result, the teacher network can effectively transfer categorical knowledge that aligns with the target domain of the student network. Intensive experiments on various benchmark datasets demonstrate that our proposed 4Ds method successfully produces reliable student networks and outperforms state-of-the-art approaches.

Mutual Information as Intrinsic Reward of Reinforcement Learning Agents for On-demand Ride Pooling

Jan 07, 2024

Xianjie Zhang, Jiahao Sun, Chen Gong, Kai Wang, Yifei Cao, Hao Chen, Yu Liu

Abstract:The emergence of on-demand ride pooling services allows each vehicle to serve multiple passengers at a time, thus increasing drivers' income and enabling passengers to travel at lower prices than taxi/car on-demand services (such as UberX and Lyft, where only one passenger can be assigned to a car at a time). Although on-demand ride pooling services bring many benefits, they need a well-defined matching strategy to maximize the benefits for all parties (passengers, drivers, aggregation companies and the environment), in which the regional dispatching of vehicles has a significant impact on the matching and revenue. Existing algorithms often only consider revenue maximization, which makes it difficult for requests with unusual distribution to get a ride. Increasing revenue while ensuring a reasonable assignment of requests is a challenge for ride pooling service companies (aggregation companies). In this paper, we propose a framework for vehicle dispatching for ride pooling tasks, which splits the city into discrete dispatching regions and uses a reinforcement learning (RL) algorithm to dispatch vehicles in these regions. We also consider the mutual information (MI) between vehicle and order distribution as the intrinsic reward of the RL algorithm to improve the correlation between their distributions, thus ensuring the possibility of getting a ride for unusually distributed requests. In experimental results on a real-world taxi dataset, we demonstrate that our framework can significantly increase revenue by up to an average of 3% over the existing best on-demand ride pooling method.

* Accepted by AAMAS 2024
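The mutual information used as an intrinsic reward can be made concrete with a small sketch. The abstract does not say how the joint distribution of vehicles and orders is built, so the function below simply takes a table of joint counts as given and applies the standard definition $I(X;Y) = \sum_{x,y} p(x,y) \log \frac{p(x,y)}{p(x)p(y)}$. The name `mutual_information` and the count-table input are assumptions made for illustration.

```python
import math


def mutual_information(joint_counts):
    """Mutual information (in nats) of a 2-D table of joint counts.

    Assumption: joint_counts[i][j] counts co-occurrences of vehicle
    state i and order state j; the paper's construction may differ.
    """
    total = sum(sum(row) for row in joint_counts)
    n_cols = len(joint_counts[0])
    # Marginal distributions over rows (X) and columns (Y).
    p_x = [sum(row) / total for row in joint_counts]
    p_y = [sum(row[j] for row in joint_counts) / total for j in range(n_cols)]
    mi = 0.0
    for i, row in enumerate(joint_counts):
        for j, count in enumerate(row):
            if count > 0:  # zero cells contribute nothing to the sum
                p_xy = count / total
                mi += p_xy * math.log(p_xy / (p_x[i] * p_y[j]))
    return mi


print(mutual_information([[1, 1], [1, 1]]))  # independent variables: 0.0
print(mutual_information([[1, 0], [0, 1]]))  # perfectly correlated: log(2)
```

A higher value means the two distributions are more strongly coupled, which is the property the abstract says the intrinsic reward is meant to encourage.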

Dual-scale Enhanced and Cross-generative Consistency Learning for Semi-supervised Polyp Segmentation

Dec 26, 2023

Yunqi Gu, Tao Zhou, Yizhe Zhang, Yi Zhou, Kelei He, Chen Gong, Huazhu Fu

Abstract:Automatic polyp segmentation plays a crucial role in the early diagnosis and treatment of colorectal cancer (CRC). However, existing methods heavily rely on fully supervised training, which requires a large amount of labeled data with time-consuming pixel-wise annotations. Moreover, accurately segmenting polyps poses challenges due to variations in shape, size, and location. To address these issues, we propose a novel Dual-scale Enhanced and Cross-generative consistency learning framework for semi-supervised polyp Segmentation (DEC-Seg) from colonoscopy images. First, we propose a Cross-level Feature Aggregation (CFA) module that integrates cross-level adjacent layers to enhance the feature representation ability across different resolutions. To address scale variation, we present a scale-enhanced consistency constraint, which ensures consistency in the segmentation maps generated from the same input image at different scales. This constraint helps handle variations in polyp sizes and improves the robustness of the model. Additionally, we design a scale-aware perturbation consistency scheme to enhance the robustness of the mean teacher model. Furthermore, we propose a cross-generative consistency scheme, in which the original and perturbed images can be reconstructed using cross-segmentation maps. This consistency constraint allows us to mine effective feature representations and boost the segmentation performance. To produce more accurate segmentation maps, we propose a Dual-scale Complementary Fusion (DCF) module that integrates features from two scale-specific decoders operating at different scales. Extensive experimental results on five benchmark datasets demonstrate the effectiveness of our DEC-Seg against other state-of-the-art semi-supervised segmentation approaches. The implementation code will be released at https://github.com/taozh2017/DECSeg.

* 10 pages, 7 figures

SHaRPose: Sparse High-Resolution Representation for Human Pose Estimation

Dec 17, 2023

Xiaoqi An, Lin Zhao, Chen Gong, Nannan Wang, Di Wang, Jian Yang

Abstract:High-resolution representation is essential for achieving good performance in human pose estimation models. To obtain such features, existing works utilize high-resolution input images or fine-grained image tokens. However, this dense high-resolution representation brings a significant computational burden. In this paper, we address the following question: "Only sparse human keypoint locations are detected for human pose estimation, is it really necessary to describe the whole image in a dense, high-resolution manner?" Based on dynamic transformer models, we propose a framework that only uses Sparse High-resolution Representations for human Pose estimation (SHaRPose). In detail, SHaRPose consists of two stages. At the coarse stage, the relations between image regions and keypoints are dynamically mined while a coarse estimation is generated. Then, a quality predictor is applied to decide whether the coarse estimation results should be refined. At the fine stage, SHaRPose builds sparse high-resolution representations only on the regions related to the keypoints and provides refined high-precision human pose estimations. Extensive experiments demonstrate the outstanding performance of the proposed method. Specifically, compared to the state-of-the-art method ViTPose, our model SHaRPose-Base achieves 77.4 AP (+0.5 AP) on the COCO validation set and 76.7 AP (+0.5 AP) on the COCO test-dev set, and infers at a speed of $1.4\times$ faster than ViTPose-Base.

* Accepted to AAAI 2024

SpliceMix: A Cross-scale and Semantic Blending Augmentation Strategy for Multi-label Image Classification

Nov 26, 2023

Lei Wang, Yibing Zhan, Leilei Ma, Dapeng Tao, Liang Ding, Chen Gong

Abstract:Recently, Mix-style data augmentation methods (e.g., Mixup and CutMix) have shown promising performance in various visual tasks. However, these methods are primarily designed for single-label images, ignoring the considerable discrepancies between single- and multi-label images, i.e., a multi-label image involves multiple co-occurred categories and fickle object scales. On the other hand, previous multi-label image classification (MLIC) methods tend to design elaborate models, bringing expensive computation. In this paper, we introduce a simple but effective augmentation strategy for multi-label image classification, namely SpliceMix. The "splice" in our method is two-fold: 1) Each mixed image is a splice of several downsampled images in the form of a grid, where the semantics of images attending to mixing are blended without object deficiencies for alleviating co-occurred bias; 2) We splice mixed images and the original mini-batch to form a new SpliceMixed mini-batch, which allows an image with different scales to contribute to training together. Furthermore, such splice in our SpliceMixed mini-batch enables interactions between mixed images and original regular images. We also offer a simple and non-parametric extension based on consistency learning (SpliceMix-CL) to show the flexible extensibility of our SpliceMix. Extensive experiments on various tasks demonstrate that only using SpliceMix with a baseline model (e.g., ResNet) achieves better performance than state-of-the-art methods. Moreover, the generalizability of our SpliceMix is further validated by the improvements in current MLIC methods when married with our SpliceMix. The code is available at https://github.com/zuiran/SpliceMix.

* 13 pages, 10 figures
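The first kind of splice described above, tiling several downsampled images into one grid image, can be sketched in a few lines. This is a simplified illustration on nested lists rather than the authors' implementation, which is available at the repository linked in the abstract. The strided downsampling and the choice to give the mixed image the union of the tile labels are assumptions of this sketch, as are the function names.

```python
def downsample(image, factor):
    """Nearest-neighbour downsampling: keep every `factor`-th pixel."""
    return [row[::factor] for row in image[::factor]]


def splice_grid(images, labels, grid=2):
    """Tile grid*grid downsampled images into one mixed image.

    Assumption: the mixed image's multi-hot label is the element-wise
    maximum (union) of the labels of the images it contains.
    """
    if len(images) != grid * grid or len(labels) != grid * grid:
        raise ValueError("need exactly grid*grid images and labels")
    tiles = [downsample(image, grid) for image in images]
    mixed = []
    for grid_row in range(grid):
        row_tiles = tiles[grid_row * grid:(grid_row + 1) * grid]
        # Concatenate the matching pixel rows of the tiles side by side.
        for r in range(len(row_tiles[0])):
            mixed.append([pixel for tile in row_tiles for pixel in tile[r]])
    mixed_label = [max(column) for column in zip(*labels)]
    return mixed, mixed_label


# Four 2x2 "images", each filled with its own index, and 3-class labels.
images = [[[i, i], [i, i]] for i in range(4)]
labels = [[1, 0, 0], [0, 1, 0], [0, 0, 0], [1, 0, 0]]
mixed, mixed_label = splice_grid(images, labels)
print(mixed)        # [[0, 1], [2, 3]]
print(mixed_label)  # [1, 1, 0]
```

The second splice in the abstract, concatenating mixed images with the original mini-batch, would then amount to appending `mixed` and `mixed_label` to the batch lists.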

Winning Prize Comes from Losing Tickets: Improve Invariant Learning by Exploring Variant Parameters for Out-of-Distribution Generalization

Oct 25, 2023

Zhuo Huang, Muyang Li, Li Shen, Jun Yu, Chen Gong, Bo Han, Tongliang Liu

Abstract:Out-of-Distribution (OOD) Generalization aims to learn robust models that generalize well to various environments without fitting to distribution-specific features. Recent studies based on the Lottery Ticket Hypothesis (LTH) address this problem by minimizing the learning target to find some of the parameters that are critical to the task. However, in OOD problems, such solutions are suboptimal as the learning task contains severe distribution noise, which can mislead the optimization process. Therefore, apart from finding the task-related parameters (i.e., invariant parameters), we propose Exploring Variant parameters for Invariant Learning (EVIL), which also leverages the distribution knowledge to find the parameters that are sensitive to distribution shift (i.e., variant parameters). Once the variant parameters are left out of invariant learning, a robust subnetwork that is resistant to distribution shift can be found. Additionally, the parameters that are relatively stable across distributions can be considered invariant ones to improve invariant learning. By fully exploring both variant and invariant parameters, our EVIL can effectively identify a robust subnetwork to improve OOD generalization. In extensive experiments on the integrated DomainBed testbed, EVIL can effectively and efficiently enhance many popular methods, such as ERM, IRM, and SAM.

* 27 pages, 9 figures
