Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Xian-Sheng Hua

Learning with Imbalanced Noisy Data by Preventing Bias in Sample Selection

Feb 17, 2024

Huafeng Liu, Mengmeng Sheng, Zeren Sun, Yazhou Yao, Xian-Sheng Hua, Heng-Tao Shen

Figure 1 for Learning with Imbalanced Noisy Data by Preventing Bias in Sample Selection

Figure 2 for Learning with Imbalanced Noisy Data by Preventing Bias in Sample Selection

Figure 3 for Learning with Imbalanced Noisy Data by Preventing Bias in Sample Selection

Figure 4 for Learning with Imbalanced Noisy Data by Preventing Bias in Sample Selection

Abstract:Learning with noisy labels has gained increasing attention because the inevitable imperfect labels in real-world scenarios can substantially hurt the deep model performance. Recent studies tend to regard low-loss samples as clean ones and discard high-loss ones to alleviate the negative impact of noisy labels. However, real-world datasets contain not only noisy labels but also class imbalance. The imbalance issue is prone to causing failure in the loss-based sample selection since the under-learning of tail classes also leans to produce high losses. To this end, we propose a simple yet effective method to address noisy labels in imbalanced datasets. Specifically, we propose Class-Balance-based sample Selection (CBS) to prevent the tail class samples from being neglected during training. We propose Confidence-based Sample Augmentation (CSA) for the chosen clean samples to enhance their reliability in the training process. To exploit selected noisy samples, we resort to prediction history to rectify labels of noisy samples. Moreover, we introduce the Average Confidence Margin (ACM) metric to measure the quality of corrected labels by leveraging the model's evolving training dynamics, thereby ensuring that low-quality corrected noisy samples are appropriately masked out. Lastly, consistency regularization is imposed on filtered label-corrected noisy samples to boost model performance. Comprehensive experimental results on synthetic and real-world datasets demonstrate the effectiveness and superiority of our proposed method, especially in imbalanced scenarios. Comprehensive experimental results on synthetic and real-world datasets demonstrate the effectiveness and superiority of our proposed method, especially in imbalanced scenarios.

* accepted by IEEE Transactions on Multimedia

Via

Access Paper or Ask Questions

Hierarchical Graph Pattern Understanding for Zero-Shot VOS

Dec 15, 2023

Gensheng Pei, Fumin Shen, Yazhou Yao, Tao Chen, Xian-Sheng Hua, Heng-Tao Shen

Figure 1 for Hierarchical Graph Pattern Understanding for Zero-Shot VOS

Figure 2 for Hierarchical Graph Pattern Understanding for Zero-Shot VOS

Figure 3 for Hierarchical Graph Pattern Understanding for Zero-Shot VOS

Figure 4 for Hierarchical Graph Pattern Understanding for Zero-Shot VOS

Abstract:The optical flow guidance strategy is ideal for obtaining motion information of objects in the video. It is widely utilized in video segmentation tasks. However, existing optical flow-based methods have a significant dependency on optical flow, which results in poor performance when the optical flow estimation fails for a particular scene. The temporal consistency provided by the optical flow could be effectively supplemented by modeling in a structural form. This paper proposes a new hierarchical graph neural network (GNN) architecture, dubbed hierarchical graph pattern understanding (HGPU), for zero-shot video object segmentation (ZS-VOS). Inspired by the strong ability of GNNs in capturing structural relations, HGPU innovatively leverages motion cues (\ie, optical flow) to enhance the high-order representations from the neighbors of target frames. Specifically, a hierarchical graph pattern encoder with message aggregation is introduced to acquire different levels of motion and appearance features in a sequential manner. Furthermore, a decoder is designed for hierarchically parsing and understanding the transformed multi-modal contexts to achieve more accurate and robust results. HGPU achieves state-of-the-art performance on four publicly available benchmarks (DAVIS-16, YouTube-Objects, Long-Videos and DAVIS-17). Code and pre-trained model can be found at \url{https://github.com/NUST-Machine-Intelligence-Laboratory/HGPU}.

* IEEE Transactions on Image Processing 2023
* accepted by IEEE Transactions on Image Processing

Via

Access Paper or Ask Questions

Proposal-Level Unsupervised Domain Adaptation for Open World Unbiased Detector

Nov 04, 2023

Xuanyi Liu, Zhongqi Yue, Xian-Sheng Hua

Figure 1 for Proposal-Level Unsupervised Domain Adaptation for Open World Unbiased Detector

Figure 2 for Proposal-Level Unsupervised Domain Adaptation for Open World Unbiased Detector

Figure 3 for Proposal-Level Unsupervised Domain Adaptation for Open World Unbiased Detector

Figure 4 for Proposal-Level Unsupervised Domain Adaptation for Open World Unbiased Detector

Abstract:Open World Object Detection (OWOD) combines open-set object detection with incremental learning capabilities to handle the challenge of the open and dynamic visual world. Existing works assume that a foreground predictor trained on the seen categories can be directly transferred to identify the unseen categories' locations by selecting the top-k most confident foreground predictions. However, the assumption is hardly valid in practice. This is because the predictor is inevitably biased to the known categories, and fails under the shift in the appearance of the unseen categories. In this work, we aim to build an unbiased foreground predictor by re-formulating the task under Unsupervised Domain Adaptation, where the current biased predictor helps form the domains: the seen object locations and confident background locations as the source domain, and the rest ambiguous ones as the target domain. Then, we adopt the simple and effective self-training method to learn a predictor based on the domain-invariant foreground features, hence achieving unbiased prediction robust to the shift in appearance between the seen and unseen categories. Our approach's pipeline can adapt to various detection frameworks and UDA methods, empirically validated by OWOD evaluation, where we achieve state-of-the-art performance.

Via

Access Paper or Ask Questions

Dynamic Hypergraph Structure Learning for Traffic Flow Forecasting

Sep 21, 2023

Yusheng Zhao, Xiao Luo, Wei Ju, Chong Chen, Xian-Sheng Hua, Ming Zhang

Figure 1 for Dynamic Hypergraph Structure Learning for Traffic Flow Forecasting

Figure 2 for Dynamic Hypergraph Structure Learning for Traffic Flow Forecasting

Figure 3 for Dynamic Hypergraph Structure Learning for Traffic Flow Forecasting

Figure 4 for Dynamic Hypergraph Structure Learning for Traffic Flow Forecasting

Abstract:This paper studies the problem of traffic flow forecasting, which aims to predict future traffic conditions on the basis of road networks and traffic conditions in the past. The problem is typically solved by modeling complex spatio-temporal correlations in traffic data using spatio-temporal graph neural networks (GNNs). However, the performance of these methods is still far from satisfactory since GNNs usually have limited representation capacity when it comes to complex traffic networks. Graphs, by nature, fall short in capturing non-pairwise relations. Even worse, existing methods follow the paradigm of message passing that aggregates neighborhood information linearly, which fails to capture complicated spatio-temporal high-order interactions. To tackle these issues, in this paper, we propose a novel model named Dynamic Hypergraph Structure Learning (DyHSL) for traffic flow prediction. To learn non-pairwise relationships, our DyHSL extracts hypergraph structural information to model dynamics in the traffic networks, and updates each node representation by aggregating messages from its associated hyperedges. Additionally, to capture high-order spatio-temporal relations in the road network, we introduce an interactive graph convolution block, which further models the neighborhood interaction for each node. Finally, we integrate these two views into a holistic multi-scale correlation extraction module, which conducts temporal pooling with different scales to model different temporal patterns. Extensive experiments on four popular traffic benchmark datasets demonstrate the effectiveness of our proposed DyHSL compared with a broad range of competing baselines.

* Accepted by 2023 IEEE 39th International Conference on Data Engineering (ICDE 2023)

Via

Access Paper or Ask Questions

Invariant Training 2D-3D Joint Hard Samples for Few-Shot Point Cloud Recognition

Aug 18, 2023

Xuanyu Yi, Jiajun Deng, Qianru Sun, Xian-Sheng Hua, Joo-Hwee Lim, Hanwang Zhang

Figure 1 for Invariant Training 2D-3D Joint Hard Samples for Few-Shot Point Cloud Recognition

Figure 2 for Invariant Training 2D-3D Joint Hard Samples for Few-Shot Point Cloud Recognition

Figure 3 for Invariant Training 2D-3D Joint Hard Samples for Few-Shot Point Cloud Recognition

Figure 4 for Invariant Training 2D-3D Joint Hard Samples for Few-Shot Point Cloud Recognition

Abstract:We tackle the data scarcity challenge in few-shot point cloud recognition of 3D objects by using a joint prediction from a conventional 3D model and a well-trained 2D model. Surprisingly, such an ensemble, though seems trivial, has hardly been shown effective in recent 2D-3D models. We find out the crux is the less effective training for the ''joint hard samples'', which have high confidence prediction on different wrong labels, implying that the 2D and 3D models do not collaborate well. To this end, our proposed invariant training strategy, called InvJoint, does not only emphasize the training more on the hard samples, but also seeks the invariance between the conflicting 2D and 3D ambiguous predictions. InvJoint can learn more collaborative 2D and 3D representations for better ensemble. Extensive experiments on 3D shape classification with widely adopted ModelNet10/40, ScanObjectNN and Toys4K, and shape retrieval with ShapeNet-Core validate the superiority of our InvJoint.

Via

Access Paper or Ask Questions

Anatomy-Aware Lymph Node Detection in Chest CT using Implicit Station Stratification

Jul 28, 2023

Ke Yan, Dakai Jin, Dazhou Guo, Minfeng Xu, Na Shen, Xian-Sheng Hua, Xianghua Ye, Le Lu

Figure 1 for Anatomy-Aware Lymph Node Detection in Chest CT using Implicit Station Stratification

Figure 2 for Anatomy-Aware Lymph Node Detection in Chest CT using Implicit Station Stratification

Figure 3 for Anatomy-Aware Lymph Node Detection in Chest CT using Implicit Station Stratification

Figure 4 for Anatomy-Aware Lymph Node Detection in Chest CT using Implicit Station Stratification

Abstract:Finding abnormal lymph nodes in radiological images is highly important for various medical tasks such as cancer metastasis staging and radiotherapy planning. Lymph nodes (LNs) are small glands scattered throughout the body. They are grouped or defined to various LN stations according to their anatomical locations. The CT imaging appearance and context of LNs in different stations vary significantly, posing challenges for automated detection, especially for pathological LNs. Motivated by this observation, we propose a novel end-to-end framework to improve LN detection performance by leveraging their station information. We design a multi-head detector and make each head focus on differentiating the LN and non-LN structures of certain stations. Pseudo station labels are generated by an LN station classifier as a form of multi-task learning during training, so we do not need another explicit LN station prediction model during inference. Our algorithm is evaluated on 82 patients with lung cancer and 91 patients with esophageal cancer. The proposed implicit station stratification method improves the detection sensitivity of thoracic lymph nodes from 65.1% to 71.4% and from 80.3% to 85.5% at 2 false positives per patient on the two datasets, respectively, which significantly outperforms various existing state-of-the-art baseline techniques such as nnUNet, nnDetection and LENS.

Via

Access Paper or Ask Questions

Random Boxes Are Open-world Object Detectors

Jul 17, 2023

Yanghao Wang, Zhongqi Yue, Xian-Sheng Hua, Hanwang Zhang

Figure 1 for Random Boxes Are Open-world Object Detectors

Figure 2 for Random Boxes Are Open-world Object Detectors

Figure 3 for Random Boxes Are Open-world Object Detectors

Figure 4 for Random Boxes Are Open-world Object Detectors

Abstract:We show that classifiers trained with random region proposals achieve state-of-the-art Open-world Object Detection (OWOD): they can not only maintain the accuracy of the known objects (w/ training labels), but also considerably improve the recall of unknown ones (w/o training labels). Specifically, we propose RandBox, a Fast R-CNN based architecture trained on random proposals at each training iteration, surpassing existing Faster R-CNN and Transformer based OWOD. Its effectiveness stems from the following two benefits introduced by randomness. First, as the randomization is independent of the distribution of the limited known objects, the random proposals become the instrumental variable that prevents the training from being confounded by the known objects. Second, the unbiased training encourages more proposal explorations by using our proposed matching score that does not penalize the random proposals whose prediction scores do not match the known objects. On two benchmarks: Pascal-VOC/MS-COCO and LVIS, RandBox significantly outperforms the previous state-of-the-art in all metrics. We also detail the ablations on randomization and loss designs. Codes are available at https://github.com/scuwyh2000/RandBox.

* ICCV 2023

Via

Access Paper or Ask Questions

CoCo: A Coupled Contrastive Framework for Unsupervised Domain Adaptive Graph Classification

Jun 10, 2023

Nan Yin, Li Shen, Mengzhu Wang, Long Lan, Zeyu Ma, Chong Chen, Xian-Sheng Hua, Xiao Luo

Abstract:Although graph neural networks (GNNs) have achieved impressive achievements in graph classification, they often need abundant task-specific labels, which could be extensively costly to acquire. A credible solution is to explore additional labeled graphs to enhance unsupervised learning on the target domain. However, how to apply GNNs to domain adaptation remains unsolved owing to the insufficient exploration of graph topology and the significant domain discrepancy. In this paper, we propose Coupled Contrastive Graph Representation Learning (CoCo), which extracts the topological information from coupled learning branches and reduces the domain discrepancy with coupled contrastive learning. CoCo contains a graph convolutional network branch and a hierarchical graph kernel network branch, which explore graph topology in implicit and explicit manners. Besides, we incorporate coupled branches into a holistic multi-view contrastive learning framework, which not only incorporates graph representations learned from complementary views for enhanced understanding, but also encourages the similarity between cross-domain example pairs with the same semantics for domain alignment. Extensive experiments on popular datasets show that our CoCo outperforms these competing baselines in different settings generally.

Via

Access Paper or Ask Questions

PastNet: Introducing Physical Inductive Biases for Spatio-temporal Video Prediction

May 24, 2023

Hao Wu, Wei Xiong, Fan Xu, Xiao Luo, Chong Chen, Xian-Sheng Hua, Haixin Wang

Figure 1 for PastNet: Introducing Physical Inductive Biases for Spatio-temporal Video Prediction

Figure 2 for PastNet: Introducing Physical Inductive Biases for Spatio-temporal Video Prediction

Figure 3 for PastNet: Introducing Physical Inductive Biases for Spatio-temporal Video Prediction

Figure 4 for PastNet: Introducing Physical Inductive Biases for Spatio-temporal Video Prediction

Abstract:In this paper, we investigate the challenge of spatio-temporal video prediction, which involves generating future videos based on historical data streams. Existing approaches typically utilize external information such as semantic maps to enhance video prediction, which often neglect the inherent physical knowledge embedded within videos. Furthermore, their high computational demands could impede their applications for high-resolution videos. To address these constraints, we introduce a novel approach called Physics-assisted Spatio-temporal Network (PastNet) for generating high-quality video predictions. The core of our PastNet lies in incorporating a spectral convolution operator in the Fourier domain, which efficiently introduces inductive biases from the underlying physical laws. Additionally, we employ a memory bank with the estimated intrinsic dimensionality to discretize local features during the processing of complex spatio-temporal signals, thereby reducing computational costs and facilitating efficient high-resolution video prediction. Extensive experiments on various widely-used datasets demonstrate the effectiveness and efficiency of the proposed PastNet compared with state-of-the-art methods, particularly in high-resolution scenarios. Our code is available at https://github.com/easylearningscores/PastNet.

* 11

Via

Access Paper or Ask Questions

Structural and Statistical Texture Knowledge Distillation for Semantic Segmentation

May 06, 2023

Deyi Ji, Haoran Wang, Mingyuan Tao, Jianqiang Huang, Xian-Sheng Hua, Hongtao Lu

Abstract:Existing knowledge distillation works for semantic segmentation mainly focus on transferring high-level contextual knowledge from teacher to student. However, low-level texture knowledge is also of vital importance for characterizing the local structural pattern and global statistical property, such as boundary, smoothness, regularity and color contrast, which may not be well addressed by high-level deep features. In this paper, we are intended to take full advantage of both structural and statistical texture knowledge and propose a novel Structural and Statistical Texture Knowledge Distillation (SSTKD) framework for semantic segmentation. Specifically, for structural texture knowledge, we introduce a Contourlet Decomposition Module (CDM) that decomposes low-level features with iterative Laplacian pyramid and directional filter bank to mine the structural texture knowledge. For statistical knowledge, we propose a Denoised Texture Intensity Equalization Module (DTIEM) to adaptively extract and enhance statistical texture knowledge through heuristics iterative quantization and denoised operation. Finally, each knowledge learning is supervised by an individual loss function, forcing the student network to mimic the teacher better from a broader perspective. Experiments show that the proposed method achieves state-of-the-art performance on Cityscapes, Pascal VOC 2012 and ADE20K datasets.

* Accepted to CVPR 2022

Via

Access Paper or Ask Questions