Xian-Sheng Hua

Dynamic Hypergraph Structure Learning for Traffic Flow Forecasting

Sep 21, 2023
Yusheng Zhao, Xiao Luo, Wei Ju, Chong Chen, Xian-Sheng Hua, Ming Zhang

This paper studies the problem of traffic flow forecasting, which aims to predict future traffic conditions on the basis of road networks and traffic conditions in the past. The problem is typically solved by modeling complex spatio-temporal correlations in traffic data using spatio-temporal graph neural networks (GNNs). However, the performance of these methods is still far from satisfactory since GNNs usually have limited representation capacity when it comes to complex traffic networks. Graphs, by nature, fall short in capturing non-pairwise relations. Even worse, existing methods follow the paradigm of message passing that aggregates neighborhood information linearly, which fails to capture complicated spatio-temporal high-order interactions. To tackle these issues, in this paper, we propose a novel model named Dynamic Hypergraph Structure Learning (DyHSL) for traffic flow prediction. To learn non-pairwise relationships, our DyHSL extracts hypergraph structural information to model dynamics in the traffic networks, and updates each node representation by aggregating messages from its associated hyperedges. Additionally, to capture high-order spatio-temporal relations in the road network, we introduce an interactive graph convolution block, which further models the neighborhood interaction for each node. Finally, we integrate these two views into a holistic multi-scale correlation extraction module, which conducts temporal pooling with different scales to model different temporal patterns. Extensive experiments on four popular traffic benchmark datasets demonstrate the effectiveness of our proposed DyHSL compared with a broad range of competing baselines.

* Accepted by 2023 IEEE 39th International Conference on Data Engineering (ICDE 2023) 
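
To make the hyperedge-based update concrete, below is a minimal PyTorch sketch of generic hypergraph message passing: node features are pooled into hyperedge features and scattered back to the nodes. The class name, shapes, and the incidence-matrix interface are illustrative assumptions, not the authors' DyHSL implementation.

```python
import torch
import torch.nn as nn

class HypergraphConv(nn.Module):
    """One round of node -> hyperedge -> node message passing (illustrative)."""

    def __init__(self, dim: int):
        super().__init__()
        self.node_to_edge = nn.Linear(dim, dim)
        self.edge_to_node = nn.Linear(dim, dim)

    def forward(self, x, incidence):
        # x: (N, dim) node features; incidence: (N, E), 1 if node i is in hyperedge e
        deg_e = incidence.sum(0).clamp(min=1)                 # nodes per hyperedge
        edge_feat = incidence.t() @ self.node_to_edge(x) / deg_e[:, None]
        deg_v = incidence.sum(1).clamp(min=1)                 # hyperedges per node
        msg = incidence @ self.edge_to_node(edge_feat) / deg_v[:, None]
        return torch.relu(x + msg)                            # residual node update

x = torch.randn(8, 16)                         # e.g., 8 road sensors, 16-d features
H = (torch.rand(8, 3) > 0.5).float()           # 3 hyperedges with random membership
out = HypergraphConv(16)(x, H)                 # (8, 16) updated node representations
```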

Invariant Training 2D-3D Joint Hard Samples for Few-Shot Point Cloud Recognition

Aug 18, 2023
Xuanyu Yi, Jiajun Deng, Qianru Sun, Xian-Sheng Hua, Joo-Hwee Lim, Hanwang Zhang

We tackle the data scarcity challenge in few-shot point cloud recognition of 3D objects by using a joint prediction from a conventional 3D model and a well-trained 2D model. Surprisingly, such an ensemble, though seemingly trivial, has hardly been shown effective in recent 2D-3D models. We find that the crux is the ineffective training on the ''joint hard samples'', which receive high-confidence predictions on different wrong labels, implying that the 2D and 3D models do not collaborate well. To this end, our proposed invariant training strategy, called InvJoint, not only emphasizes training on the hard samples, but also seeks the invariance between the conflicting 2D and 3D ambiguous predictions. InvJoint can learn more collaborative 2D and 3D representations for a better ensemble. Extensive experiments on 3D shape classification with the widely adopted ModelNet10/40, ScanObjectNN and Toys4K datasets, and shape retrieval with ShapeNet-Core, validate the superiority of our InvJoint.
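
As a rough illustration of what a "joint hard sample" is, the sketch below flags samples on which the 2D and 3D models are both confident, both wrong, and disagree with each other. The function name and threshold are hypothetical; the actual InvJoint strategy goes further by enforcing invariance between the conflicting predictions.

```python
import torch

def joint_hard_mask(logits_2d, logits_3d, labels, conf_thresh=0.8):
    # flags samples where both models are confident, both wrong, and disagree
    conf_2d, pred_2d = logits_2d.softmax(-1).max(-1)
    conf_3d, pred_3d = logits_3d.softmax(-1).max(-1)
    both_wrong = (pred_2d != labels) & (pred_3d != labels)
    both_confident = (conf_2d > conf_thresh) & (conf_3d > conf_thresh)
    conflicting = pred_2d != pred_3d            # different wrong labels
    return both_wrong & both_confident & conflicting

logits_2d, logits_3d = torch.randn(32, 10), torch.randn(32, 10)
labels = torch.randint(0, 10, (32,))
hard = joint_hard_mask(logits_2d, logits_3d, labels)   # boolean mask to up-weight
```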

Anatomy-Aware Lymph Node Detection in Chest CT using Implicit Station Stratification

Jul 28, 2023
Ke Yan, Dakai Jin, Dazhou Guo, Minfeng Xu, Na Shen, Xian-Sheng Hua, Xianghua Ye, Le Lu

Finding abnormal lymph nodes in radiological images is highly important for various medical tasks such as cancer metastasis staging and radiotherapy planning. Lymph nodes (LNs) are small glands scattered throughout the body, grouped into various LN stations according to their anatomical locations. The CT imaging appearance and context of LNs in different stations vary significantly, posing challenges for automated detection, especially of pathological LNs. Motivated by this observation, we propose a novel end-to-end framework that improves LN detection performance by leveraging station information. We design a multi-head detector in which each head focuses on differentiating LN and non-LN structures of certain stations. Pseudo station labels are generated by an LN station classifier as a form of multi-task learning during training, so no explicit LN station prediction model is needed during inference. Our algorithm is evaluated on 82 patients with lung cancer and 91 patients with esophageal cancer. The proposed implicit station stratification method improves the detection sensitivity of thoracic lymph nodes from 65.1% to 71.4% and from 80.3% to 85.5% at 2 false positives per patient on the two datasets, respectively, significantly outperforming various state-of-the-art baselines such as nnUNet, nnDetection and LENS.
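
A minimal sketch of the general idea of station-stratified heads, assuming a shared proposal feature, a station classifier producing soft weights, and one LN/non-LN head per station; all names and dimensions are illustrative, not the paper's implementation.

```python
import torch
import torch.nn as nn

class StationStratifiedHead(nn.Module):
    def __init__(self, feat_dim=256, num_stations=4, num_classes=2):
        super().__init__()
        self.station_cls = nn.Linear(feat_dim, num_stations)   # pseudo-station scores
        self.heads = nn.ModuleList(
            [nn.Linear(feat_dim, num_classes) for _ in range(num_stations)])

    def forward(self, feat):                      # feat: (num_proposals, feat_dim)
        w = self.station_cls(feat).softmax(-1)    # soft station assignment, (P, S)
        logits = torch.stack([h(feat) for h in self.heads], dim=1)   # (P, S, C)
        return (w.unsqueeze(-1) * logits).sum(1)  # station-weighted LN/non-LN logits

head = StationStratifiedHead()
scores = head(torch.randn(100, 256))              # (100, 2) logits per proposal
```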

Random Boxes Are Open-world Object Detectors

Jul 17, 2023
Yanghao Wang, Zhongqi Yue, Xian-Sheng Hua, Hanwang Zhang

We show that classifiers trained with random region proposals achieve state-of-the-art Open-world Object Detection (OWOD): they not only maintain the accuracy on known objects (w/ training labels), but also considerably improve the recall of unknown ones (w/o training labels). Specifically, we propose RandBox, a Fast R-CNN based architecture trained on random proposals at each training iteration, surpassing existing Faster R-CNN and Transformer based OWOD methods. Its effectiveness stems from two benefits introduced by randomness. First, as the randomization is independent of the distribution of the limited known objects, the random proposals act as an instrumental variable that prevents training from being confounded by the known objects. Second, the unbiased training encourages more proposal exploration through our proposed matching score, which does not penalize random proposals whose prediction scores do not match the known objects. On two benchmarks, Pascal-VOC/MS-COCO and LVIS, RandBox significantly outperforms the previous state-of-the-art in all metrics. We also detail ablations on the randomization and loss designs. Code is available at https://github.com/scuwyh2000/RandBox.

* ICCV 2023 
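
For intuition, here is a hedged sketch of drawing label-independent random box proposals each iteration; the exact sampling scheme RandBox uses may differ (see the linked repository).

```python
import torch

def random_boxes(num_boxes, img_w, img_h):
    # sample centers and sizes uniformly, then clip to the image (xyxy format)
    cx, cy = torch.rand(num_boxes) * img_w, torch.rand(num_boxes) * img_h
    w, h = torch.rand(num_boxes) * img_w, torch.rand(num_boxes) * img_h
    x1, y1 = (cx - w / 2).clamp(0, img_w), (cy - h / 2).clamp(0, img_h)
    x2, y2 = (cx + w / 2).clamp(0, img_w), (cy + h / 2).clamp(0, img_h)
    return torch.stack([x1, y1, x2, y2], dim=-1)        # (num_boxes, 4)

proposals = random_boxes(500, 800, 600)   # resampled every iteration, label-free
```

Because the boxes are drawn independently of the annotated objects, they cannot inherit the known-class bias that learned proposal networks pick up from the training labels.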

CoCo: A Coupled Contrastive Framework for Unsupervised Domain Adaptive Graph Classification

Jun 10, 2023
Nan Yin, Li Shen, Mengzhu Wang, Long Lan, Zeyu Ma, Chong Chen, Xian-Sheng Hua, Xiao Luo

Although graph neural networks (GNNs) have achieved impressive results in graph classification, they often need abundant task-specific labels, which can be extremely costly to acquire. A promising solution is to exploit additional labeled graphs to enhance unsupervised learning on the target domain. However, how to apply GNNs to domain adaptation remains unsolved, owing to insufficient exploration of graph topology and significant domain discrepancy. In this paper, we propose Coupled Contrastive Graph Representation Learning (CoCo), which extracts topological information from coupled learning branches and reduces the domain discrepancy with coupled contrastive learning. CoCo contains a graph convolutional network branch and a hierarchical graph kernel network branch, which explore graph topology in implicit and explicit manners, respectively. Besides, we incorporate the coupled branches into a holistic multi-view contrastive learning framework, which not only combines graph representations learned from complementary views for enhanced understanding, but also encourages similarity between cross-domain example pairs with the same semantics for domain alignment. Extensive experiments on popular datasets show that CoCo generally outperforms competing baselines across different settings.
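
The cross-domain alignment term can be pictured with the following hedged sketch: an InfoNCE-style loss that pulls together source and target graph embeddings sharing a (pseudo) label. Names, shapes, and the use of pseudo labels are assumptions, not the official CoCo loss.

```python
import torch
import torch.nn.functional as F

def cross_domain_contrastive(z_src, y_src, z_tgt, y_tgt_pseudo, tau=0.5):
    # z_src: (Ns, d) source-graph embeddings; z_tgt: (Nt, d) target-graph embeddings
    z_src, z_tgt = F.normalize(z_src, dim=-1), F.normalize(z_tgt, dim=-1)
    log_prob = (z_src @ z_tgt.t() / tau).log_softmax(dim=-1)     # (Ns, Nt)
    pos = (y_src[:, None] == y_tgt_pseudo[None, :]).float()      # same-semantics pairs
    # average log-likelihood of cross-domain positives, per source graph
    per_src = (log_prob * pos).sum(-1) / pos.sum(-1).clamp(min=1)
    return -per_src.mean()
```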

PastNet: Introducing Physical Inductive Biases for Spatio-temporal Video Prediction

May 24, 2023
Hao Wu, Wei Xiong, Fan Xu, Xiao Luo, Chong Chen, Xian-Sheng Hua, Haixin Wang

In this paper, we investigate the challenge of spatio-temporal video prediction, which involves generating future videos based on historical data streams. Existing approaches typically utilize external information such as semantic maps to enhance video prediction, but often neglect the inherent physical knowledge embedded within videos. Furthermore, their high computational demands can impede their application to high-resolution videos. To address these constraints, we introduce a novel approach called Physics-assisted Spatio-temporal Network (PastNet) for generating high-quality video predictions. The core of PastNet lies in incorporating a spectral convolution operator in the Fourier domain, which efficiently introduces inductive biases from the underlying physical laws. Additionally, we employ a memory bank with an estimated intrinsic dimensionality to discretize local features during the processing of complex spatio-temporal signals, thereby reducing computational costs and facilitating efficient high-resolution video prediction. Extensive experiments on various widely used datasets demonstrate the effectiveness and efficiency of the proposed PastNet compared with state-of-the-art methods, particularly in high-resolution scenarios. Our code is available at https://github.com/easylearningscores/PastNet.
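
To illustrate what a spectral convolution operator in the Fourier domain looks like, here is a minimal FNO-style layer: transform, mix a truncated set of low-frequency modes with learned complex weights, transform back. This is a generic sketch under assumed shapes, not necessarily PastNet's exact operator.

```python
import torch
import torch.nn as nn

class SpectralConv2d(nn.Module):
    def __init__(self, channels: int, modes: int = 8):
        super().__init__()
        self.modes = modes                       # low-frequency modes kept per axis
        scale = 1.0 / channels
        self.weight = nn.Parameter(scale * torch.randn(
            channels, channels, modes, modes, dtype=torch.cfloat))

    def forward(self, x):                        # x: (B, C, H, W), H, W >= 2 * modes
        x_ft = torch.fft.rfft2(x)                # complex spectrum, (B, C, H, W//2+1)
        out_ft = torch.zeros_like(x_ft)
        m = self.modes
        out_ft[:, :, :m, :m] = torch.einsum(     # mix channels on kept modes only
            "bixy,ioxy->boxy", x_ft[:, :, :m, :m], self.weight)
        return torch.fft.irfft2(out_ft, s=x.shape[-2:])

y = SpectralConv2d(4)(torch.randn(2, 4, 32, 32))   # output shape: (2, 4, 32, 32)
```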

Structural and Statistical Texture Knowledge Distillation for Semantic Segmentation

May 06, 2023
Deyi Ji, Haoran Wang, Mingyuan Tao, Jianqiang Huang, Xian-Sheng Hua, Hongtao Lu

Existing knowledge distillation works for semantic segmentation mainly focus on transferring high-level contextual knowledge from teacher to student. However, low-level texture knowledge is also vital for characterizing local structural patterns and global statistical properties, such as boundary, smoothness, regularity and color contrast, which may not be well captured by high-level deep features. In this paper, we aim to take full advantage of both structural and statistical texture knowledge and propose a novel Structural and Statistical Texture Knowledge Distillation (SSTKD) framework for semantic segmentation. Specifically, for structural texture knowledge, we introduce a Contourlet Decomposition Module (CDM) that decomposes low-level features with an iterative Laplacian pyramid and directional filter banks to mine structural texture knowledge. For statistical knowledge, we propose a Denoised Texture Intensity Equalization Module (DTIEM) that adaptively extracts and enhances statistical texture knowledge through heuristic iterative quantization and denoising operations. Finally, each kind of knowledge learning is supervised by an individual loss function, forcing the student network to mimic the teacher better from a broader perspective. Experiments show that the proposed method achieves state-of-the-art performance on the Cityscapes, Pascal VOC 2012 and ADE20K datasets.

* Accepted to CVPR 2022 
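
For intuition about the structural branch, the sketch below implements a single Laplacian-pyramid level, the band-pass building block on which contourlet-style decompositions (low-pass filtering plus directional filtering) are built; it omits the directional filter bank entirely and is not the paper's CDM.

```python
import torch
import torch.nn.functional as F

def laplacian_level(x):
    # x: (B, C, H, W) feature map with even H and W
    down = F.avg_pool2d(x, kernel_size=2)               # low-pass, half resolution
    up = F.interpolate(down, size=x.shape[-2:], mode="bilinear", align_corners=False)
    return x - up, down                                 # band-pass detail, residual

feat = torch.randn(2, 64, 32, 32)
bands = []
for _ in range(3):                                      # iterative pyramid
    band, feat = laplacian_level(feat)
    bands.append(band)                                  # texture cues per scale
```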

TGNN: A Joint Semi-supervised Framework for Graph-level Classification

Apr 23, 2023
Wei Ju, Xiao Luo, Meng Qu, Yifan Wang, Chong Chen, Minghua Deng, Xian-Sheng Hua, Ming Zhang

This paper studies semi-supervised graph classification, a crucial task with a wide range of applications in social network analysis and bioinformatics. Recent works typically adopt graph neural networks to learn graph-level representations for classification, failing to explicitly leverage features derived from graph topology (e.g., paths). Moreover, when labeled data is scarce, these methods are far from satisfactory due to their insufficient exploration of the topology of unlabeled data. We address this challenge by proposing a novel semi-supervised framework called Twin Graph Neural Network (TGNN). To explore graph structural information from complementary views, TGNN has a message passing module and a graph kernel module. To fully utilize unlabeled data, for each module we calculate the similarity of each unlabeled graph to the labeled graphs in a memory bank, and our consistency loss encourages agreement between the two similarity distributions in the different embedding spaces. The twin modules collaborate by exchanging instance similarity knowledge, fully exploring the structural information of both labeled and unlabeled data. We evaluate TGNN on various public datasets and show that it achieves strong performance.

* Accepted by Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence (IJCAI 2022) 
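
A hedged sketch of the consistency idea: each module scores an unlabeled graph against a memory bank of labeled graphs, and a symmetric KL term aligns the two resulting similarity distributions. The function name, temperature, and symmetric-KL choice are assumptions, not the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def twin_consistency(z_gnn, z_kernel, bank_gnn, bank_kernel, tau=0.1):
    # z_*: (B, d) unlabeled-graph embeddings from each module
    # bank_*: (M, d) memory banks of labeled-graph embeddings
    p = (F.normalize(z_gnn, -1) @ F.normalize(bank_gnn, -1).t() / tau).softmax(-1)
    q = (F.normalize(z_kernel, -1) @ F.normalize(bank_kernel, -1).t() / tau).softmax(-1)
    kl = lambda a, b: (a * (a.clamp_min(1e-8).log() - b.clamp_min(1e-8).log())).sum(-1)
    return 0.5 * (kl(p, q) + kl(q, p)).mean()   # symmetric KL across the two views
```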

FECANet: Boosting Few-Shot Semantic Segmentation with Feature-Enhanced Context-Aware Network

Jan 19, 2023
Huafeng Liu, Pai Peng, Tao Chen, Qiong Wang, Yazhou Yao, Xian-Sheng Hua

Few-shot semantic segmentation is the task of learning to locate each pixel of a novel class in the query image given only a few annotated support images. Current correlation-based methods construct pair-wise feature correlations to establish many-to-many matching, because typical prototype-based approaches cannot learn fine-grained correspondence relations. However, existing methods still suffer from noise in the naive correlations and a lack of contextual semantic information in the correlations. To alleviate these problems, we propose a Feature-Enhanced Context-Aware Network (FECANet). Specifically, a feature enhancement module is proposed to suppress the matching noise caused by inter-class local similarity and to enhance intra-class relevance in the naive correlation. In addition, we propose a novel correlation reconstruction module that encodes extra correspondence relations between foreground and background as well as multi-scale contextual semantic features, significantly helping the encoder capture reliable matching patterns. Experiments on the PASCAL-$5^i$ and COCO-$20^i$ datasets demonstrate that FECANet achieves remarkable improvements over previous state-of-the-art methods, demonstrating its effectiveness.

* accepted by IEEE Transactions on Multimedia 
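
For background, the sketch below builds the naive pair-wise (4D) correlation tensor between query and support features that correlation-based methods start from; FECANet's enhancement and reconstruction modules then operate on such a tensor. The shapes and the ReLU-style clamping are common choices assumed here, not taken from the paper.

```python
import torch
import torch.nn.functional as F

def dense_correlation(query_feat, support_feat):
    # query_feat: (B, C, Hq, Wq); support_feat: (B, C, Hs, Ws)
    B, C, Hq, Wq = query_feat.shape
    Hs, Ws = support_feat.shape[-2:]
    q = F.normalize(query_feat.flatten(2), dim=1)     # (B, C, Hq*Wq)
    s = F.normalize(support_feat.flatten(2), dim=1)   # (B, C, Hs*Ws)
    corr = torch.einsum("bcq,bcs->bqs", q, s)         # cosine similarity per pair
    return corr.view(B, Hq, Wq, Hs, Ws).clamp(min=0)  # keep positive matches only
```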

Class Is Invariant to Context and Vice Versa: On Learning Invariance for Out-Of-Distribution Generalization

Aug 06, 2022
Jiaxin Qi, Kaihua Tang, Qianru Sun, Xian-Sheng Hua, Hanwang Zhang

Out-Of-Distribution (OOD) generalization is all about learning invariance against environmental changes. If the context in every class were evenly distributed, OOD would be trivial because the context could be easily removed due to an underlying principle: class is invariant to context. However, collecting such a balanced dataset is impractical. Learning on imbalanced data biases the model toward context and thus hurts OOD. Therefore, the key to OOD is context balance. We argue that the widely adopted assumption in prior work, that the context bias can be directly annotated or estimated from biased class prediction, renders the context incomplete or even incorrect. In contrast, we point out the ever-overlooked other side of the above principle: context is also invariant to class, which motivates us to consider the classes (which are already labeled) as the varying environments to resolve context bias (without context labels). We implement this idea by minimizing the contrastive loss of intra-class sample similarity while ensuring this similarity is invariant across all classes. On benchmarks with various context biases and domain gaps, we show that a simple re-weighting based classifier equipped with our context estimation achieves state-of-the-art performance. We provide theoretical justifications in the Appendix and code at https://github.com/simpleshinobu/IRMCon.
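
A hedged sketch of treating classes as environments: compute an intra-class similarity loss per class (a simplified alignment term standing in for the contrastive loss) and penalize its variation across classes. All names and the variance-penalty form are assumptions; see the linked repository for the authors' actual objective.

```python
import torch
import torch.nn.functional as F

def intra_class_alignment(z):
    # z: (n, d) embeddings of samples from ONE class; 0 when classmates coincide
    z = F.normalize(z, dim=-1)
    sim = z @ z.t()
    mask = ~torch.eye(len(z), dtype=torch.bool)
    return 1.0 - sim[mask].mean()

def invariance_penalized_loss(feats, labels, lam=1.0):
    # one alignment loss per class ("environment"); penalize spread across classes
    losses = torch.stack([intra_class_alignment(feats[labels == c])
                          for c in labels.unique() if (labels == c).sum() > 1])
    return losses.mean() + lam * losses.var()   # assumes >= 2 populated classes
```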
