
Hui Tang

Dual-Decoder Consistency via Pseudo-Labels Guided Data Augmentation for Semi-Supervised Medical Image Segmentation

Aug 31, 2023
Yuanbin Chen, Tao Wang, Hui Tang, Longxuan Zhao, Ruige Zong, Tao Tan, Xinlin Zhang, Tong Tong

Medical image segmentation methods often rely on fully supervised approaches to achieve excellent performance, which is contingent upon having an extensive set of labeled images for training. However, annotating medical images is both expensive and time-consuming. Semi-supervised learning offers a solution by leveraging numerous unlabeled images alongside a limited set of annotated ones. In this paper, we introduce a semi-supervised medical image segmentation method based on the mean-teacher model, referred to as Dual-Decoder Consistency via Pseudo-Labels Guided Data Augmentation (DCPA). This method combines consistency regularization, pseudo-labels, and data augmentation to enhance the efficacy of semi-supervised segmentation. Firstly, the proposed model comprises both student and teacher models with a shared encoder and two distinct decoders employing different up-sampling strategies. Minimizing the output discrepancy between decoders enforces the generation of consistent representations, serving as regularization during student model training. Secondly, we introduce mixup operations to blend unlabeled data with labeled data, creating mixed data and thereby achieving data augmentation. Lastly, pseudo-labels are generated by the teacher model and utilized as labels for the mixed data to compute the unsupervised loss. We compare the segmentation results of the DCPA model with six state-of-the-art semi-supervised methods on three publicly available medical datasets. Beyond the classical 10% and 20% semi-supervised settings, we investigate performance with less supervision (5% labeled data). Experimental outcomes demonstrate that our approach consistently outperforms existing semi-supervised medical image segmentation methods across the three semi-supervised settings.
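
A minimal sketch of one unsupervised training step, assuming PyTorch, is given below; the student, teacher, and loss choices are illustrative placeholders rather than the exact DCPA losses, and the dual-decoder consistency term is omitted for brevity.

```python
import torch
import torch.nn.functional as F

def unsupervised_step(student, teacher, x_unlab, x_lab, y_lab, beta=1.0):
    """Hypothetical sketch: teacher pseudo-labels guide a mixup of unlabeled
    and labeled images; the student is trained on the mixed batch."""
    with torch.no_grad():
        pseudo = teacher(x_unlab).argmax(dim=1)          # (B, H, W) pseudo-labels

    lam = torch.distributions.Beta(beta, beta).sample().item()
    x_mix = lam * x_unlab + (1.0 - lam) * x_lab          # mixup of unlabeled and labeled images

    logits = student(x_mix)                              # (B, C, H, W) segmentation logits
    # Mixed target: weight the pseudo-label and ground-truth losses by the mixing coefficient.
    loss = lam * F.cross_entropy(logits, pseudo) + (1.0 - lam) * F.cross_entropy(logits, y_lab)
    return loss
```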

Exploring Inductive Biases in Contrastive Learning: A Clustering Perspective

May 17, 2023
Yunzhe Zhang, Yao Lu, Lei Xu, Kunlin Yang, Hui Tang, Shuyuan Ye, Qi Xuan

This paper investigates the differences in data organization between contrastive and supervised learning methods, focusing on the concept of locally dense clusters. We introduce a novel metric, Relative Local Density (RLD), to quantitatively measure local density within clusters. Visual examples are provided to highlight the distinctions between locally dense clusters and globally dense ones. By comparing the clusters formed by contrastive and supervised learning, we reveal that contrastive learning generates locally dense clusters without global density, while supervised learning creates clusters with both local and global density. We further explore the use of a Graph Convolutional Network (GCN) classifier as an alternative to linear classifiers for handling locally dense clusters. Finally, we utilize t-SNE visualizations to substantiate the differences between the features generated by contrastive and supervised learning methods. We conclude by proposing future research directions, including the development of efficient classifiers tailored to contrastive learning and the creation of innovative augmentation algorithms.
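
The abstract does not spell out the RLD formula, so the snippet below is only an illustrative k-nearest-neighbor proxy for local density, assuming scikit-learn; it is not the paper's metric.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def knn_label_purity(features, labels, k=10):
    """Illustrative local-density proxy: for each sample, the fraction of its k
    nearest neighbors (in feature space) that share its label, averaged per class.
    Locally dense clusters score high even when the class splits into several
    globally separated modes."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(features)
    _, idx = nn.kneighbors(features)                     # idx[:, 0] is the sample itself
    same = (labels[idx[:, 1:]] == labels[:, None]).mean(axis=1)
    return {c: float(same[labels == c].mean()) for c in np.unique(labels)}
```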

A New Benchmark: On the Utility of Synthetic Data with Blender for Bare Supervised Learning and Downstream Domain Adaptation

Mar 23, 2023
Hui Tang, Kui Jia

Deep learning in computer vision has achieved great success at the price of large-scale labeled training data. However, exhaustive data annotation is impracticable for every task across all domains of interest, due to high labor costs and unguaranteed labeling accuracy. Besides, the uncontrollable data collection process produces non-IID training and test data, where undesired duplication may exist. All these nuisances may hinder the verification of typical theories and the discovery of new findings. To circumvent them, an alternative is to generate synthetic data via 3D rendering with domain randomization. In this work, we push forward along this line by conducting extensive research on bare supervised learning and downstream domain adaptation. Specifically, under the well-controlled, IID data setting enabled by 3D rendering, we systematically verify typical, important learning insights, e.g., shortcut learning, and discover new generalization laws across various data regimes and network architectures. We further investigate the effect of image formation factors on generalization, e.g., object scale, material texture, illumination, camera viewpoint, and background in a 3D scene. Moreover, we use simulation-to-reality adaptation as a downstream task for comparing the transferability of synthetic and real data when used for pre-training, which demonstrates that synthetic data pre-training is also promising for improving real-world test results. Lastly, to promote future research, we develop a new large-scale synthetic-to-real benchmark for image classification, termed S2RDA, which provides more significant challenges for transfer from simulation to reality. The code and datasets are available at https://github.com/huitangtang/On_the_Utility_of_Synthetic_Data.

* 24 pages, 14 figures, 5 tables, accepted by the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023. The proposed new synthetic-to-real benchmark S2RDA is available at https://pan.baidu.com/s/1fHHaqrEHbUZLXEg9XKpgSg?pwd=w9wa 

SR-init: An interpretable layer pruning method

Mar 17, 2023
Hui Tang, Yao Lu, Qi Xuan

Despite the popularization of deep neural networks (DNNs) in many fields, it is still challenging to deploy state-of-the-art models to resource-constrained devices due to high computational overhead. Model pruning provides a feasible solution to this challenge. However, the interpretation of existing pruning criteria is often overlooked. To counter this issue, we propose a novel layer pruning method based on stochastic re-initialization. Our SR-init method is inspired by the discovery that the accuracy drop caused by stochastically re-initializing layer parameters differs across layers. On the basis of this observation, we derive a layer pruning criterion: layers that are not sensitive to stochastic re-initialization (i.e., exhibit a low accuracy drop) contribute less to the model and can be pruned with acceptable loss. Afterward, we experimentally verify the interpretability of SR-init via feature visualization. The visual explanation demonstrates that SR-init is theoretically feasible, so we compare it with state-of-the-art methods to further evaluate its practicability. For ResNet56 on CIFAR-10 and CIFAR-100, SR-init achieves a substantial reduction in parameters (63.98% and 37.71%) with a negligible drop in top-1 accuracy (-0.56% and 0.8%). With ResNet50 on ImageNet, we achieve a 15.59% FLOPs reduction by removing 39.29% of the parameters, with only a 0.6% drop in top-1 accuracy. Our code is available at https://github.com/huitang-zjut/SR-init.
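
A minimal sketch of the SR-init criterion follows, assuming a PyTorch-style model; the helpers evaluate and reinit_layer are hypothetical placeholders, not the authors' released code.

```python
import copy

def sr_init_scores(model, candidate_layers, evaluate, reinit_layer):
    """For each candidate layer, stochastically re-initialize its parameters in a
    copy of the model and record the accuracy drop relative to the baseline.
    Layers with a small drop are deemed insensitive and become pruning candidates."""
    baseline = evaluate(model)
    scores = {}
    for name in candidate_layers:
        probe = copy.deepcopy(model)
        reinit_layer(probe, name)                   # re-sample this layer's weights from its init distribution
        scores[name] = baseline - evaluate(probe)   # accuracy drop attributable to this layer
    # Ascending order: the least sensitive layers (smallest drop) are pruned first.
    return sorted(scores.items(), key=lambda kv: kv[1])
```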

Unsupervised Domain Adaptation via Distilled Discriminative Clustering

Feb 23, 2023
Hui Tang, Yaowei Wang, Kui Jia

Unsupervised domain adaptation addresses the problem of classifying data in an unlabeled target domain, given labeled source domain data that share a common label space but follow a different distribution. Most recent methods take the approach of explicitly aligning feature distributions between the two domains. Differently, motivated by the fundamental assumption for domain adaptability, we re-cast the domain adaptation problem as discriminative clustering of target data, given strong privileged information provided by the closely related, labeled source data. Technically, we use clustering objectives based on a robust variant of entropy minimization that adaptively filters target data, a soft Fisher-like criterion, and additionally cluster ordering via centroid classification. To distill discriminative source information for target clustering, we propose to jointly train the network using parallel, supervised learning objectives over labeled source data. We term our method of distilled discriminative clustering for domain adaptation DisClusterDA. We also give geometric intuition that illustrates how the constituent objectives of DisClusterDA help learn class-wise pure, compact feature distributions. We conduct careful ablation studies and extensive experiments on five popular benchmark datasets, including a multi-source domain adaptation benchmark. Based on commonly used backbone networks, DisClusterDA outperforms existing methods on these benchmarks. It is also interesting to observe that, in our DisClusterDA framework, adding an additional loss term that explicitly learns to align class-level feature distributions across domains harms the adaptation performance, though more careful studies in different algorithmic frameworks are to be conducted.
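
As a rough illustration of the robust, adaptively filtered entropy-minimization term mentioned above, a sketch is given below assuming PyTorch; the fixed confidence threshold stands in for the paper's adaptive filtering rule, whose exact form is not given in the abstract.

```python
import torch
import torch.nn.functional as F

def filtered_entropy_loss(target_logits, conf_threshold=0.9):
    """Illustrative entropy-minimization term on unlabeled target samples that
    keeps only confidently predicted samples (a stand-in for adaptive filtering)."""
    probs = F.softmax(target_logits, dim=1)
    entropy = -(probs * torch.log(probs.clamp_min(1e-8))).sum(dim=1)
    keep = probs.max(dim=1).values > conf_threshold    # filter out uncertain target samples
    if keep.any():
        return entropy[keep].mean()
    return target_logits.new_zeros(())                 # no confident samples in this batch
```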

* Journal paper published in Pattern Recognition, 2022 

Transferring Dual Stochastic Graph Convolutional Network for Facial Micro-expression Recognition

Mar 10, 2022
Hui Tang, Li Chai, Wanli Lu

Micro-expression recognition has drawn increasing attention due to its wide application in lie detection, criminal detection, and psychological consultation. To improve recognition performance on small micro-expression datasets, this paper presents a transferring dual stochastic Graph Convolutional Network (TDSGCN) model. We propose a stochastic graph construction method and a dual graph convolutional network to extract more discriminative features from micro-expression images. We use transfer learning to pre-train the SGCNs on macro-expression data. An optical flow algorithm is also integrated to extract temporal features. We fuse both spatial and temporal features to improve recognition performance. To the best of our knowledge, this is the first attempt to utilize transfer learning and graph convolutional networks in the micro-expression recognition task. In addition, to handle the class imbalance of the datasets, we focus on the design of the focal loss function. Through extensive evaluation, our proposed method achieves state-of-the-art performance on the SAMM and recently released MMEW benchmarks. Our code will be publicly available accompanying this paper.
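
For the class-imbalance handling mentioned above, a standard multi-class focal loss looks like the sketch below, assuming PyTorch; whether TDSGCN uses exactly this form or a modified weighting is not stated in the abstract.

```python
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0, alpha=None):
    """Standard focal loss: down-weights well-classified examples by (1 - p_t)^gamma
    so that rare, hard classes contribute more to the gradient.
    `alpha` is an optional per-class weight tensor."""
    log_probs = F.log_softmax(logits, dim=1)
    ce = F.nll_loss(log_probs, targets, weight=alpha, reduction="none")
    p_t = log_probs.gather(1, targets.unsqueeze(1)).squeeze(1).exp()   # probability of the true class
    return ((1.0 - p_t) ** gamma * ce).mean()
```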

MEGAN: Memory Enhanced Graph Attention Network for Space-Time Video Super-Resolution

Oct 28, 2021
Chenyu You, Lianyi Han, Aosong Feng, Ruihan Zhao, Hui Tang, Wei Fan

Space-time video super-resolution (STVSR) aims to construct a high space-time resolution video sequence from the corresponding low-frame-rate, low-resolution video sequence. Inspired by the recent success of considering spatial-temporal information for space-time super-resolution, our main goal in this work is to take full consideration of the spatial and temporal correlations within video sequences of fast dynamic events. To this end, we propose a novel one-stage memory enhanced graph attention network (MEGAN) for space-time video super-resolution. Specifically, we build a novel long-range memory graph aggregation (LMGA) module to dynamically capture correlations along the channel dimensions of the feature maps and adaptively aggregate channel features to enhance the feature representations. We introduce a non-local residual block, which enables each channel-wise feature to attend to global spatial hierarchical features. In addition, we adopt a progressive fusion module to further enhance the representation ability by extensively exploiting spatial-temporal correlations from multiple frames. Experimental results demonstrate that our method achieves better results than state-of-the-art methods, both quantitatively and visually.
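
The non-local residual block mentioned above follows the standard non-local attention pattern; the sketch below, assuming PyTorch, shows that generic pattern only and does not reproduce the MEGAN-specific LMGA module or its memory mechanism.

```python
import torch
import torch.nn as nn

class NonLocalResidualBlock(nn.Module):
    """Generic non-local block used as a residual unit: every spatial position
    attends to all other positions, giving each feature access to global context."""
    def __init__(self, channels, reduction=2):
        super().__init__()
        inter = max(channels // reduction, 1)
        self.theta = nn.Conv2d(channels, inter, kernel_size=1)
        self.phi = nn.Conv2d(channels, inter, kernel_size=1)
        self.g = nn.Conv2d(channels, inter, kernel_size=1)
        self.out = nn.Conv2d(inter, channels, kernel_size=1)

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.theta(x).flatten(2).transpose(1, 2)     # (B, HW, C')
        k = self.phi(x).flatten(2)                       # (B, C', HW)
        v = self.g(x).flatten(2).transpose(1, 2)         # (B, HW, C')
        attn = torch.softmax(q @ k, dim=-1)              # global spatial attention map
        y = (attn @ v).transpose(1, 2).reshape(b, -1, h, w)
        return x + self.out(y)                           # residual connection
```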

Geometry-Aware Self-Training for Unsupervised Domain Adaptation on Object Point Clouds

Aug 20, 2021
Longkun Zou, Hui Tang, Ke Chen, Kui Jia

The point cloud representation of an object can exhibit large geometric variation owing to inconsistent data acquisition procedures, which leads to domain discrepancy due to diverse and uncontrollable shape representations across datasets. To improve discrimination on unseen distributions of point-based geometries from a practical and feasible perspective, this paper proposes a new method of geometry-aware self-training (GAST) for unsupervised domain adaptation of object point cloud classification. Specifically, this paper aims to learn a domain-shared representation of semantic categories via two novel self-supervised geometric learning tasks that serve as feature regularization. On the one hand, representation learning is empowered by a linear mixup of point cloud samples with their self-generated rotation labels, to capture a global topological configuration of local geometries. On the other hand, a diverse point distribution across datasets can be normalized with a novel curvature-aware distortion localization. Experiments on the PointDA-10 dataset show that our GAST method can significantly outperform state-of-the-art methods.
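
The rotation-label mixup described above could look like the following sketch in NumPy; the discrete z-axis rotations and the point-wise mixing here are assumptions for illustration, not the exact GAST formulation.

```python
import numpy as np

def rotation_matrix_z(angle):
    """Rotation about the z-axis by `angle` radians."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def mixup_rotation_task(pc_a, pc_b, num_bins=4, alpha=1.0, rng=np.random):
    """Hypothetical self-supervised task: rotate two (N, 3) point clouds by randomly
    chosen discrete angles, linearly mix the rotated points, and mix the one-hot
    rotation labels with the same coefficient. A classifier is then trained to
    predict the soft rotation label of the mixed cloud."""
    labels = rng.randint(num_bins, size=2)
    angles = labels * (2.0 * np.pi / num_bins)
    rot_a = pc_a @ rotation_matrix_z(angles[0]).T
    rot_b = pc_b @ rotation_matrix_z(angles[1]).T
    lam = rng.beta(alpha, alpha)
    mixed_points = lam * rot_a + (1.0 - lam) * rot_b     # assumes both clouds have N points
    mixed_label = np.zeros(num_bins)
    mixed_label[labels[0]] += lam
    mixed_label[labels[1]] += 1.0 - lam
    return mixed_points, mixed_label
```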
