Zonghao Guo

Bidirectional Feature Globalization for Few-shot Semantic Segmentation of 3D Point Cloud Scenes

Aug 17, 2022
Yongqiang Mao, Zonghao Guo, Xiaonan Lu, Zhiqiang Yuan, Haowen Guo

Few-shot segmentation of point clouds remains a challenging task, as there is no effective way to convert local point cloud information into a global representation, which hinders the generalization ability of point features. In this study, we propose a bidirectional feature globalization (BFG) approach, which leverages similarity measurements between point features and prototype vectors to embed global perception into local point features in a bidirectional fashion. With point-to-prototype globalization (Po2PrG), BFG aggregates local point features into prototypes according to similarity weights from dense point features to sparse prototypes. With prototype-to-point globalization (Pr2PoG), global perception is embedded into local point features based on similarity weights from sparse prototypes to dense point features. The sparse prototypes of each class, embedded with global perception, are summarized into a single prototype for few-shot 3D segmentation under a metric-learning framework. Extensive experiments on S3DIS and ScanNet demonstrate that BFG significantly outperforms state-of-the-art methods.
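To make the two passes concrete, here is a minimal PyTorch-style sketch of the bidirectional globalization idea; the function name, the cosine-similarity weighting, the softmax temperature `tau`, and the residual update are illustrative assumptions, not the paper's implementation:

```python
import torch
import torch.nn.functional as F

def bidirectional_feature_globalization(points, prototypes, tau=0.1):
    """Hypothetical sketch of BFG's two globalization passes.

    points:     (N, C) dense local point features
    prototypes: (M, C) sparse prototype vectors, M << N
    """
    # Cosine similarity between every point and every prototype: (N, M).
    sim = F.normalize(points, dim=-1) @ F.normalize(prototypes, dim=-1).T

    # Po2PrG: aggregate dense point features into sparse prototypes,
    # weighting each point by its softmaxed similarity to the prototype.
    w_po2pr = torch.softmax(sim.T / tau, dim=-1)      # (M, N), rows sum to 1
    global_prototypes = w_po2pr @ points              # (M, C)

    # Pr2PoG: embed the globalized prototypes back into each point feature,
    # weighting each prototype by the point's similarity to it.
    w_pr2po = torch.softmax(sim / tau, dim=-1)        # (N, M), rows sum to 1
    globalized_points = points + w_pr2po @ global_prototypes  # residual update

    return globalized_points, global_prototypes

# Toy usage: 1024 points, 16 prototypes, 64-dim features.
pts, protos = torch.randn(1024, 64), torch.randn(16, 64)
out_pts, out_protos = bidirectional_feature_globalization(pts, protos)
```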

* Institutional error 

Integral Migrating Pre-trained Transformer Encoder-decoders for Visual Object Detection

May 19, 2022
Xiaosong Zhang, Feng Liu, Zhiliang Peng, Zonghao Guo, Fang Wan, Xiangyang Ji, Qixiang Ye

Modern object detectors take advantage of pre-trained vision transformers by using them as backbone networks. However, apart from the backbone, other detector components, such as the detector head and the feature pyramid network, remain randomly initialized, which hinders consistency between detectors and pre-trained models. In this study, we propose to integrally migrate pre-trained transformer encoder-decoders (imTED) for object detection, constructing a feature extraction-operation path that is not only "fully pre-trained" but also consistent with the pre-trained models. The essential improvements of imTED over existing transformer-based detectors are twofold: (1) it embeds the pre-trained transformer decoder into the detector head; and (2) it removes the feature pyramid network from the feature extraction path. These improvements significantly reduce the proportion of randomly initialized parameters and enhance the generalization capability of detectors. Experiments on the MS COCO dataset demonstrate that imTED consistently outperforms its counterparts by ~2.8% AP. Without bells and whistles, imTED improves the state of the art in few-shot object detection by up to 7.6% AP, demonstrating significantly higher generalization capability. Code will be made publicly available.
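The architectural point is a re-wiring rather than a new module. In the hypothetical PyTorch sketch below, the stand-in `nn.Transformer*` modules take the place of weights that imTED would load from a pre-trained encoder-decoder checkpoint, and `region_queries` stands in for the per-region features a real detector would extract (e.g. via RoI pooling); every name and shape here is an illustrative assumption:

```python
import torch
import torch.nn as nn

class ImTEDStyleDetector(nn.Module):
    """Hypothetical sketch of the imTED wiring: a pre-trained encoder as the
    backbone, the matching pre-trained decoder as the detector head, and no
    feature pyramid network in between."""

    def __init__(self, num_classes=80, embed_dim=256):
        super().__init__()
        # Stand-ins: in imTED these weights come from a pre-trained
        # encoder-decoder checkpoint rather than random initialization.
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(embed_dim, nhead=8, batch_first=True), 2)
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(embed_dim, nhead=8, batch_first=True), 2)
        # Only these lightweight task projections stay randomly initialized.
        self.cls_head = nn.Linear(embed_dim, num_classes + 1)  # +1 background
        self.box_head = nn.Linear(embed_dim, 4)                # box regression

    def forward(self, patch_tokens, region_queries):
        # Fully pre-trained path: encoder backbone -> decoder head, no FPN.
        memory = self.encoder(patch_tokens)            # (B, P, C)
        region = self.decoder(region_queries, memory)  # (B, R, C)
        return self.cls_head(region), self.box_head(region)

# Toy usage: 14x14 patch tokens, 100 region queries, 256-dim embeddings.
tokens, queries = torch.randn(2, 196, 256), torch.randn(2, 100, 256)
logits, boxes = ImTEDStyleDetector()(tokens, queries)
```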

* 12 pages, 5 figures 

Semantic Segmentation for Point Cloud Scenes via Dilated Graph Feature Aggregation and Pyramid Decoders

Apr 11, 2022
Yongqiang Mao, Xian Sun, Wenhui Diao, Kaiqiang Chen, Zonghao Guo, Xiaonan Lu, Kun Fu

Semantic segmentation of point clouds builds a comprehensive understanding of scenes by densely predicting a category for each point. Because features are extracted with a single receptive field, expressing multi-receptive-field features remains challenging, which leads to the misclassification of instances with similar spatial structures. In this paper, we propose DGFA-Net, a graph convolutional network rooted in dilated graph feature aggregation (DGFA) and guided by a multi-basis aggregation loss (MALoss) computed through Pyramid Decoders. To obtain multi-receptive-field features, DGFA, which takes the proposed dilated graph convolution (DGConv) as its basic building block, is designed to aggregate multi-scale feature representations by capturing dilated graphs with various receptive regions. To diversify the receptive field bases, we introduce Pyramid Decoders driven by MALoss, which penalize the receptive field information using point sets of different resolutions as calculation bases. Combining these two aspects, DGFA-Net significantly improves the segmentation of instances with similar spatial structures. Experiments on S3DIS, ShapeNetPart and Toronto-3D show that DGFA-Net outperforms the baseline approach, achieving new state-of-the-art segmentation performance.
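A minimal sketch of the dilated-graph idea behind DGConv, assuming an EdgeConv-style edge-feature aggregation; the neighbor selection (every `dilation`-th of the `k * dilation` nearest points) and all names and shapes are illustrative assumptions, not the paper's implementation:

```python
import torch

def dilated_knn(points, k=8, dilation=2):
    """Hypothetical dilated neighborhood: keep every `dilation`-th neighbor
    among the k*dilation nearest, enlarging the receptive region without
    increasing the number of edges per point."""
    # points: (N, 3). Pairwise squared distances: (N, N).
    d2 = torch.cdist(points, points).pow(2)
    # k*dilation nearest neighbors, dropping self (distance 0) in column 0.
    idx = d2.topk(k * dilation + 1, largest=False).indices[:, 1:]
    return idx[:, ::dilation]  # (N, k)

def dgconv(features, neighbor_idx, weight):
    """EdgeConv-style aggregation over the dilated graph (assumed form)."""
    # features: (N, C); neighbor_idx: (N, k); weight: (2C, C_out).
    center = features.unsqueeze(1).expand(-1, neighbor_idx.shape[1], -1)
    neighbors = features[neighbor_idx]                      # (N, k, C)
    edge = torch.cat([center, neighbors - center], dim=-1)  # (N, k, 2C)
    return torch.relu(edge @ weight).max(dim=1).values      # (N, C_out)

# Toy usage: 512 points with 32-dim features, aggregated to 64 dims.
xyz, feat = torch.randn(512, 3), torch.randn(512, 32)
out = dgconv(feat, dilated_knn(xyz, k=8, dilation=2), torch.randn(64, 64))
```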

Long-tailed Distribution Adaptation

Oct 06, 2021
Zhiliang Peng, Wei Huang, Zonghao Guo, Xiaosong Zhang, Jianbin Jiao, Qixiang Ye

Recognizing images with long-tailed distributions remains a challenging problem, and an interpretable mechanism for solving it has been lacking. In this study, we formulate Long-tailed recognition as Domain Adaptation (LDA) by modeling the long-tailed distribution as an unbalanced domain and the general distribution as a balanced domain. Within the balanced domain, we propose to relax the generalization error bound, which is defined upon the empirical risks of the unbalanced and balanced domains and the divergence between them. We jointly optimize the empirical risks of the two domains and approximate their domain divergence by intra-class and inter-class distances, with the aim of adapting models trained on the long-tailed distribution to general distributions in an interpretable way. Experiments on benchmark datasets for image recognition, object detection, and instance segmentation validate that our LDA approach, beyond its interpretability, achieves state-of-the-art performance. Code is available at https://github.com/pengzhiliang/LDA.
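As a rough illustration of such an objective, the hypothetical sketch below combines the empirical risks of class-unbalanced and class-balanced batches with an intra-/inter-class distance surrogate for the domain divergence; the trade-off weight `alpha` and all other specifics are assumptions rather than the paper's formulation:

```python
import torch
import torch.nn.functional as F

def lda_style_loss(logits_u, y_u, logits_b, y_b, feats, labels, alpha=0.1):
    """Hypothetical sketch: joint empirical risks of the unbalanced and
    balanced domains plus a divergence surrogate that shrinks intra-class
    distance and grows inter-class distance."""
    # Empirical risks of the two domains (e.g. an instance-sampled batch
    # and a class-balanced batch drawn from the same training set).
    risk = F.cross_entropy(logits_u, y_u) + F.cross_entropy(logits_b, y_b)

    # Class centroids of the current batch features.
    classes = labels.unique()
    centroids = torch.stack([feats[labels == c].mean(0) for c in classes])

    # Intra-class distance: each feature to its own class centroid.
    intra = torch.stack([
        (feats[labels == c] - centroids[i]).norm(dim=1).mean()
        for i, c in enumerate(classes)]).mean()

    # Inter-class distance: mean pairwise separation between centroids.
    pair = torch.cdist(centroids, centroids)
    inter = pair.sum() / (len(classes) * (len(classes) - 1) + 1e-8)

    # Divergence surrogate: compact classes, well-separated centroids.
    return risk + alpha * (intra - inter)

# Toy usage: 10-way classification, 16-sample batches, 32-dim features.
f, y = torch.randn(16, 32), torch.randint(0, 10, (16,))
loss = lda_style_loss(torch.randn(16, 10), y, torch.randn(16, 10), y, f, y)
```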

* Accepted at ACM MM 2021 