Baosheng Yu

GFNet: Geometric Flow Network for 3D Point Cloud Semantic Segmentation

Jul 06, 2022

Haibo Qiu, Baosheng Yu, Dacheng Tao

Abstract: Point cloud semantic segmentation from projected views, such as range-view (RV) and bird's-eye-view (BEV), has been intensively investigated. Different views capture different information about point clouds and are thus complementary to each other. However, recent projection-based methods for point cloud semantic segmentation usually use a vanilla late fusion strategy for the predictions of different views, failing to explore the complementary information from a geometric perspective during representation learning. In this paper, we introduce a geometric flow network (GFNet) to explore the geometric correspondence between different views in an align-before-fuse manner. Specifically, we devise a novel geometric flow module (GFM) to bidirectionally align and propagate the complementary information across different views according to geometric relationships under the end-to-end learning scheme. We perform extensive experiments on two widely used benchmark datasets, SemanticKITTI and nuScenes, to demonstrate the effectiveness of our GFNet for projection-based point cloud semantic segmentation. Concretely, GFNet not only significantly boosts the performance of each individual view but also achieves state-of-the-art results over all existing projection-based models. Code is available at https://github.com/haibo-qiu/GFNet.

* Code is available at https://github.com/haibo-qiu/GFNet
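
The geometric correspondence between views mentioned in the GFNet abstract can be pictured with a minimal sketch: every 3D point lands in one range-view pixel and one bird's-eye-view cell, and that pair links the two views. This is not the authors' geometric flow module. The spherical projection, the Cartesian BEV grid, the 64 x 2048 image size, the field-of-view values, and all function names are assumptions chosen for illustration.

```python
import math


def range_view_pixel(point, height=64, width=2048,
                     fov_up_deg=3.0, fov_down_deg=-25.0):
    """Spherical projection of an (x, y, z) point to a (row, col) pixel.

    Assumes the point is not at the sensor origin (depth > 0).
    """
    x, y, z = point
    depth = math.sqrt(x * x + y * y + z * z)
    yaw = math.atan2(y, x)
    pitch = math.asin(z / depth)
    fov_down = math.radians(fov_down_deg)
    fov = math.radians(fov_up_deg) - fov_down
    col = int(0.5 * (1.0 - yaw / math.pi) * width)
    row = int((1.0 - (pitch - fov_down) / fov) * height)
    # Clamp so points on or outside the field-of-view edge stay in the image.
    return min(max(row, 0), height - 1), min(max(col, 0), width - 1)


def bev_cell(point, extent=50.0, cell_size=0.5):
    """Top-down projection of a point to a (row, col) cell of a square grid."""
    x, y, _ = point
    size = int(2 * extent / cell_size)
    col = int(math.floor((x + extent) / cell_size))
    row = int(math.floor((y + extent) / cell_size))
    return min(max(row, 0), size - 1), min(max(col, 0), size - 1)


def view_correspondences(points):
    """Pair each point's range-view pixel with its bird's-eye-view cell."""
    return [(range_view_pixel(p), bev_cell(p)) for p in points]


pairs = view_correspondences([(10.0, 0.0, 0.0), (0.0, -5.0, -1.0)])
```

With such pairs, a feature stored at a range-view pixel can be looked up from the matching BEV cell and vice versa, which is the kind of alignment an align-before-fuse design needs before combining the two views.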

BatchFormerV2: Exploring Sample Relationships for Dense Representation Learning

Apr 04, 2022

Zhi Hou, Baosheng Yu, Chaoyue Wang, Yibing Zhan, Dacheng Tao

Abstract: Attention mechanisms have been very popular in deep neural networks, where the Transformer architecture has achieved great success in not only natural language processing but also visual recognition applications. Recently, a new Transformer module applied on the batch dimension rather than the spatial/channel dimension, i.e., BatchFormer [18], has been introduced to explore sample relationships for overcoming data scarcity challenges. However, it only works with image-level representations for classification. In this paper, we devise a more general batch Transformer module, BatchFormerV2, which further enables exploring sample relationships for dense representation learning. Specifically, the proposed module employs a two-stream pipeline during training, i.e., one stream with and one without a BatchFormerV2 module, where the BatchFormerV2 stream can be removed for testing. Therefore, the proposed method is a plug-and-play module and can be easily integrated into different vision Transformers without any extra inference cost. Without bells and whistles, we show the effectiveness of the proposed method for a variety of popular visual recognition tasks, including image classification and two important dense prediction tasks: object detection and panoptic segmentation. In particular, BatchFormerV2 consistently improves current DETR-based detection methods (e.g., DETR, Deformable-DETR, Conditional DETR, and SMCA) by over 1.3%. Code will be made publicly available.

* Tech report

BatchFormer: Learning to Explore Sample Relationships for Robust Representation Learning

Mar 29, 2022

Zhi Hou, Baosheng Yu, Dacheng Tao

Abstract: Despite the success of deep neural networks, there are still many challenges in deep representation learning due to data scarcity issues such as data imbalance, unseen distribution, and domain shift. To address these issues, a variety of methods have been devised to explore the sample relationships in a vanilla way (i.e., from the perspectives of either the input or the loss function), failing to explore the internal structure of deep neural networks for learning with sample relationships. Inspired by this, we propose to equip deep neural networks themselves with the ability to learn the sample relationships from each mini-batch. Specifically, we introduce a batch transformer module, or BatchFormer, which is applied to the batch dimension of each mini-batch to implicitly explore sample relationships during training. By doing this, the proposed method enables the collaboration of different samples, e.g., the head-class samples can also contribute to the learning of the tail classes for long-tailed recognition. Furthermore, to mitigate the gap between training and testing, we share the classifier between the streams with and without the BatchFormer during training, so that the BatchFormer can be removed during testing. We perform extensive experiments on over ten datasets and the proposed method achieves significant improvements on different data scarcity applications without any bells and whistles, including the tasks of long-tailed recognition, compositional zero-shot learning, domain generalization, and contrastive learning. Code will be made publicly available at https://github.com/zhihou7/BatchFormer.

* Camera Ready, CVPR2022
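
The batch-dimension attention described in the BatchFormer abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation (see the repository linked above for that): it assumes a single attention head with identity query/key/value projections, omits the feed-forward and normalization layers of a Transformer encoder, and the names `batch_attention`, `classify`, and `training_logits` are hypothetical. What it shows is that the mini-batch of N feature vectors is treated as a sequence of length N, and that one classifier is shared by the stream with and the stream without the batch module.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def batch_attention(features):
    """Self-attention across the batch: each sample attends to every sample.

    features: list of N feature vectors (one per sample), each of length C.
    Identity projections are an assumption made to keep the sketch short.
    """
    dim = len(features[0])
    mixed = []
    for query in features:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in features]
        weights = softmax(scores)
        attended = [sum(w * value[c] for w, value in zip(weights, features))
                    for c in range(dim)]
        # Residual connection keeps each sample's own feature in the output.
        mixed.append([x + a for x, a in zip(query, attended)])
    return mixed


def classify(features, weight):
    """Shared linear classifier: weight holds one row of length C per class."""
    return [[sum(w * x for w, x in zip(row, feat)) for row in weight]
            for feat in features]


def training_logits(features, weight):
    """Two streams during training, both scored by the same classifier."""
    plain = classify(features, weight)
    batched = classify(batch_attention(features), weight)
    return plain, batched


feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
clf = [[1.0, -1.0], [-1.0, 1.0]]
plain_logits, batched_logits = training_logits(feats, clf)
test_logits = classify(feats, clf)  # inference uses the plain stream only
```

Because the classifier is trained on both streams, dropping `batch_attention` at test time leaves a model that was already optimized on the plain features, which is why the module adds no inference cost.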

Discovering Human-Object Interaction Concepts via Self-Compositional Learning

Mar 27, 2022

Zhi Hou, Baosheng Yu, Dacheng Tao

Abstract: A comprehensive understanding of human-object interaction (HOI) requires detecting not only a small portion of predefined HOI concepts (or categories) but also other reasonable HOI concepts, while current approaches usually fail to explore a huge portion of unknown HOI concepts (i.e., unknown but reasonable combinations of verbs and objects). In this paper, 1) we introduce a novel and challenging task for comprehensive HOI understanding, termed HOI Concept Discovery; and 2) we devise a self-compositional learning framework (or SCL) for HOI concept discovery. Specifically, we maintain an online updated concept confidence matrix during training: 1) we assign pseudo-labels to all composite HOI instances according to the concept confidence matrix for self-training; and 2) we update the concept confidence matrix using the predictions of all composite HOI instances. Therefore, the proposed method enables learning on both known and unknown HOI concepts. We perform extensive experiments on several popular HOI datasets to demonstrate the effectiveness of the proposed method for HOI concept discovery, object affordance recognition, and HOI detection. For example, the proposed self-compositional learning framework significantly improves the performance of 1) HOI concept discovery by over 10% on HICO-DET and over 3% on V-COCO, respectively; 2) object affordance recognition by over 9% mAP on MS-COCO and HICO-DET; and 3) rare-first and non-rare-first unknown HOI detection by relative margins of over 30% and 20%, respectively. Code and models will be made publicly available at https://github.com/zhihou7/HOI-CL.

* Technical Report

Exploring High-Order Structure for Robust Graph Structure Learning

Mar 22, 2022

Guangqian Yang, Yibing Zhan, Jinlong Li, Baosheng Yu, Liu Liu, Fengxiang He

Abstract: Recent studies show that Graph Neural Networks (GNNs) are vulnerable to adversarial attacks, i.e., an imperceptible structure perturbation can fool GNNs into making wrong predictions. Some studies exploit specific properties of clean graphs, such as feature smoothness, to defend against such attacks, but these properties have not been well analyzed. In this paper, we analyze adversarial attacks on graphs from the perspective of feature smoothness, which further contributes to an efficient new adversarial defense algorithm for GNNs. We discover that the high-order graph structure acts as a smoother filter for processing graph structures. Intuitively, the high-order graph structure denotes the number of paths between nodes, where a larger number indicates a closer connection, so it naturally helps defend against adversarial perturbations. Further, we propose a novel algorithm that incorporates the high-order structural information into graph structure learning. We perform experiments on three popular benchmark datasets, Cora, Citeseer and Polblogs. Extensive experiments demonstrate the effectiveness of our method for defending against graph adversarial attacks.
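
The path number between nodes mentioned in this abstract relates to a standard fact: entry (i, j) of the k-th power of an adjacency matrix counts the walks of length k from node i to node j. The minimal sketch below computes that quantity on a toy graph. It is not the paper's defense algorithm, and the function names and the toy graph are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def walk_counts(adjacency, order):
    """Return A^order; entry [i][j] counts walks of length `order` from i to j."""
    result = adjacency
    for _ in range(order - 1):
        result = matmul(result, adjacency)
    return result


# Toy undirected graph: a path 0 - 1 - 2 plus an extra edge 1 - 3.
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 1, 0, 0],
    [0, 1, 0, 0],
]
second_order = walk_counts(adjacency, 2)
```

In this toy graph, nodes 0 and 2 share no edge, yet `second_order[0][2]` is nonzero because both connect to node 1. A single inserted or deleted edge changes first-order adjacency completely for that pair but only shifts such counts by a limited amount, which gives some intuition for why high-order structure behaves like a smoother signal.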

Contrastive Boundary Learning for Point Cloud Segmentation

Mar 11, 2022

Liyao Tang, Yibing Zhan, Zhe Chen, Baosheng Yu, Dacheng Tao

Abstract: Point cloud segmentation is fundamental in understanding 3D environments. However, current 3D point cloud segmentation methods usually perform poorly on scene boundaries, which degrades the overall segmentation performance. In this paper, we focus on the segmentation of scene boundaries. Accordingly, we first explore metrics to evaluate the segmentation performance on scene boundaries. To address the unsatisfactory performance on boundaries, we then propose a novel contrastive boundary learning (CBL) framework for point cloud segmentation. Specifically, the proposed CBL enhances feature discrimination between points across boundaries by contrasting their representations with the assistance of scene contexts at multiple scales. By applying CBL to three different baseline methods, we experimentally show that CBL consistently improves different baselines and helps them achieve compelling performance on boundaries, as well as on overall metrics, e.g., mIoU. The experimental results demonstrate the effectiveness of our method and the importance of boundaries for 3D point cloud segmentation. Code and model will be made publicly available at https://github.com/LiyaoTang/contrastBoundary.

* Preprint; To appear in CVPR2022

Learning Affinity from Attention: End-to-End Weakly-Supervised Semantic Segmentation with Transformers

Mar 05, 2022

Lixiang Ru, Yibing Zhan, Baosheng Yu, Bo Du

Abstract: Weakly-supervised semantic segmentation (WSSS) with image-level labels is an important and challenging task. Owing to their high training efficiency, end-to-end solutions for WSSS have received increasing attention from the community. However, current methods are mainly based on convolutional neural networks and fail to explore the global information properly, thus usually resulting in incomplete object regions. In this paper, to address this problem, we introduce Transformers, which naturally integrate global information, to generate more integral initial pseudo labels for end-to-end WSSS. Motivated by the inherent consistency between the self-attention in Transformers and the semantic affinity, we propose an Affinity from Attention (AFA) module to learn semantic affinity from the multi-head self-attention (MHSA) in Transformers. The learned affinity is then leveraged to refine the initial pseudo labels for segmentation. In addition, to efficiently derive reliable affinity labels for supervising AFA and ensure the local consistency of pseudo labels, we devise a Pixel-Adaptive Refinement module that incorporates low-level image appearance information to refine the pseudo labels. We perform extensive experiments and our method achieves 66.0% and 38.9% mIoU on the PASCAL VOC 2012 and MS COCO 2014 datasets, respectively, significantly outperforming recent end-to-end methods and several multi-stage competitors. Code is available at https://github.com/rulixiang/afa.

* Accepted to CVPR 2022

Resistance Training using Prior Bias: toward Unbiased Scene Graph Generation

Jan 18, 2022

Chao Chen, Yibing Zhan, Baosheng Yu, Liu Liu, Yong Luo, Bo Du

Abstract: Scene Graph Generation (SGG) aims to build a structured representation of a scene using objects and pairwise relationships, which benefits downstream tasks. However, current SGG methods usually suffer from sub-optimal scene graph generation because of the long-tailed distribution of training data. To address this problem, we propose Resistance Training using Prior Bias (RTPB) for scene graph generation. Specifically, RTPB uses a distribution-based prior bias to improve models' ability to detect less frequent relationships during training, thus improving model generalizability on tail categories. In addition, to further explore the contextual information of objects and relationships, we design a contextual encoding backbone network, termed Dual Transformer (DTrans). We perform extensive experiments on a very popular benchmark, VG150, to demonstrate the effectiveness of our method for unbiased scene graph generation. Specifically, our RTPB achieves an improvement of over 10% in mean recall when applied to current SGG methods. Furthermore, DTrans with RTPB outperforms nearly all state-of-the-art methods by a large margin.

* Accepted by AAAI 2022

SkipNode: On Alleviating Over-smoothing for Deep Graph Convolutional Networks

Dec 22, 2021

Weigang Lu, Yibing Zhan, Ziyu Guan, Liu Liu, Baosheng Yu, Wei Zhao, Yaming Yang, Dacheng Tao

Abstract: Over-smoothing is a challenging problem that degrades the performance of deep graph convolutional networks (GCNs). However, existing studies for alleviating the over-smoothing problem lack either generality or effectiveness. In this paper, we analyze the underlying issues behind the over-smoothing problem, i.e., feature-diversity degeneration, gradient vanishing, and model weights over-decaying. Inspired by this, we propose a simple yet effective plug-and-play module, SkipNode, to alleviate over-smoothing. Specifically, for each middle layer of a GCN model, SkipNode randomly (or based on node degree) selects nodes to skip the convolutional operation by directly feeding their input features to the nonlinear function. Analytically, 1) skipping the convolutional operation prevents the features from losing diversity; and 2) the "skipped" nodes enable gradients to be directly passed back, thus mitigating the gradient vanishing and model weights over-decaying issues. To demonstrate the superiority of SkipNode, we conduct extensive experiments on nine popular datasets, including both homophilic and heterophilic graphs, with different graph sizes on two typical tasks: node classification and link prediction. Specifically, 1) SkipNode generalizes well across various GCN-based models on different datasets and tasks; and 2) SkipNode outperforms recent state-of-the-art anti-over-smoothing plug-and-play modules, i.e., DropEdge and DropNode, in different settings. Code will be made publicly available on GitHub.

* 12 pages, 5 figures
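
The skipping rule described in the SkipNode abstract can be sketched for a single middle layer as follows. This is a minimal illustration under stated assumptions, not the authors' code: it covers only the uniform random sampling variant (the degree-based variant is omitted), uses ReLU as the nonlinear function, requires a square weight matrix so that skipped and convolved features have the same width, and the names `skipnode_layer` and `skip_rate` are hypothetical.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def skipnode_layer(adj_norm, features, weight, skip_rate, rng):
    """One middle GCN layer where randomly chosen nodes skip the convolution.

    adj_norm: N x N normalized adjacency, features: N x C, weight: C x C.
    A skipped node feeds its input features straight to the nonlinearity.
    """
    convolved = matmul(matmul(adj_norm, features), weight)
    output = []
    for own, conv in zip(features, convolved):
        skipped = rng.random() < skip_rate
        pre_activation = own if skipped else conv
        output.append([max(0.0, value) for value in pre_activation])  # ReLU
    return output


adj_norm = [[0.5, 0.5], [0.5, 0.5]]
features = [[1.0, -2.0], [3.0, 4.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
rng = random.Random(0)
all_skipped = skipnode_layer(adj_norm, features, identity, 1.0, rng)
none_skipped = skipnode_layer(adj_norm, features, identity, 0.0, rng)
```

In this toy example the convolution makes both rows of `none_skipped` identical, while `all_skipped` keeps the two nodes distinct, which mirrors the abstract's point that skipping helps preserve feature diversity.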

SynFace: Face Recognition with Synthetic Data

Aug 18, 2021

Haibo Qiu, Baosheng Yu, Dihong Gong, Zhifeng Li, Wei Liu, Dacheng Tao

Abstract: With the recent success of deep neural networks, remarkable progress has been achieved in face recognition. However, collecting large-scale real-world training data for face recognition has turned out to be challenging, especially due to label noise and privacy issues. Meanwhile, existing face recognition datasets are usually collected from web images and lack detailed annotations on attributes (e.g., pose and expression), so the influence of different attributes on face recognition has been poorly investigated. In this paper, we address the above-mentioned issues in face recognition using synthetic face images, i.e., SynFace. Specifically, we first explore the performance gap between recent state-of-the-art face recognition models trained with synthetic and real face images. We then analyze the underlying causes of the performance gap, e.g., the poor intra-class variations and the domain gap between synthetic and real face images. Inspired by this, we devise SynFace with identity mixup (IM) and domain mixup (DM) to mitigate the above performance gap, demonstrating the great potential of synthetic data for face recognition. Furthermore, with the controllable face synthesis model, we can easily manage different factors of synthetic face generation, including pose, expression, illumination, the number of identities, and samples per identity. Therefore, we also perform a systematic empirical analysis on synthetic face images to provide some insights on how to effectively utilize synthetic data for face recognition.

* Accepted by ICCV 2021
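
Identity mixup and domain mixup both build on mixup-style linear interpolation. The sketch below shows only that generic operation on plain vectors with soft labels. Where exactly the interpolation is applied in SynFace (which inputs or coefficients are mixed) is not stated in the abstract, so the function `mixup`, its arguments, and the toy values are hypothetical.

```python
def mixup(sample_a, sample_b, label_a, label_b, lam):
    """Linearly interpolate two samples and their label vectors.

    lam is the mixing weight in [0, 1]; lam = 1 returns the first sample.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    mixed_sample = [lam * a + (1.0 - lam) * b
                    for a, b in zip(sample_a, sample_b)]
    mixed_label = [lam * a + (1.0 - lam) * b
                   for a, b in zip(label_a, label_b)]
    return mixed_sample, mixed_label


# Two toy "identities" with one-hot labels, mixed 25% / 75%.
sample, label = mixup([4.0, 0.0], [0.0, 8.0], [1.0, 0.0], [0.0, 1.0], 0.25)
```

The mixed label records how much of each source went into the mixed sample, so a model trained on such pairs is supervised with soft targets rather than a single hard class.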
