Jing Qin

An Improved Normed-Deformable Convolution for Crowd Counting

Jun 16, 2022
Xin Zhong, Zhaoyi Yan, Jing Qin, Wangmeng Zuo, Weigang Lu

In recent years, crowd counting has become an important problem in computer vision. In most methods, density maps are generated by convolving a Gaussian kernel with ground-truth dot maps, whose dots are marked around the centers of human heads. Due to the fixed geometric structures in CNNs and indistinct head-scale information, head features are captured incompletely. Deformable convolution has been proposed to exploit scale-adaptive capabilities for CNN features in the heads. By learning the coordinate offsets of the sampling points, it improves the ability to adjust the receptive field. However, the heads are not uniformly covered by the sampling points in deformable convolution, resulting in a loss of head information. To handle this non-uniform sampling, this paper proposes an improved Normed-Deformable Convolution (NDConv), implemented with a Normed-Deformable loss (NDloss). The offsets of the sampling points, constrained by NDloss, tend to be more even. The features in the heads are then captured more completely, leading to better performance. Notably, the proposed NDConv is a lightweight module with a computational burden similar to that of deformable convolution. In extensive experiments, our method outperforms state-of-the-art methods on the ShanghaiTech A, ShanghaiTech B, UCF_QNRF, and UCF_CC_50 datasets, achieving MAEs of 61.4, 7.8, 91.2, and 167.2, respectively. The code is available at https://github.com/bingshuangzhuzi/NDConv.

* IEEE Signal Processing Letters 2022
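
The density-map construction and the MAE metric mentioned in the abstract above can be illustrated with a minimal sketch. It assumes a fixed Gaussian width (`sigma`) and uses hypothetical helper names (`gaussian_kernel`, `density_map`, `crowd_count`, `mean_absolute_error`); it is not taken from the NDConv repository, and real pipelines often use geometry-adaptive kernels instead.

```python
import math


def gaussian_kernel(sigma, radius):
    """Return a (2*radius+1) x (2*radius+1) Gaussian kernel that sums to 1."""
    size = 2 * radius + 1
    kernel = [
        [
            math.exp(-((x - radius) ** 2 + (y - radius) ** 2) / (2 * sigma ** 2))
            for x in range(size)
        ]
        for y in range(size)
    ]
    total = sum(sum(row) for row in kernel)
    return [[value / total for value in row] for row in kernel]


def density_map(height, width, heads, sigma=4.0):
    """Place one normalized Gaussian at each annotated head position.

    `heads` is a list of (row, col) dot annotations. This is equivalent to
    convolving the dot map with the kernel. Mass falling outside the image
    is dropped, so heads near the border contribute slightly less than 1.
    """
    radius = int(3 * sigma)  # assumption: truncate the kernel at 3 sigma
    kernel = gaussian_kernel(sigma, radius)
    density = [[0.0] * width for _ in range(height)]
    for row, col in heads:
        for dy in range(-radius, radius + 1):
            for dx in range(-radius, radius + 1):
                y, x = row + dy, col + dx
                if 0 <= y < height and 0 <= x < width:
                    density[y][x] += kernel[dy + radius][dx + radius]
    return density


def crowd_count(density):
    """The predicted count is the integral (sum) of the density map."""
    return sum(sum(row) for row in density)


def mean_absolute_error(predicted_counts, true_counts):
    """MAE over a set of images, the metric reported in the abstract."""
    errors = [abs(p - t) for p, t in zip(predicted_counts, true_counts)]
    return sum(errors) / len(errors)


# Three heads well inside a 64x64 image: the map sums to about 3.
heads = [(20, 20), (40, 30), (32, 50)]
density = density_map(64, 64, heads, sigma=2.0)
print(round(crowd_count(density), 3))
```

Because each kernel is normalized, summing the map recovers the head count, which is why counting networks can be trained to regress the map and evaluated by MAE on its sum.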

AGConv: Adaptive Graph Convolution on 3D Point Clouds

Jun 09, 2022
Mingqiang Wei, Zeyong Wei, Haoran Zhou, Fei Hu, Huajian Si, Zhilei Chen, Zhe Zhu, Jingbo Qiu, Xuefeng Yan, Yanwen Guo, Jun Wang, Jing Qin

Convolution on 3D point clouds is widely researched yet far from perfect in geometric deep learning. Conventional convolution characterises feature correspondences among 3D points indistinguishably, which gives rise to an intrinsic limitation: poor learning of distinctive features. In this paper, we propose Adaptive Graph Convolution (AGConv) for a wide range of point cloud analysis applications. AGConv generates adaptive kernels for points according to their dynamically learned features. Compared with fixed/isotropic kernels, AGConv improves the flexibility of point cloud convolutions, effectively and precisely capturing the diverse relations between points from different semantic parts. Unlike the popular attentional weight schemes, AGConv implements the adaptiveness inside the convolution operation instead of simply assigning different weights to the neighboring points. Extensive evaluations show that our method outperforms state-of-the-art methods for point cloud classification and segmentation on various benchmark datasets. Meanwhile, AGConv can flexibly serve other point cloud analysis approaches to boost their performance. To validate its flexibility and effectiveness, we explore AGConv-based paradigms for completion, denoising, upsampling, registration, and circle extraction, which are comparable or even superior to their competitors. Our code is available at https://github.com/hrzhou2/AdaptConv-master.

* arXiv admin note: substantial text overlap with arXiv:2108.08035

XBound-Former: Toward Cross-scale Boundary Modeling in Transformers

Jun 02, 2022
Jiacheng Wang, Fei Chen, Yuxi Ma, Liansheng Wang, Zhaodong Fei, Jianwei Shuai, Xiangdong Tang, Qichao Zhou, Jing Qin

Skin lesion segmentation from dermoscopy images is of great significance in the quantitative analysis of skin cancers, yet it is challenging even for dermatologists due to inherent issues, i.e., considerable variation in size, shape, and color, and ambiguous boundaries. Recent vision transformers have shown promising performance in handling the variation through global context modeling. Still, they have not thoroughly solved the problem of ambiguous boundaries, as they ignore the complementary use of boundary knowledge and global contexts. In this paper, we propose a novel cross-scale boundary-aware transformer, XBound-Former, to simultaneously address the variation and boundary problems of skin lesion segmentation. XBound-Former is a purely attention-based network and captures boundary knowledge via three specially designed learners. We evaluate the model on two skin lesion datasets, ISIC-2016 & PH$^2$ and ISIC-2018, where our model consistently outperforms other convolution- and transformer-based models, especially on the boundary-wise metrics. We also extensively verify the model's generalization ability on polyp lesion segmentation, a task with similar characteristics, where it likewise yields significant improvement over the latest models.

* https://github.com/jcwang123/xboundformer

UCL-Dehaze: Towards Real-world Image Dehazing via Unsupervised Contrastive Learning

May 04, 2022
Yongzhen Wang, Xuefeng Yan, Fu Lee Wang, Haoran Xie, Wenhan Yang, Mingqiang Wei, Jing Qin

While training an image dehazing model on synthetic hazy data can alleviate the difficulty of collecting real-world hazy/clean image pairs, it brings the well-known domain shift problem. From a new perspective, this paper explores contrastive learning with adversarial training to leverage unpaired real-world hazy and clean images, so that the gap between synthetic and real-world haze need not be bridged. We propose an effective unsupervised contrastive learning paradigm for image dehazing, dubbed UCL-Dehaze. Unpaired real-world clean and hazy images are easily captured, and serve as the positive and negative samples, respectively, when training our UCL-Dehaze network. To train the network more effectively, we formulate a new self-contrastive perceptual loss function, which encourages the restored images to approach the positive samples and keep away from the negative samples in the embedding space. In addition to the overall network architecture of UCL-Dehaze, adversarial training is used to align the distributions of the positive samples and the dehazed images. Compared with recent image dehazing works, UCL-Dehaze does not require paired data during training and uses unpaired positive/negative data to better enhance the dehazing performance. We conduct comprehensive experiments to evaluate UCL-Dehaze and demonstrate its superiority over state-of-the-art methods, even when only 1,800 unpaired real-world images are used to train our network. Source code is available at https://github.com/yz-wang/UCL-Dehaze.

* 14 pages, 9 figures, 9 tables
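
The idea of pulling restored images toward positive (clean) samples and pushing them away from negative (hazy) samples in an embedding space can be sketched as a simple distance ratio. This is an illustrative assumption, not the paper's self-contrastive perceptual loss: the embeddings here are plain lists of floats, whereas the paper's embedding space and exact formulation are not given in the abstract, and the function names are hypothetical.

```python
def l1_distance(a, b):
    """Mean absolute difference between two equal-length embedding vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def contrastive_ratio_loss(restored, positive, negative, eps=1e-7):
    """Small when `restored` is near `positive` and far from `negative`.

    Assumption: a ratio of L1 distances stands in for the paper's loss,
    which would operate on features from a perceptual network.
    """
    pull = l1_distance(restored, positive)   # distance to a clean sample
    push = l1_distance(restored, negative)   # distance to a hazy sample
    return pull / (push + eps)


# Toy embeddings: a good restoration sits close to the clean sample.
clean = [0.9, 0.8, 0.7]
hazy = [0.5, 0.5, 0.5]
good_restoration = [0.88, 0.79, 0.71]
poor_restoration = [0.55, 0.52, 0.50]

print(contrastive_ratio_loss(good_restoration, clean, hazy))
print(contrastive_ratio_loss(poor_restoration, clean, hazy))
```

Minimizing such a ratio rewards both moving toward the positive sample and moving away from the negative one, which is the behavior the abstract describes.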

Robust Dual-Graph Regularized Moving Object Detection

Apr 25, 2022
Jing Qin, Ruilong Shen, Ruihan Zhu, Biyun Xie

Moving object detection and its associated background-foreground separation have been widely used in many applications, including computer vision, transportation, and surveillance. Due to the presence of the static background, a video can be naturally decomposed into a low-rank background and a sparse foreground. Many regularization techniques, such as the matrix nuclear norm, have been imposed on the background. Meanwhile, sparsity- or smoothness-based regularizations, such as total variation and $\ell_1$, can be imposed on the foreground. Moreover, graph Laplacians are further imposed to capture the complicated geometry of background images. Recently, weighted regularization techniques, including weighted nuclear norm regularization, have been proposed in the image processing community to promote adaptive sparsity while achieving efficient performance. In this paper, we propose a robust dual-graph regularized moving object detection model based on weighted nuclear norm regularization, which is solved by the alternating direction method of multipliers (ADMM). Numerical experiments on body movement data sets demonstrate the effectiveness of this method in separating moving objects from the background, and its great potential in robotic applications.
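
As a point of reference for the low-rank plus sparse decomposition described above, the classical robust PCA formulation can be written as follows. This is the generic baseline, not the paper's model: the symbols $D$ (matrix whose columns are vectorized video frames), $L$ (background), $S$ (foreground), $\lambda$ (trade-off weight), and $w_i$ (singular-value weights) are notation assumed here, and the dual-graph regularization terms are omitted because the abstract does not specify them.

```latex
% Generic robust PCA: nuclear norm on the background, \ell_1 on the foreground
\min_{L,\,S} \; \|L\|_{*} + \lambda \|S\|_{1}
\quad \text{subject to} \quad D = L + S

% Weighted nuclear norm, which replaces \|L\|_{*} in weighted variants;
% \sigma_i(L) is the i-th singular value of L and w_i \ge 0 its weight
\|L\|_{w,*} = \sum_{i} w_i \, \sigma_i(L)
```

Setting all $w_i = 1$ recovers the ordinary nuclear norm, so the weighted form lets large singular values (dominant background structure) be penalized less than small ones.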

Semi-DRDNet Semi-supervised Detail-recovery Image Deraining Network via Unpaired Contrastive Learning

Apr 06, 2022
Yiyang Shen, Sen Deng, Wenhan Yang, Mingqiang Wei, Haoran Xie, XiaoPing Zhang, Jing Qin, Meng Wang

The intricacy of rainy image contents often causes cutting-edge deraining models to degrade images, leaving remnant rain, wrongly removed details, and distorted appearance. Such degradation is further exacerbated when models trained on synthetic data are applied to real-world rainy images. We raise an intriguing question: can leveraging both accessible unpaired clean/rainy real-world images and additional detail-repair guidance improve the generalization ability of a deraining model? To answer it, we propose a semi-supervised detail-recovery image deraining network (termed Semi-DRDNet). Semi-DRDNet consists of three branches: 1) for removing rain streaks without remnants, we present a squeeze-and-excitation (SE)-based rain residual network; 2) for encouraging the lost details to return, we construct, to our knowledge for the first time, a structure detail context aggregation (SDCAB)-based detail repair network; and 3) for bridging the domain gap, we develop a novel contrastive regularization network to learn from unpaired positive (clean) and negative (rainy) real-world images. As a semi-supervised learning paradigm, Semi-DRDNet operates smoothly on both synthetic and real-world rainy data in terms of deraining robustness and detail accuracy. Comparisons on four datasets show clear visual and numerical improvements of Semi-DRDNet over thirteen state-of-the-art methods.

* 17 pages

Transformer-empowered Multi-scale Contextual Matching and Aggregation for Multi-contrast MRI Super-resolution

Mar 26, 2022
Guangyuan Li, Jun Lv, Yapeng Tian, Qi Dou, Chengyan Wang, Chenliang Xu, Jing Qin

Magnetic resonance imaging (MRI) can present multi-contrast images of the same anatomical structures, enabling multi-contrast super-resolution (SR) techniques. Compared with SR reconstruction using a single contrast, multi-contrast SR reconstruction promises to yield SR images of higher quality by leveraging diverse yet complementary information embedded in different imaging modalities. However, existing methods still have two shortcomings: (1) they neglect that the multi-contrast features at different scales contain different anatomical details, and hence lack effective mechanisms to match and fuse these features for better reconstruction; and (2) they are still deficient in capturing long-range dependencies, which are essential for regions with complicated anatomical structures. We propose a novel network, called McMRSR, to comprehensively address these problems by developing a set of innovative Transformer-empowered multi-scale contextual matching and aggregation techniques. First, we tame transformers to model long-range dependencies in both reference and target images. Then, a new multi-scale contextual matching method is proposed to capture corresponding contexts from reference features at different scales. Furthermore, we introduce a multi-scale aggregation mechanism to gradually and interactively aggregate multi-scale matched features for reconstructing the target SR MR image. Extensive experiments demonstrate that our network outperforms state-of-the-art approaches and has great potential to be applied in clinical practice. Code is available at https://github.com/XAIMI-Lab/McMRSR.

* CVPR 2022 accepted

Refine-Net: Normal Refinement Neural Network for Noisy Point Clouds

Mar 23, 2022
Haoran Zhou, Honghua Chen, Yingkui Zhang, Mingqiang Wei, Haoran Xie, Jun Wang, Tong Lu, Jing Qin, Xiao-Ping Zhang

The point normal, as an intrinsic geometric property of 3D objects, not only serves conventional geometric tasks such as surface consolidation and reconstruction, but also facilitates cutting-edge learning-based techniques for shape analysis and generation. In this paper, we propose a normal refinement network, called Refine-Net, to predict accurate normals for noisy point clouds. Traditional normal estimation depends heavily on priors such as surface shapes or noise distributions, while learning-based solutions settle for single types of hand-crafted features. In contrast, our network is designed to refine the initial normal of each point by extracting additional information from multiple feature representations. To this end, several feature modules are developed and incorporated into Refine-Net by a novel connection module. In addition to the overall network architecture of Refine-Net, we propose a new multi-scale fitting patch selection scheme for the initial normal estimation, which absorbs geometry domain knowledge. Refine-Net is also a generic normal estimation framework: 1) point normals obtained from other methods can be further refined, and 2) any feature module related to surface geometric structures can potentially be integrated into the framework. Qualitative and quantitative evaluations demonstrate the clear superiority of Refine-Net over state-of-the-art methods on both synthetic and real-scanned datasets. Our code is available at https://github.com/hrzhou2/refinenet.

* Accepted by TPAMI

Separated Contrastive Learning for Organ-at-Risk and Gross-Tumor-Volume Segmentation with Limited Annotation

Dec 06, 2021
Jiacheng Wang, Xiaomeng Li, Yiming Han, Jing Qin, Liansheng Wang, Qichao Zhou

Automatic delineation of organ-at-risk (OAR) and gross-tumor-volume (GTV) is of great significance for radiotherapy planning. However, it is challenging to learn powerful representations for accurate delineation under limited pixel (voxel)-wise annotations. Pixel-level contrastive learning can alleviate the dependency on annotations by learning dense representations from unlabeled data. Recent studies in this direction design various contrastive losses on the feature maps to yield discriminative features for each pixel in the map. However, pixels in the same map inevitably share semantics and so appear closer than they actually are, which may impair the discrimination of pixels within the same map and lead to an unfair comparison with pixels in other maps. To address these issues, we propose a separated region-level contrastive learning scheme, namely SepaReg, the core of which is to separate each image into regions and encode each region separately. Specifically, SepaReg comprises two components: a structure-aware image separation (SIS) module and an intra- and inter-organ distillation (IID) module. The SIS module operates on the image set to rebuild a region set under the guidance of structural information. The inter-organ representation is learned from this set via typical contrastive losses across regions. The IID module, on the other hand, tackles the quantity imbalance in the region set, which arises because tiny organs may produce fewer regions, by exploiting intra-organ representations. We conducted extensive experiments to evaluate the proposed model on a public dataset and two private datasets. The experimental results demonstrate the effectiveness of the proposed model, which consistently achieves better performance than state-of-the-art approaches. Code is available at https://github.com/jcwang123/Separate_CL.

* Accepted in AAAI-22
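
The "typical contrastive losses" applied across regions in the abstract above are commonly of the InfoNCE form. The sketch below shows that generic loss on toy region embeddings; it assumes cosine similarity and a temperature of 0.1, uses hypothetical function names, and is not taken from the SepaReg code.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss for one anchor region.

    `positive` is the embedding of a region that should match the anchor
    (for example, another view of the same region); `negatives` are
    embeddings of other regions. Lower is better.
    """
    pos = math.exp(cosine_similarity(anchor, positive) / temperature)
    neg = sum(
        math.exp(cosine_similarity(anchor, n) / temperature) for n in negatives
    )
    return -math.log(pos / (pos + neg))


# Toy 2-D region embeddings.
anchor = [1.0, 0.0]
matching_region = [0.9, 0.1]
other_regions = [[0.0, 1.0], [-1.0, 0.0]]
print(info_nce(anchor, matching_region, other_regions))
```

The loss shrinks as the anchor aligns with its positive and moves away from the negatives, so encoding each region separately keeps same-image pixels from trivially inflating the similarity.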

Real-time landmark detection for precise endoscopic submucosal dissection via shape-aware relation network

Nov 08, 2021
Jiacheng Wang, Yueming Jin, Shuntian Cai, Hongzhi Xu, Pheng-Ann Heng, Jing Qin, Liansheng Wang

We propose a novel shape-aware relation network for accurate and real-time landmark detection in endoscopic submucosal dissection (ESD) surgery. This task is of great clinical significance but extremely challenging due to bleeding, light reflection, and motion blur in the complicated surgical environment. Compared with existing solutions, which either neglect geometric relationships among target objects or capture the relationships using complicated aggregation schemes, the proposed network achieves satisfactory accuracy while maintaining real-time performance by taking full advantage of the spatial relations among landmarks. We first devise an algorithm to automatically generate relation keypoint heatmaps, which intuitively represent the prior knowledge of spatial relations among landmarks without any extra manual annotation effort. We then develop two complementary regularization schemes to progressively incorporate the prior knowledge into the training process. While one scheme introduces pixel-level regularization by multi-task learning, the other integrates global-level regularization by harnessing a newly designed grouped consistency evaluator, which adds relation constraints to the proposed network in an adversarial manner. Both schemes benefit the model during training and can be readily removed at inference to achieve real-time detection. We establish a large in-house dataset of ESD surgery for esophageal cancer to validate the effectiveness of our proposed method. Extensive experimental results demonstrate that our approach outperforms state-of-the-art methods in terms of accuracy and efficiency, achieving better detection results faster. Promising results on two downstream applications further corroborate the great potential of our method in ESD clinical practice.
