Peng Tang

SR-R$^2$KAC: Improving Single Image Defocus Deblurring

Jul 30, 2023
Peng Tang, Zhiqiang Xu, Pengfei Wei, Xiaobin Hu, Peilin Zhao, Xin Cao, Chunlai Zhou, Tobias Lasser

We propose an efficient deep learning method for single image defocus deblurring (SIDD) by further exploring inverse kernel properties. Although the current inverse kernel method, i.e., kernel-sharing parallel atrous convolution (KPAC), can address spatially varying defocus blur, it has difficulty handling large blurs of this kind. To tackle this issue, we propose Residual and Recursive Kernel-sharing Atrous Convolution (R$^2$KAC). R$^2$KAC builds on a significant observation about inverse kernels: the successive use of fixed-size inverse-kernel-based deconvolutions helps remove unexpectedly large blurs but produces ringing artifacts. Specifically, on top of kernel-sharing atrous convolutions used to simulate multi-scale inverse kernels, R$^2$KAC stacks atrous convolutions recursively to simulate a large inverse kernel. To further alleviate the side effect of recursive stacking, i.e., ringing artifacts, we add identity shortcuts between atrous convolutions to simulate residual deconvolutions. Lastly, a scale-recurrent module is embedded in the R$^2$KAC network, leading to SR-R$^2$KAC, so that multi-scale information from coarse to fine is exploited to progressively remove spatially varying defocus blur. Extensive experimental results show that our method achieves state-of-the-art performance.
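The recursive kernel-sharing idea can be pictured with a small amount of code. Below is a minimal PyTorch sketch, assuming 3x3 kernels, three dilation rates and three recursion steps (all illustrative choices; module and parameter names are mine, not the paper's): one shared weight tensor is applied at several dilation rates, and the block is applied recursively with identity shortcuts around each step.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedAtrousBlock(nn.Module):
    """Kernel-sharing atrous convolutions: one weight tensor reused at several dilation
    rates. Hypothetical sketch of the idea in the abstract, not the authors' implementation."""
    def __init__(self, channels, dilations=(1, 2, 4)):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(channels, channels, 3, 3) * 0.02)
        self.dilations = dilations

    def forward(self, x):
        outs = []
        for d in self.dilations:
            # Same kernel applied at every dilation rate (parameter sharing).
            outs.append(F.conv2d(x, self.weight, padding=d, dilation=d))
        return F.relu(torch.stack(outs, dim=0).sum(dim=0))

class RecursiveResidualStack(nn.Module):
    """Recursively apply the shared block; identity shortcuts mimic residual deconvolutions
    intended to suppress ringing artifacts."""
    def __init__(self, channels, num_recursions=3):
        super().__init__()
        self.block = SharedAtrousBlock(channels)
        self.num_recursions = num_recursions

    def forward(self, x):
        out = x
        for _ in range(self.num_recursions):
            out = out + self.block(out)  # residual shortcut around each recursion
        return out

feat = torch.randn(1, 32, 64, 64)
print(RecursiveResidualStack(32)(feat).shape)  # torch.Size([1, 32, 64, 64])
```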

* Submitted to IEEE Transactions on Cybernetics on 24 July 2023

Graph-Ensemble Learning Model for Multi-label Skin Lesion Classification using Dermoscopy and Clinical Images

Jul 04, 2023
Peng Tang, Yang Nan, Tobias Lasser

Many skin lesion analysis (SLA) methods have recently focused on developing multi-modal, multi-label classification for two reasons. First, multi-modal data, i.e., clinical and dermoscopy images, provide complementary information and thus yield more accurate results than single-modal data. Second, multi-label classification with the seven-point checklist (SPC) criteria as an auxiliary task not only boosts the diagnostic accuracy of melanoma in the deep learning (DL) pipeline but is also more useful to clinicians, as the checklist is commonly used in dermatologists' diagnoses. However, most methods only focus on designing a better module for multi-modal data fusion; few explore the label correlation between the SPC criteria and skin disease for performance improvement. This study fills the gap by introducing a Graph Convolution Network (GCN) that injects the prior co-occurrence between categories, as a correlation matrix, into the DL model for multi-label classification. However, directly applying the GCN degraded performance in our experiments; we attribute this to the weak generalization ability of GCN when the statistical samples of medical data are insufficient. We tackle this issue by proposing a Graph-Ensemble Learning Model (GELN) that views the GCN prediction as complementary information to the predictions of the fusion model and adaptively fuses them with a weighted averaging scheme, which utilizes the valuable information from the GCN while avoiding its negative influence as much as possible. To evaluate our method, we conduct experiments on public datasets. The results illustrate that GELN consistently improves classification performance on different datasets and that the proposed method achieves state-of-the-art performance in SPC and diagnosis classification.
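A minimal sketch of the ensemble step described above, assuming a learnable per-label mixing weight (the exact adaptive weighting scheme is not specified here, so this is an illustrative stand-in, not the paper's GELN):

```python
import torch
import torch.nn as nn

class WeightedEnsemble(nn.Module):
    """Treat the GCN output as complementary to the fusion model's output and blend them
    with learnable per-label weights; names and weighting are assumptions for illustration."""
    def __init__(self, num_labels):
        super().__init__()
        # One mixing coefficient per label, squashed to (0, 1) by a sigmoid.
        self.alpha = nn.Parameter(torch.zeros(num_labels))

    def forward(self, fusion_logits, gcn_logits):
        a = torch.sigmoid(self.alpha)            # weight given to the GCN branch
        return (1.0 - a) * fusion_logits + a * gcn_logits

fusion = torch.randn(4, 8)   # batch of 4, 8 labels (e.g., SPC criteria + diagnosis)
gcn = torch.randn(4, 8)
print(WeightedEnsemble(8)(fusion, gcn).shape)  # torch.Size([4, 8])
```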

* Submitted to TNNLS on 1 July 2023

DocFormerv2: Local Features for Document Understanding

Jun 02, 2023
Srikar Appalaraju, Peng Tang, Qi Dong, Nishant Sankaran, Yichu Zhou, R. Manmatha

We propose DocFormerv2, a multi-modal transformer for Visual Document Understanding (VDU). The VDU domain entails understanding documents (beyond mere OCR predictions), e.g., extracting information from a form, VQA for documents, and other tasks. VDU is challenging as it requires a model to make sense of multiple modalities (visual, language and spatial) to make a prediction. Our approach, termed DocFormerv2, is an encoder-decoder transformer that takes vision, language and spatial features as input. DocFormerv2 is pre-trained with unsupervised tasks employed asymmetrically, i.e., two novel document tasks on the encoder and one on the auto-regressive decoder. The unsupervised tasks are carefully designed so that pre-training encourages local-feature alignment between multiple modalities. Evaluated on nine datasets, DocFormerv2 shows state-of-the-art performance over strong baselines, e.g., TabFact (4.3%), InfoVQA (1.4%), FUNSD (1%). Furthermore, to show generalization capabilities, on three VQA tasks involving scene text, DocFormerv2 outperforms previous comparably-sized models and even does better than much larger models (such as GIT2, PaLi and Flamingo) on some tasks. Extensive ablations show that, due to its pre-training, DocFormerv2 understands multiple modalities better than prior art in VDU.
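To make the "vision, language and spatial features as input" concrete, here is a toy PyTorch sketch; the fusion-by-addition, the quantized box embedding and all sizes are assumptions for illustration, not DocFormerv2's actual architecture or pre-training tasks.

```python
import torch
import torch.nn as nn

class MultiModalInput(nn.Module):
    """Toy sketch of feeding vision, language and spatial (bounding-box) features to an
    encoder-decoder transformer. Illustrative only; not DocFormerv2's real design."""
    def __init__(self, vocab_size=30522, d_model=256, num_spatial=1000):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, d_model)
        self.vis = nn.Linear(2048, d_model)            # project per-token visual features
        self.box = nn.Embedding(num_spatial, d_model)  # quantized x0, y0, x1, y1 coordinates
        self.transformer = nn.Transformer(d_model=d_model, batch_first=True)

    def forward(self, token_ids, vis_feats, box_ids, decoder_in):
        spatial = self.box(box_ids).sum(dim=2)         # sum the four coordinate embeddings
        src = self.tok(token_ids) + self.vis(vis_feats) + spatial
        return self.transformer(src, self.tok(decoder_in))

m = MultiModalInput()
out = m(torch.randint(0, 30522, (2, 16)), torch.randn(2, 16, 2048),
        torch.randint(0, 1000, (2, 16, 4)), torch.randint(0, 30522, (2, 8)))
print(out.shape)  # torch.Size([2, 8, 256])
```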

RNN-Guard: Certified Robustness Against Multi-frame Attacks for Recurrent Neural Networks

Apr 17, 2023
Yunruo Zhang, Tianyu Du, Shouling Ji, Peng Tang, Shanqing Guo

It is well-known that recurrent neural networks (RNNs), although widely used, are vulnerable to adversarial attacks including one-frame attacks and multi-frame attacks. Though a few certified defenses exist to provide guaranteed robustness against one-frame attacks, we prove that defending against multi-frame attacks remains a challenging problem due to their enormous perturbation space. In this paper, we propose the first certified defense against multi-frame attacks for RNNs, called RNN-Guard. To address the above challenge, we adopt the perturb-all-frame strategy to construct perturbation spaces consistent with those in multi-frame attacks. However, the perturb-all-frame strategy causes a precision issue in linear relaxations. To address this issue, we introduce a novel abstract domain called InterZono and design tighter relaxations. We prove that InterZono is more precise than Zonotope yet carries the same time complexity. Experimental evaluations across various datasets and model structures show that the certified robust accuracy calculated by RNN-Guard with InterZono is up to 2.18 times higher than that with Zonotope. In addition, we extend RNN-Guard as the first certified training method against multi-frame attacks to directly enhance RNNs' robustness. The results show that the certified robust accuracy against multi-frame attacks of models trained with RNN-Guard is 15.47 to 67.65 percentage points higher than that of models trained with other methods.
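As a rough illustration of propagating certified bounds through a recurrent step, the snippet below does plain interval bound propagation through one tanh-RNN cell under a per-frame perturbation. This is deliberately simpler than the paper's InterZono domain (which is stated to be tighter than zonotope-style bounds) and is only meant to show the flavor of the computation.

```python
import numpy as np

def interval_rnn_step(x_lo, x_hi, h_lo, h_hi, W_x, W_h, b):
    """Toy interval bound propagation through one vanilla tanh-RNN step.
    NOT the paper's InterZono domain; a simplified stand-in for intuition only."""
    def affine_bounds(lo, hi, W):
        W_pos, W_neg = np.maximum(W, 0), np.minimum(W, 0)
        # Lower bound uses positive weights on lo, negative weights on hi (and vice versa).
        return lo @ W_pos.T + hi @ W_neg.T, hi @ W_pos.T + lo @ W_neg.T
    xl, xu = affine_bounds(x_lo, x_hi, W_x)
    hl, hu = affine_bounds(h_lo, h_hi, W_h)
    pre_lo, pre_hi = xl + hl + b, xu + hu + b
    return np.tanh(pre_lo), np.tanh(pre_hi)   # tanh is monotone, so bounds carry through

rng = np.random.default_rng(0)
W_x, W_h, b = rng.normal(size=(4, 3)), rng.normal(size=(4, 4)), rng.normal(size=4)
x = rng.normal(size=3)
h = np.zeros(4)
eps = 0.1  # every frame perturbed by up to eps (perturb-all-frame setting)
lo, hi = interval_rnn_step(x - eps, x + eps, h, h, W_x, W_h, b)
print(lo <= hi)  # element-wise True
```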

* 13 pages, 7 figures, 6 tables 

Fuzzy Attention Neural Network to Tackle Discontinuity in Airway Segmentation

Sep 09, 2022
Yang Nan, Javier Del Ser, Zeyu Tang, Peng Tang, Xiaodan Xing, Yingying Fang, Francisco Herrera, Witold Pedrycz, Simon Walsh, Guang Yang

Airway segmentation is crucial for the examination, diagnosis, and prognosis of lung diseases, while its manual delineation is unduly burdensome. To alleviate this time-consuming and potentially subjective manual procedure, researchers have proposed methods to automatically segment airways from computed tomography (CT) images. However, small airway branches (e.g., bronchi and terminal bronchioles) significantly aggravate the difficulty of automatic segmentation by machine learning models. In particular, the variance of voxel values and the severe data imbalance in airway branches make the computational module prone to discontinuous and false-negative predictions, especially for cohorts with different lung diseases. Attention mechanisms have shown the capacity to segment complex structures, while fuzzy logic can reduce the uncertainty in feature representations. Therefore, integrating deep attention networks with fuzzy theory, via a fuzzy attention layer, is a promising route to better generalization and robustness. This paper presents an efficient method for airway segmentation, comprising a novel fuzzy attention neural network and a comprehensive loss function to enhance the spatial continuity of the segmentation. The deep fuzzy set is formulated by a set of voxels in the feature map and a learnable Gaussian membership function. Unlike existing attention mechanisms, the proposed channel-specific fuzzy attention addresses the issue of heterogeneous features in different channels. Furthermore, a novel evaluation metric is proposed to assess both the continuity and completeness of airway structures. The efficiency, generalization and robustness of the proposed method are demonstrated by training on normal lung cases and testing on datasets of lung cancer, COVID-19 and pulmonary fibrosis.
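A minimal sketch of attention weights produced by a learnable Gaussian membership function, applied per channel to a voxel feature map; the parameterization below is my assumption and mirrors the abstract's description only at a high level.

```python
import torch
import torch.nn as nn

class GaussianFuzzyAttention(nn.Module):
    """Illustrative channel-specific attention driven by a learnable Gaussian membership
    function; not the paper's fuzzy attention layer, just a sketch of the idea."""
    def __init__(self, channels):
        super().__init__()
        self.mu = nn.Parameter(torch.zeros(1, channels, 1, 1, 1))         # per-channel mean
        self.log_sigma = nn.Parameter(torch.zeros(1, channels, 1, 1, 1))  # per-channel spread

    def forward(self, x):  # x: (batch, channels, D, H, W) voxel feature map
        sigma = self.log_sigma.exp()
        membership = torch.exp(-0.5 * ((x - self.mu) / sigma) ** 2)  # Gaussian membership in (0, 1]
        return x * membership   # reweight voxels by their membership degree

feat = torch.randn(1, 16, 8, 32, 32)
print(GaussianFuzzyAttention(16)(feat).shape)  # torch.Size([1, 16, 8, 32, 32])
```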

* 12 pages, 5 figures, Submitted to IEEE TNNLS 

Unsupervised Tissue Segmentation via Deep Constrained Gaussian Network

Aug 04, 2022
Yang Nan, Peng Tang, Guyue Zhang, Caihong Zeng, Zhihong Liu, Zhifan Gao, Heye Zhang, Guang Yang

Tissue segmentation is the mainstay of pathological examination, whereas manual delineation is unduly burdensome. To assist this time-consuming and subjective manual step, researchers have devised methods to automatically segment structures in pathological images. Recently, automated machine and deep learning based methods have come to dominate tissue segmentation research. However, most of these approaches are supervised and developed with large numbers of training samples, for which pixel-wise annotations are expensive and sometimes impossible to obtain. This paper introduces a novel unsupervised learning paradigm that integrates an end-to-end deep mixture model with a constrained indicator to acquire accurate semantic tissue segmentation. The constraint aims to centralise the components of the deep mixture model during the calculation of the optimisation function. In so doing, the redundant or empty class issues that are common in current unsupervised learning methods can be greatly reduced. Validated on both public and in-house datasets, the proposed deep constrained Gaussian network achieves significantly (Wilcoxon signed-rank test) better tissue segmentation performance (with average Dice scores of 0.737 and 0.735, respectively), with improved stability and robustness, compared to other existing unsupervised segmentation approaches. Furthermore, the proposed method performs similarly (p-value > 0.05) to the fully supervised U-Net.
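For intuition, the snippet below computes soft assignments of pixel embeddings to Gaussian mixture components and adds one possible penalty against empty components; both are generic illustrations under my own assumptions, not the paper's constrained indicator.

```python
import torch

def gmm_responsibilities(feats, means, log_vars, log_pis):
    """Soft assignment of pixel embeddings to K diagonal-covariance Gaussian components.
    Generic mixture-model E-step, shown only to illustrate the kind of computation involved."""
    # feats: (N, D), means: (K, D), log_vars: (K, D), log_pis: (K,)
    diff = feats.unsqueeze(1) - means.unsqueeze(0)                      # (N, K, D)
    log_prob = -0.5 * ((diff ** 2) / log_vars.exp() + log_vars).sum(-1)
    log_post = log_pis + log_prob                                       # unnormalized log posterior
    return torch.softmax(log_post, dim=1)                               # (N, K) responsibilities

def empty_class_penalty(resp):
    """One possible constraint (my assumption): penalize components whose average
    responsibility collapses toward zero, a common cause of redundant/empty classes."""
    usage = resp.mean(dim=0)                     # fraction of pixels claimed by each component
    return -(usage.clamp_min(1e-8).log()).mean()

feats = torch.randn(1024, 32)
resp = gmm_responsibilities(feats, torch.randn(4, 32), torch.zeros(4, 32), torch.zeros(4))
print(resp.shape, empty_class_penalty(resp).item() >= 0)  # torch.Size([1024, 4]) True
```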

* 13 pages, 8 figures, accepted by IEEE TMI 

Automatic Fine-grained Glomerular Lesion Recognition in Kidney Pathology

Mar 11, 2022
Yang Nan, Fengyi Li, Peng Tang, Guyue Zhang, Caihong Zeng, Guotong Xie, Zhihong Liu, Guang Yang

Recognition of glomerular lesions is the key to diagnosis and treatment planning in kidney pathology; however, coexisting glomerular structures such as mesangial regions exacerbate the difficulty of this task. In this paper, we introduce a scheme to recognize fine-grained glomerular lesions from whole slide images. First, a focal instance structural similarity loss is proposed to drive the model to locate all types of glomeruli precisely. Then an Uncertainty Aided Apportionment Network is designed to carry out the fine-grained visual classification without bounding-box annotations. This double-branch structure extracts common features of the child class from the parent class and produces an uncertainty factor for reconstituting the training dataset. Slide-wise evaluation illustrates the effectiveness of the entire scheme, with an 8-22% improvement in mean Average Precision compared with prominent detection methods. The comprehensive results clearly demonstrate the effectiveness of the proposed method.
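A rough sketch of a double-branch head with parent-class, child-class and uncertainty outputs, loosely following the description above; the shared trunk, layer sizes and the scalar uncertainty head are illustrative assumptions rather than the paper's network.

```python
import torch
import torch.nn as nn

class DoubleBranchClassifier(nn.Module):
    """Parent/child double-branch head with an uncertainty output; hypothetical sketch only,
    not the Uncertainty Aided Apportionment Network itself."""
    def __init__(self, feat_dim=256, num_parent=5, num_child=12):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(feat_dim, 128), nn.ReLU())
        self.parent_head = nn.Linear(128, num_parent)    # coarse glomerulus categories
        self.child_head = nn.Linear(128, num_child)      # fine-grained lesion classes
        self.uncertainty_head = nn.Linear(128, 1)        # scalar used to re-weight/re-sample data

    def forward(self, feats):
        h = self.trunk(feats)
        return self.parent_head(h), self.child_head(h), torch.sigmoid(self.uncertainty_head(h))

p, c, u = DoubleBranchClassifier()(torch.randn(8, 256))
print(p.shape, c.shape, u.shape)  # torch.Size([8, 5]) torch.Size([8, 12]) torch.Size([8, 1])
```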

* 33 pages, 6 figures, accepted by the Pattern Recognition journal 

Learning Inductive Attention Guidance for Partially Supervised Pancreatic Ductal Adenocarcinoma Prediction

May 31, 2021
Yan Wang, Peng Tang, Yuyin Zhou, Wei Shen, Elliot K. Fishman, Alan L. Yuille

Pancreatic ductal adenocarcinoma (PDAC) is the third most common cause of cancer death in the United States. Predicting tumors like PDACs (including both classification and segmentation) from medical images by deep learning is a growing trend, but a large amount of annotated data is usually required for training, which is very labor-intensive and time-consuming. In this paper, we consider a partially supervised setting, where cheap image-level annotations are provided for all the training data, and the costly per-voxel annotations are only available for a subset of them. We propose an Inductive Attention Guidance Network (IAG-Net) to jointly learn a global image-level classifier for normal/PDAC classification and a local voxel-level classifier for semi-supervised PDAC segmentation. We instantiate both the global and the local classifiers by multiple instance learning (MIL), where the attention guidance, indicating roughly where the PDAC regions are, is the key to bridging them: for global MIL-based normal/PDAC classification, attention serves as a weight for each instance (voxel) during MIL pooling, which eliminates distraction from the background; for local MIL-based semi-supervised PDAC segmentation, the attention guidance is inductive, which not only provides bag-level pseudo-labels to training data without per-voxel annotations for MIL training, but also acts as a proxy for an instance-level classifier. Experimental results show that IAG-Net boosts PDAC segmentation accuracy by more than 5% compared with state-of-the-art methods.
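The attention-weighted MIL pooling that the abstract builds on can be written compactly; the sketch below is the standard attention-MIL formulation (instance weights from a small network, weighted average as the bag feature) and is not necessarily IAG-Net's exact pooling.

```python
import torch
import torch.nn as nn

class AttentionMILPooling(nn.Module):
    """Attention-weighted multiple instance learning (MIL) pooling: each instance (voxel)
    gets an attention weight; the bag prediction uses the weighted average. Generic sketch."""
    def __init__(self, feat_dim=64):
        super().__init__()
        self.attn = nn.Linear(feat_dim, 1)
        self.cls = nn.Linear(feat_dim, 1)

    def forward(self, instances):                        # instances: (num_voxels, feat_dim)
        w = torch.softmax(self.attn(instances), dim=0)   # attention over instances in the bag
        bag_feat = (w * instances).sum(dim=0)            # weighted MIL pooling
        return torch.sigmoid(self.cls(bag_feat)), w      # bag score (normal vs. PDAC) + weights

score, weights = AttentionMILPooling()(torch.randn(500, 64))
print(score.shape, weights.sum().item())  # torch.Size([1]) ~1.0
```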

Shape-Texture Debiased Neural Network Training

Oct 12, 2020
Yingwei Li, Qihang Yu, Mingxing Tan, Jieru Mei, Peng Tang, Wei Shen, Alan Yuille, Cihang Xie

Shape and texture are two prominent and complementary cues for recognizing objects. Nonetheless, Convolutional Neural Networks are often biased towards either texture or shape, depending on the training dataset. Our ablation shows that such bias degrades model performance. Motivated by this observation, we develop a simple algorithm for shape-texture debiased learning. To prevent models from exclusively attending to a single cue in representation learning, we augment the training data with images carrying conflicting shape and texture information (e.g., an image with chimpanzee shape but lemon texture) and, most importantly, provide the corresponding supervision from shape and texture simultaneously. Experiments show that our method successfully improves model performance on several image recognition benchmarks as well as adversarial robustness. For example, when training on ImageNet, it helps ResNet-152 achieve substantial improvements on ImageNet (+1.2%), ImageNet-A (+5.2%), ImageNet-C (+8.3%) and Stylized-ImageNet (+11.1%), and in defending against an FGSM adversarial attacker on ImageNet (+14.4%). Our method is also compatible with other advanced data augmentation strategies, e.g., Mixup and CutMix. The code is available here: https://github.com/LiYingwei/ShapeTextureDebiasedTraining.
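The dual supervision on a cue-conflict image can be expressed as a two-term loss; the 50/50 weighting below is an illustrative default, not necessarily the paper's setting.

```python
import torch
import torch.nn.functional as F

def debiased_loss(logits, shape_labels, texture_labels, texture_weight=0.5):
    """Dual supervision for a shape-texture cue-conflict image (e.g., chimpanzee shape with
    lemon texture): the same prediction is penalized against both labels.
    The 50/50 weighting is an illustrative choice."""
    return ((1.0 - texture_weight) * F.cross_entropy(logits, shape_labels)
            + texture_weight * F.cross_entropy(logits, texture_labels))

logits = torch.randn(8, 1000)                      # ImageNet-style 1000-way classifier output
shape_y = torch.randint(0, 1000, (8,))             # label of the image providing the shape
texture_y = torch.randint(0, 1000, (8,))           # label of the image providing the texture
print(debiased_loss(logits, shape_y, texture_y).item())
```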

* The code is available here: https://github.com/LiYingwei/ShapeTextureDebiasedTraining 

Look Closer to Ground Better: Weakly-Supervised Temporal Grounding of Sentence in Video

Jan 25, 2020
Zhenfang Chen, Lin Ma, Wenhan Luo, Peng Tang, Kwan-Yee K. Wong

In this paper, we study the problem of weakly-supervised temporal grounding of sentence in video. Specifically, given an untrimmed video and a query sentence, our goal is to localize a temporal segment in the video that semantically corresponds to the query sentence, with no reliance on any temporal annotation during training. We propose a two-stage model to tackle this problem in a coarse-to-fine manner. In the coarse stage, we first generate a set of fixed-length temporal proposals using multi-scale sliding windows, and match their visual features against the sentence features to identify the best-matched proposal as a coarse grounding result. In the fine stage, we perform a fine-grained matching between the visual features of the frames in the best-matched proposal and the sentence features to locate the precise frame boundary of the fine grounding result. Comprehensive experiments on the ActivityNet Captions dataset and the Charades-STA dataset demonstrate that our two-stage model achieves compelling performance.
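The coarse stage can be illustrated with a few lines: generate multi-scale sliding-window proposals and score each against the sentence feature. Window sizes, stride and the cosine-similarity matcher below are illustrative placeholders, not the paper's settings.

```python
import numpy as np

def sliding_window_proposals(num_frames, window_sizes=(16, 32, 64), stride_ratio=0.5):
    """Multi-scale sliding-window proposal generation for the coarse stage; the window sizes
    and stride are illustrative defaults, not the paper's values."""
    proposals = []
    for w in window_sizes:
        stride = max(1, int(w * stride_ratio))
        for start in range(0, max(1, num_frames - w + 1), stride):
            proposals.append((start, min(start + w, num_frames)))
    return proposals

def best_proposal(frame_feats, sent_feat, proposals):
    """Pick the proposal whose mean-pooled visual feature best matches the sentence feature
    by cosine similarity (a simple stand-in for the matching module)."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))
    scores = [cos(frame_feats[s:e].mean(axis=0), sent_feat) for s, e in proposals]
    return proposals[int(np.argmax(scores))]

rng = np.random.default_rng(0)
frames = rng.normal(size=(200, 128))     # per-frame visual features of an untrimmed video
sentence = rng.normal(size=128)          # encoded query sentence
props = sliding_window_proposals(200)
print(len(props), best_proposal(frames, sentence, props))
```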
