Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Deng-Ping Fan

Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions

Feb 24, 2021

Wenhai Wang, Enze Xie, Xiang Li, Deng-Ping Fan, Kaitao Song, Ding Liang, Tong Lu, Ping Luo, Ling Shao

Figure 1 for Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions

Figure 2 for Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions

Figure 3 for Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions

Figure 4 for Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions

Abstract:Although using convolutional neural networks (CNNs) as backbones achieves great successes in computer vision, this work investigates a simple backbone network useful for many dense prediction tasks without convolutions. Unlike the recently-proposed Transformer model (e.g., ViT) that is specially designed for image classification, we propose Pyramid Vision Transformer~(PVT), which overcomes the difficulties of porting Transformer to various dense prediction tasks. PVT has several merits compared to prior arts. (1) Different from ViT that typically has low-resolution outputs and high computational and memory cost, PVT can be not only trained on dense partitions of the image to achieve high output resolution, which is important for dense predictions but also using a progressive shrinking pyramid to reduce computations of large feature maps. (2) PVT inherits the advantages from both CNN and Transformer, making it a unified backbone in various vision tasks without convolutions by simply replacing CNN backbones. (3) We validate PVT by conducting extensive experiments, showing that it boosts the performance of many downstream tasks, e.g., object detection, semantic, and instance segmentation. For example, with a comparable number of parameters, RetinaNet+PVT achieves 40.4 AP on the COCO dataset, surpassing RetinNet+ResNet50 (36.3 AP) by 4.1 absolute AP. We hope PVT could serve as an alternative and useful backbone for pixel-level predictions and facilitate future researches. Code is available at https://github.com/whai362/PVT.

* 13 pages, 6 figures, technical report

Via

Access Paper or Ask Questions

Salient Object Detection via Integrity Learning

Feb 21, 2021

Mingchen Zhuge, Deng-Ping Fan, Nian Liu, Dingwen Zhang, Dong Xu, Ling Shao

Figure 1 for Salient Object Detection via Integrity Learning

Figure 2 for Salient Object Detection via Integrity Learning

Figure 3 for Salient Object Detection via Integrity Learning

Figure 4 for Salient Object Detection via Integrity Learning

Abstract:Albeit current salient object detection (SOD) works have achieved fantastic progress, they are cast into the shade when it comes to the integrity of the predicted salient regions. We define the concept of integrity at both the micro and macro level. Specifically, at the micro level, the model should highlight all parts that belong to a certain salient object, while at the macro level, the model needs to discover all salient objects from the given image scene. To facilitate integrity learning for salient object detection, we design a novel Integrity Cognition Network (ICON), which explores three important components to learn strong integrity features. 1) Unlike the existing models that focus more on feature discriminability, we introduce a diverse feature aggregation (DFA) component to aggregate features with various receptive fields (i.e.,, kernel shape and context) and increase the feature diversity. Such diversity is the foundation for mining the integral salient objects. 2) Based on the DFA features, we introduce the integrity channel enhancement (ICE) component with the goal of enhancing feature channels that highlight the integral salient objects at the macro level, while suppressing the other distracting ones. 3) After extracting the enhanced features, the part-whole verification (PWV) method is employed to determine whether the part and whole object features have strong agreement. Such part-whole agreements can further improve the micro-level integrity for each salient object. To demonstrate the effectiveness of ICON, comprehensive experiments are conducted on seven challenging benchmarks, where promising results are achieved.

Via

Access Paper or Ask Questions

Concealed Object Detection

Feb 20, 2021

Deng-Ping Fan, Ge-Peng Ji, Ming-Ming Cheng, Ling Shao

Abstract:We present the first systematic study on concealed object detection (COD), which aims to identify objects that are "perfectly" embedded in their background. The high intrinsic similarities between the concealed objects and their background make COD far more challenging than traditional object detection/segmentation. To better understand this task, we collect a large-scale dataset, called COD10K, which consists of 10,000 images covering concealed objects in diverse real-world scenarios from 78 object categories. Further, we provide rich annotations including object categories, object boundaries, challenging attributes, object-level labels, and instance-level annotations. Our COD10K is the largest COD dataset to date, with the richest annotations, which enables comprehensive concealed object understanding and can even be used to help progress several other vision tasks, such as detection, segmentation, classification, etc. Motivated by how animals hunt in the wild, we also design a simple but strong baseline for COD, termed the Search Identification Network (SINet). Without any bells and whistles, SINet outperforms 12 cutting-edge baselines on all datasets tested, making them robust, general architectures that could serve as catalysts for future research in COD. Finally, we provide some interesting findings and highlight several potential applications and future directions. To spark research in this new field, our code, dataset, and online demo are available on our project page: http://mmcheng.net/cod.

* 17 pages, 27 figures, Code: https://github.com/GewelsJI/SINet-V2

Via

Access Paper or Ask Questions

Boundary-Aware Segmentation Network for Mobile and Web Applications

Jan 12, 2021

Xuebin Qin, Deng-Ping Fan, Chenyang Huang, Cyril Diagne, Zichen Zhang, Adrià Cabeza Sant'Anna, Albert Suàrez, Martin Jagersand, Ling Shao

Figure 1 for Boundary-Aware Segmentation Network for Mobile and Web Applications

Figure 2 for Boundary-Aware Segmentation Network for Mobile and Web Applications

Figure 3 for Boundary-Aware Segmentation Network for Mobile and Web Applications

Figure 4 for Boundary-Aware Segmentation Network for Mobile and Web Applications

Abstract:Although deep models have greatly improved the accuracy and robustness of image segmentation, obtaining segmentation results with highly accurate boundaries and fine structures is still a challenging problem. In this paper, we propose a simple yet powerful Boundary-Aware Segmentation Network (BASNet), which comprises a predict-refine architecture and a hybrid loss, for highly accurate image segmentation. The predict-refine architecture consists of a densely supervised encoder-decoder network and a residual refinement module, which are respectively used to predict and refine a segmentation probability map. The hybrid loss is a combination of the binary cross entropy, structural similarity and intersection-over-union losses, which guide the network to learn three-level (ie, pixel-, patch- and map- level) hierarchy representations. We evaluate our BASNet on two reverse tasks including salient object segmentation, camouflaged object segmentation, showing that it achieves very competitive performance with sharp segmentation boundaries. Importantly, BASNet runs at over 70 fps on a single GPU which benefits many potential real applications. Based on BASNet, we further developed two (close to) commercial applications: AR COPY & PASTE, in which BASNet is integrated with augmented reality for "COPYING" and "PASTING" real-world objects, and OBJECT CUT, which is a web-based tool for automatic object background removal. Both applications have already drawn huge amount of attention and have important real-world impacts. The code and two applications will be publicly available at: https://github.com/NathanUA/BASNet.

* 18 pages, 16 figures, submitted to TPAMI

Via

Access Paper or Ask Questions

Light Field Salient Object Detection: A Review and Benchmark

Oct 18, 2020

Yao Jiang, Tao Zhou, Ge-Peng Ji, Keren Fu, Qijun Zhao, Deng-Ping Fan

Figure 1 for Light Field Salient Object Detection: A Review and Benchmark

Figure 2 for Light Field Salient Object Detection: A Review and Benchmark

Figure 3 for Light Field Salient Object Detection: A Review and Benchmark

Figure 4 for Light Field Salient Object Detection: A Review and Benchmark

Abstract:Salient object detection (SOD) is a long-standing research topic in computer vision and has drawn an increasing amount of research interest in the past decade. This paper provides the first comprehensive review and benchmark for SOD on light field, which has long been lacking in the saliency community. Firstly, we introduce preliminary knowledge on lights, including theory and data forms, and then review existing studies on light field SOD, covering ten traditional models, seven deep learning-based models, one comparative study, and one brief review. Existing datasets for light field SOD are also summarized with detailed information and statistical analyses. Secondly, we benchmark seven representative light field SOD models together with several cutting-edge RGB-D SOD models on four widely used light field datasets, from which insightful discussions and analyses, including a comparison between light field SOD and RGB-D SOD models, are achieved. Besides, due to the inconsistency of datasets in their current forms, we further generate complete data and supplement focal stacks, depth maps and multi-view images for the inconsistent datasets, making them consistent and unified. Our supplemental data makes a universal benchmark possible. Lastly, because light field SOD is quite a special problem attributed to its diverse data representations and high dependency on acquisition hardware, making it differ greatly from other saliency detection tasks, we provide nine hints into the challenges and future directions, and outline several open issues. We hope our review and benchmarking could serve as a catalyst to advance research in this field. All the materials including collected models, datasets, benchmarking results, and supplemented light field datasets will be publicly available on our project site https://github.com/kerenfu/LFSOD-Survey.

Via

Access Paper or Ask Questions

Uncertainty Inspired RGB-D Saliency Detection

Sep 07, 2020

Jing Zhang, Deng-Ping Fan, Yuchao Dai, Saeed Anwar, Fatemeh Saleh, Sadegh Aliakbarian, Nick Barnes

Figure 1 for Uncertainty Inspired RGB-D Saliency Detection

Figure 2 for Uncertainty Inspired RGB-D Saliency Detection

Figure 3 for Uncertainty Inspired RGB-D Saliency Detection

Figure 4 for Uncertainty Inspired RGB-D Saliency Detection

Abstract:We propose the first stochastic framework to employ uncertainty for RGB-D saliency detection by learning from the data labeling process. Existing RGB-D saliency detection models treat this task as a point estimation problem by predicting a single saliency map following a deterministic learning pipeline. We argue that, however, the deterministic solution is relatively ill-posed. Inspired by the saliency data labeling process, we propose a generative architecture to achieve probabilistic RGB-D saliency detection which utilizes a latent variable to model the labeling variations. Our framework includes two main models: 1) a generator model, which maps the input image and latent variable to stochastic saliency prediction, and 2) an inference model, which gradually updates the latent variable by sampling it from the true or approximate posterior distribution. The generator model is an encoder-decoder saliency network. To infer the latent variable, we introduce two different solutions: i) a Conditional Variational Auto-encoder with an extra encoder to approximate the posterior distribution of the latent variable; and ii) an Alternating Back-Propagation technique, which directly samples the latent variable from the true posterior distribution. Qualitative and quantitative results on six challenging RGB-D benchmark datasets show our approach's superior performance in learning the distribution of saliency maps. The source code is publicly available via our project page: https://github.com/JingZhang617/UCNet.

Via

Access Paper or Ask Questions

Siamese Network for RGB-D Salient Object Detection and Beyond

Aug 26, 2020

Keren Fu, Deng-Ping Fan, Ge-Peng Ji, Qijun Zhao, Jianbing Shen, Ce Zhu

Figure 1 for Siamese Network for RGB-D Salient Object Detection and Beyond

Figure 2 for Siamese Network for RGB-D Salient Object Detection and Beyond

Figure 3 for Siamese Network for RGB-D Salient Object Detection and Beyond

Figure 4 for Siamese Network for RGB-D Salient Object Detection and Beyond

Abstract:Existing RGB-D salient object detection (SOD) models usually treat RGB and depth as independent information and design separate networks for feature extraction from each. Such schemes can easily be constrained by a limited amount of training data or over-reliance on an elaborately designed training process. Inspired by the observation that RGB and depth modalities actually present certain commonality in distinguishing salient objects, a novel joint learning and densely cooperative fusion (JL-DCF) architecture is designed to learn from both RGB and depth inputs through a shared network backbone, known as the Siamese architecture. In this paper, we propose two effective components: joint learning (JL), and densely cooperative fusion (DCF). The JL module provides robust saliency feature learning by exploiting cross-modal commonality via a Siamese network, while the DCF module is introduced for complementary feature discovery. Comprehensive experiments using five popular metrics show that the designed framework yields a robust RGB-D saliency detector with good generalization. As a result, JL-DCF significantly advances the state-of-the-art models by an average of ~2.0% (F-measure) across seven challenging datasets. In addition, we show that JL-DCF is readily applicable to other related multi-modal detection tasks, including RGB-T (thermal infrared) SOD and video SOD (VSOD), achieving comparable or even better performance against state-of-the-art methods. This further confirms that the proposed framework could offer a potential solution for various applications and provide more insight into the cross-modal complementarity task. The code will be available at https://github.com/kerenfu/JLDCF/

* arXiv admin note: substantial text overlap with arXiv:2004.08515

Via

Access Paper or Ask Questions

RGB-D Salient Object Detection: A Survey

Aug 01, 2020

Tao Zhou, Deng-Ping Fan, Ming-Ming Cheng, Jianbing Shen, Ling Shao

Figure 1 for RGB-D Salient Object Detection: A Survey

Figure 2 for RGB-D Salient Object Detection: A Survey

Figure 3 for RGB-D Salient Object Detection: A Survey

Figure 4 for RGB-D Salient Object Detection: A Survey

Abstract:Salient object detection (SOD), which simulates the human visual perception system to locate the most attractive object(s) in a scene, has been widely applied to various computer vision tasks. Now, with the advent of depth sensors, depth maps with affluent spatial information that can be beneficial in boosting the performance of SOD, can easily be captured. Although various RGB-D based SOD models with promising performance have been proposed over the past several years, an in-depth understanding of these models and challenges in this topic remains lacking. In this paper, we provide a comprehensive survey of RGB-D based SOD models from various perspectives, and review related benchmark datasets in detail. Further, considering that the light field can also provide depth maps, we review SOD models and popular benchmark datasets from this domain as well. Moreover, to investigate the SOD ability of existing models, we carry out a comprehensive evaluation, as well as attribute-based evaluation of several representative RGB-D based SOD models. Finally, we discuss several challenges and open directions of RGB-D based SOD for future research. All collected models, benchmark datasets, source code links, datasets constructed for attribute-based evaluation, and codes for evaluation will be made publicly available at https://github.com/taozh2017/RGBDSODsurvey

* 24 pages, 12 figures. Code: https://github.com/taozh2017/RGBD-SODsurvey

Via

Access Paper or Ask Questions

Re-thinking Co-Salient Object Detection

Jul 11, 2020

Deng-Ping Fan, Tengpeng Li, Zheng Lin, Ge-Peng Ji, Dingwen Zhang, Ming-Ming Cheng, Huazhu Fu, Jianbing Shen

Figure 1 for Re-thinking Co-Salient Object Detection

Figure 2 for Re-thinking Co-Salient Object Detection

Figure 3 for Re-thinking Co-Salient Object Detection

Figure 4 for Re-thinking Co-Salient Object Detection

Abstract:In this paper, we conduct a comprehensive study on the co-salient object detection (CoSOD) problem for images. CoSOD is an emerging and rapidly growing extension of salient object detection (SOD), which aims to detect the co-occurring salient objects in a group of images. However, existing CoSOD datasets often have a serious data bias, assuming that each group of images contains salient objects of similar visual appearances. This bias can lead to the ideal settings and effectiveness of models trained on existing datasets, being impaired in real-life situations, where similarities are usually semantic or conceptual. To tackle this issue, we first introduce a new benchmark, called CoSOD3k in the wild, which requires a large amount of semantic context, making it more challenging than existing CoSOD datasets. Our CoSOD3k consists of 3,316 high-quality, elaborately selected images divided into 160 groups with hierarchical annotations. The images span a wide range of categories, shapes, object sizes, and backgrounds. Second, we integrate the existing SOD techniques to build a unified, trainable CoSOD framework, which is long overdue in this field. Specifically, we propose a novel CoEG-Net that augments our prior model EGNet with a co-attention projection strategy to enable fast common information learning. CoEG-Net fully leverages previous large-scale SOD datasets and significantly improves the model scalability and stability. Third, we comprehensively summarize 34 cutting-edge algorithms, benchmarking 16 of them over three challenging CoSOD datasets (iCoSeg, CoSal2015, and our CoSOD3k), and reporting more detailed (i.e., group-level) performance analysis. Finally, we discuss the challenges and future works of CoSOD. We hope that our study will give a strong boost to growth in the CoSOD community

* 22pages, 18 figures. CVPR2020-CoSOD3K extension. Code: https://dpfan.net/CoSOD3K

Via

Access Paper or Ask Questions

PraNet: Parallel Reverse Attention Network for Polyp Segmentation

Jul 03, 2020

Deng-Ping Fan, Ge-Peng Ji, Tao Zhou, Geng Chen, Huazhu Fu, Jianbing Shen, Ling Shao

Figure 1 for PraNet: Parallel Reverse Attention Network for Polyp Segmentation

Figure 2 for PraNet: Parallel Reverse Attention Network for Polyp Segmentation

Figure 3 for PraNet: Parallel Reverse Attention Network for Polyp Segmentation

Figure 4 for PraNet: Parallel Reverse Attention Network for Polyp Segmentation

Abstract:Colonoscopy is an effective technique for detecting colorectal polyps, which are highly related to colorectal cancer. In clinical practice, segmenting polyps from colonoscopy images is of great importance since it provides valuable information for diagnosis and surgery. However, accurate polyp segmentation is a challenging task, for two major reasons: (i) the same type of polyps has a diversity of size, color and texture; and (ii) the boundary between a polyp and its surrounding mucosa is not sharp. To address these challenges, we propose a parallel reverse attention network (PraNet) for accurate polyp segmentation in colonoscopy images. Specifically, we first aggregate the features in high-level layers using a parallel partial decoder (PPD). Based on the combined feature, we then generate a global map as the initial guidance area for the following components. In addition, we mine the boundary cues using a reverse attention (RA) module, which is able to establish the relationship between areas and boundary cues. Thanks to the recurrent cooperation mechanism between areas and boundaries, our PraNet is capable of calibrating any misaligned predictions, improving the segmentation accuracy. Quantitative and qualitative evaluations on five challenging datasets across six metrics show that our PraNet improves the segmentation accuracy significantly, and presents a number of advantages in terms of generalizability, and real-time segmentation efficiency.

* Accepted to MICCAI 2020

Via

Access Paper or Ask Questions