Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Nick Barnes

Inferring the Class Conditional Response Map for Weakly Supervised Semantic Segmentation

Oct 27, 2021
Weixuan Sun, Jing Zhang, Nick Barnes

Figure 1 for Inferring the Class Conditional Response Map for Weakly Supervised Semantic Segmentation

Figure 2 for Inferring the Class Conditional Response Map for Weakly Supervised Semantic Segmentation

Figure 3 for Inferring the Class Conditional Response Map for Weakly Supervised Semantic Segmentation

Figure 4 for Inferring the Class Conditional Response Map for Weakly Supervised Semantic Segmentation

Image-level weakly supervised semantic segmentation (WSSS) relies on class activation maps (CAMs) for pseudo labels generation. As CAMs only highlight the most discriminative regions of objects, the generated pseudo labels are usually unsatisfactory to serve directly as supervision. To solve this, most existing approaches follow a multi-training pipeline to refine CAMs for better pseudo-labels, which includes: 1) re-training the classification model to generate CAMs; 2) post-processing CAMs to obtain pseudo labels; and 3) training a semantic segmentation model with the obtained pseudo labels. However, this multi-training pipeline requires complicated adjustment and additional time. To address this, we propose a class-conditional inference strategy and an activation aware mask refinement loss function to generate better pseudo labels without re-training the classifier. The class conditional inference-time approach is presented to separately and iteratively reveal the classification network's hidden object activation to generate more complete response maps. Further, our activation aware mask refinement loss function introduces a novel way to exploit saliency maps during segmentation training and refine the foreground object masks without suppressing background objects. Our method achieves superior WSSS results without requiring re-training of the classifier.

* WACV 2022

Via

Access Paper or Ask Questions

Dense Uncertainty Estimation

Oct 13, 2021
Jing Zhang, Yuchao Dai, Mochu Xiang, Deng-Ping Fan, Peyman Moghadam, Mingyi He, Christian Walder, Kaihao Zhang, Mehrtash Harandi, Nick Barnes

Figure 1 for Dense Uncertainty Estimation

Figure 2 for Dense Uncertainty Estimation

Figure 3 for Dense Uncertainty Estimation

Figure 4 for Dense Uncertainty Estimation

Deep neural networks can be roughly divided into deterministic neural networks and stochastic neural networks.The former is usually trained to achieve a mapping from input space to output space via maximum likelihood estimation for the weights, which leads to deterministic predictions during testing. In this way, a specific weights set is estimated while ignoring any uncertainty that may occur in the proper weight space. The latter introduces randomness into the framework, either by assuming a prior distribution over model parameters (i.e. Bayesian Neural Networks) or including latent variables (i.e. generative models) to explore the contribution of latent variables for model predictions, leading to stochastic predictions during testing. Different from the former that achieves point estimation, the latter aims to estimate the prediction distribution, making it possible to estimate uncertainty, representing model ignorance about its predictions. We claim that conventional deterministic neural network based dense prediction tasks are prone to overfitting, leading to over-confident predictions, which is undesirable for decision making. In this paper, we investigate stochastic neural networks and uncertainty estimation techniques to achieve both accurate deterministic prediction and reliable uncertainty estimation. Specifically, we work on two types of uncertainty estimations solutions, namely ensemble based methods and generative model based methods, and explain their pros and cons while using them in fully/semi/weakly-supervised framework. Due to the close connection between uncertainty estimation and model calibration, we also introduce how uncertainty estimation can be used for deep model calibration to achieve well-calibrated models, namely dense model calibration. Code and data are available at https://github.com/JingZhang617/UncertaintyEstimation.

* Technical Report

Via

Access Paper or Ask Questions

RGB-D Saliency Detection via Cascaded Mutual Information Minimization

Sep 15, 2021
Jing Zhang, Deng-Ping Fan, Yuchao Dai, Xin Yu, Yiran Zhong, Nick Barnes, Ling Shao

Figure 1 for RGB-D Saliency Detection via Cascaded Mutual Information Minimization

Figure 2 for RGB-D Saliency Detection via Cascaded Mutual Information Minimization

Figure 3 for RGB-D Saliency Detection via Cascaded Mutual Information Minimization

Figure 4 for RGB-D Saliency Detection via Cascaded Mutual Information Minimization

Existing RGB-D saliency detection models do not explicitly encourage RGB and depth to achieve effective multi-modal learning. In this paper, we introduce a novel multi-stage cascaded learning framework via mutual information minimization to "explicitly" model the multi-modal information between RGB image and depth data. Specifically, we first map the feature of each mode to a lower dimensional feature vector, and adopt mutual information minimization as a regularizer to reduce the redundancy between appearance features from RGB and geometric features from depth. We then perform multi-stage cascaded learning to impose the mutual information minimization constraint at every stage of the network. Extensive experiments on benchmark RGB-D saliency datasets illustrate the effectiveness of our framework. Further, to prosper the development of this field, we contribute the largest (7x larger than NJU2K) dataset, which contains 15,625 image pairs with high quality polygon-/scribble-/object-/instance-/rank-level annotations. Based on these rich labels, we additionally construct four new benchmarks with strong baselines and observe some interesting phenomena, which can motivate future model design. Source code and dataset are available at "https://github.com/JingZhang617/cascaded_rgbd_sod".

* Accepted as ICCV2021 paper

Via

Access Paper or Ask Questions

PnP-3D: A Plug-and-Play for 3D Point Clouds

Aug 16, 2021
Shi Qiu, Saeed Anwar, Nick Barnes

Figure 1 for PnP-3D: A Plug-and-Play for 3D Point Clouds

Figure 2 for PnP-3D: A Plug-and-Play for 3D Point Clouds

Figure 3 for PnP-3D: A Plug-and-Play for 3D Point Clouds

Figure 4 for PnP-3D: A Plug-and-Play for 3D Point Clouds

With the help of the deep learning paradigm, many point cloud networks have been invented for visual analysis. However, there is great potential for development of these networks since the given information of point cloud data has not been fully exploited. To improve the effectiveness of existing networks in analyzing point cloud data, we propose a plug-and-play module, PnP-3D, aiming to refine the fundamental point cloud feature representations by involving more local context and global bilinear response from explicit 3D space and implicit feature space. To thoroughly evaluate our approach, we conduct experiments on three standard point cloud analysis tasks, including classification, semantic segmentation, and object detection, where we select three state-of-the-art networks from each task for evaluation. Serving as a plug-and-play module, PnP-3D can significantly boost the performances of established networks. In addition to achieving state-of-the-art results on four widely used point cloud benchmarks, we present comprehensive ablation studies and visualizations to demonstrate our approach's advantages. The code will be available at https://github.com/ShiQiu0419/pnp-3d.

Via

Access Paper or Ask Questions

Energy-Based Generative Cooperative Saliency Prediction

Jun 25, 2021
Jing Zhang, Jianwen Xie, Zilong Zheng, Nick Barnes

Figure 1 for Energy-Based Generative Cooperative Saliency Prediction

Figure 2 for Energy-Based Generative Cooperative Saliency Prediction

Figure 3 for Energy-Based Generative Cooperative Saliency Prediction

Figure 4 for Energy-Based Generative Cooperative Saliency Prediction

Conventional saliency prediction models typically learn a deterministic mapping from images to the corresponding ground truth saliency maps. In this paper, we study the saliency prediction problem from the perspective of generative models by learning a conditional probability distribution over saliency maps given an image, and treating the prediction as a sampling process. Specifically, we propose a generative cooperative saliency prediction framework based on the generative cooperative networks, where a conditional latent variable model and a conditional energy-based model are jointly trained to predict saliency in a cooperative manner. We call our model the SalCoopNets. The latent variable model serves as a fast but coarse predictor to efficiently produce an initial prediction, which is then refined by the iterative Langevin revision of the energy-based model that serves as a fine predictor. Such a coarse-to-fine cooperative saliency prediction strategy offers the best of both worlds. Moreover, we generalize our framework to the scenario of weakly supervised saliency prediction, where saliency annotation of training images is partially observed, by proposing a cooperative learning while recovering strategy. Lastly, we show that the learned energy function can serve as a refinement module that can refine the results of other pre-trained saliency prediction models. Experimental results show that our generative model can achieve state-of-the-art performance. Our code is publicly available at: \url{https://github.com/JingZhang617/SalCoopNets}.

Via

Access Paper or Ask Questions

Confidence-Aware Learning for Camouflaged Object Detection

Jun 22, 2021
Jiawei Liu, Jing Zhang, Nick Barnes

Figure 1 for Confidence-Aware Learning for Camouflaged Object Detection

Figure 2 for Confidence-Aware Learning for Camouflaged Object Detection

Figure 3 for Confidence-Aware Learning for Camouflaged Object Detection

Figure 4 for Confidence-Aware Learning for Camouflaged Object Detection

Confidence-aware learning is proven as an effective solution to prevent networks becoming overconfident. We present a confidence-aware camouflaged object detection framework using dynamic supervision to produce both accurate camouflage map and meaningful "confidence" representing model awareness about the current prediction. A camouflaged object detection network is designed to produce our camouflage prediction. Then, we concatenate it with the input image and feed it to the confidence estimation network to produce an one channel confidence map.We generate dynamic supervision for the confidence estimation network, representing the agreement of camouflage prediction with the ground truth camouflage map. With the produced confidence map, we introduce confidence-aware learning with the confidence map as guidance to pay more attention to the hard/low-confidence pixels in the loss function. We claim that, once trained, our confidence estimation network can evaluate pixel-wise accuracy of the prediction without relying on the ground truth camouflage map. Extensive results on four camouflaged object detection testing datasets illustrate the superior performance of the proposed model in explaining the camouflage prediction.

Via

Access Paper or Ask Questions

Transformer Transforms Salient Object Detection and Camouflaged Object Detection

Apr 20, 2021
Yuxin Mao, Jing Zhang, Zhexiong Wan, Yuchao Dai, Aixuan Li, Yunqiu Lv, Xinyu Tian, Deng-Ping Fan, Nick Barnes

Figure 1 for Transformer Transforms Salient Object Detection and Camouflaged Object Detection

Figure 2 for Transformer Transforms Salient Object Detection and Camouflaged Object Detection

Figure 3 for Transformer Transforms Salient Object Detection and Camouflaged Object Detection

Figure 4 for Transformer Transforms Salient Object Detection and Camouflaged Object Detection

The transformer networks, which originate from machine translation, are particularly good at modeling long-range dependencies within a long sequence. Currently, the transformer networks are making revolutionary progress in various vision tasks ranging from high-level classification tasks to low-level dense prediction tasks. In this paper, we conduct research on applying the transformer networks for salient object detection (SOD). Specifically, we adopt the dense transformer backbone for fully supervised RGB image based SOD, RGB-D image pair based SOD, and weakly supervised SOD via scribble supervision. As an extension, we also apply our fully supervised model to the task of camouflaged object detection (COD) for camouflaged object segmentation. For the fully supervised models, we define the dense transformer backbone as feature encoder, and design a very simple decoder to produce a one channel saliency map (or camouflage map for the COD task). For the weakly supervised model, as there exists no structure information in the scribble annotation, we first adopt the recent proposed Gated-CRF loss to effectively model the pair-wise relationships for accurate model prediction. Then, we introduce self-supervised learning strategy to push the model to produce scale-invariant predictions, which is proven effective for weakly supervised models and models trained on small training datasets. Extensive experimental results on various SOD and COD tasks (fully supervised RGB image based SOD, fully supervised RGB-D image pair based SOD, weakly supervised SOD via scribble supervision, and fully supervised RGB image based COD) illustrate that transformer networks can transform salient object detection and camouflaged object detection, leading to new benchmarks for each related task.

* Technical report, 15 pages, 18 figures

Via

Access Paper or Ask Questions

Learning structure-aware semantic segmentation with image-level supervision

Apr 15, 2021
Jiawei Liu, Jing Zhang, Yicong Hong, Nick Barnes

Figure 1 for Learning structure-aware semantic segmentation with image-level supervision

Figure 2 for Learning structure-aware semantic segmentation with image-level supervision

Figure 3 for Learning structure-aware semantic segmentation with image-level supervision

Figure 4 for Learning structure-aware semantic segmentation with image-level supervision

Compared with expensive pixel-wise annotations, image-level labels make it possible to learn semantic segmentation in a weakly-supervised manner. Within this pipeline, the class activation map (CAM) is obtained and further processed to serve as a pseudo label to train the semantic segmentation model in a fully-supervised manner. In this paper, we argue that the lost structure information in CAM limits its application in downstream semantic segmentation, leading to deteriorated predictions. Furthermore, the inconsistent class activation scores inside the same object contradicts the common sense that each region of the same object should belong to the same semantic category. To produce sharp prediction with structure information, we introduce an auxiliary semantic boundary detection module, which penalizes the deteriorated predictions. Furthermore, we adopt smoothness loss to encourage prediction inside the object to be consistent. Experimental results on the PASCAL-VOC dataset illustrate the effectiveness of the proposed solution.

* IJCNN2021

Via

Access Paper or Ask Questions

Weakly Supervised Video Salient Object Detection

Apr 06, 2021
Wangbo Zhao, Jing Zhang, Long Li, Nick Barnes, Nian Liu, Junwei Han

Figure 1 for Weakly Supervised Video Salient Object Detection

Figure 2 for Weakly Supervised Video Salient Object Detection

Figure 3 for Weakly Supervised Video Salient Object Detection

Figure 4 for Weakly Supervised Video Salient Object Detection

Significant performance improvement has been achieved for fully-supervised video salient object detection with the pixel-wise labeled training datasets, which are time-consuming and expensive to obtain. To relieve the burden of data annotation, we present the first weakly supervised video salient object detection model based on relabeled "fixation guided scribble annotations". Specifically, an "Appearance-motion fusion module" and bidirectional ConvLSTM based framework are proposed to achieve effective multi-modal learning and long-term temporal context modeling based on our new weak annotations. Further, we design a novel foreground-background similarity loss to further explore the labeling similarity across frames. A weak annotation boosting strategy is also introduced to boost our model performance with a new pseudo-label generation technique. Extensive experimental results on six benchmark video saliency detection datasets illustrate the effectiveness of our solution.

* 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Via

Access Paper or Ask Questions

Semantic Segmentation for Real Point Cloud Scenes via Bilateral Augmentation and Adaptive Fusion

Mar 12, 2021
Shi Qiu, Saeed Anwar, Nick Barnes

Figure 1 for Semantic Segmentation for Real Point Cloud Scenes via Bilateral Augmentation and Adaptive Fusion

Figure 2 for Semantic Segmentation for Real Point Cloud Scenes via Bilateral Augmentation and Adaptive Fusion

Figure 3 for Semantic Segmentation for Real Point Cloud Scenes via Bilateral Augmentation and Adaptive Fusion

Figure 4 for Semantic Segmentation for Real Point Cloud Scenes via Bilateral Augmentation and Adaptive Fusion

Given the prominence of current 3D sensors, a fine-grained analysis on the basic point cloud data is worthy of further investigation. Particularly, real point cloud scenes can intuitively capture complex surroundings in the real world, but due to 3D data's raw nature, it is very challenging for machine perception. In this work, we concentrate on the essential visual task, semantic segmentation, for large-scale point cloud data collected in reality. On the one hand, to reduce the ambiguity in nearby points, we augment their local context by fully utilizing both geometric and semantic features in a bilateral structure. On the other hand, we comprehensively interpret the distinctness of the points from multiple resolutions and represent the feature map following an adaptive fusion method at point-level for accurate semantic segmentation. Further, we provide specific ablation studies and intuitive visualizations to validate our key modules. By comparing with state-of-the-art networks on three different benchmarks, we demonstrate the effectiveness of our network.

* Accepted in CVPR2021

Via

Access Paper or Ask Questions