Nan Zhao

Abs-CAM: A Gradient Optimization Interpretable Approach for Explanation of Convolutional Neural Networks

Jul 08, 2022

Chunyan Zeng, Kang Yan, Zhifeng Wang, Yan Yu, Shiyan Xia, Nan Zhao

Abstract: The black-box nature of Deep Neural Networks (DNNs) severely hinders their performance improvement and application in specific scenes. In recent years, class activation mapping-based methods have been widely used to interpret the internal decisions of models in computer vision tasks. However, when these methods use backpropagation to obtain gradients, they introduce noise into the saliency map and can even locate features that are irrelevant to decisions. In this paper, we propose an Absolute value Class Activation Mapping-based (Abs-CAM) method, which optimizes the gradients derived from backpropagation and turns all of them into positive gradients to enhance the visual features of the output neurons' activation and improve the localization ability of the saliency map. The framework of Abs-CAM is divided into two phases: generating the initial saliency map and generating the final saliency map. The first phase improves the localization ability of the saliency map by optimizing the gradient, and the second phase linearly combines the initial saliency map with the original image to enhance the semantic information of the saliency map. We conduct qualitative and quantitative evaluations of the proposed method, including Deletion, Insertion, and Pointing Game. The experimental results show that Abs-CAM can clearly eliminate the noise in the saliency map, can better locate the features related to decisions, and is superior to previous methods in recognition and localization tasks.

* Abs-CAM for Explanation of Convolutional Neural Networks
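
The first phase described in the abstract can be illustrated with a minimal sketch. It assumes that each channel weight is the mean absolute gradient over spatial positions and that, as in Grad-CAM, a ReLU follows the weighted sum; the abstract confirms neither detail, and the function name `abs_cam_initial_map` is hypothetical. The second phase (combining the map with the original image) is not covered.

```python
from typing import List

Map2D = List[List[float]]


def abs_cam_initial_map(activations: List[Map2D], gradients: List[Map2D]) -> Map2D:
    """Sketch of an initial saliency map built from absolute gradients.

    activations[k] and gradients[k] are H x W maps for channel k.
    Assumption: weight_k = mean(|gradient_k|); not taken from the paper.
    """
    height, width = len(activations[0]), len(activations[0][0])
    # One weight per channel: average of |gradient| over all spatial positions.
    weights = [
        sum(abs(g) for row in grad for g in row) / (height * width)
        for grad in gradients
    ]
    # Weighted sum of the activation maps, followed by ReLU (assumed, as in Grad-CAM).
    saliency = [
        [
            max(0.0, sum(w * act[i][j] for w, act in zip(weights, activations)))
            for j in range(width)
        ]
        for i in range(height)
    ]
    # Min-max normalise to [0, 1] so the map can be overlaid on an image.
    flat = [v for row in saliency for v in row]
    lo, hi = min(flat), max(flat)
    scale = (hi - lo) or 1.0  # avoid division by zero for a constant map
    return [[(v - lo) / scale for v in row] for row in saliency]
```

With a channel whose gradients are all negative, a signed average would give a negative weight and the ReLU would zero the map; taking absolute values keeps that channel's activations visible, which is the effect the abstract attributes to turning all gradients positive.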

Outage Performance of Uplink Rate Splitting Multiple Access with Randomly Deployed Users

May 03, 2022

Huabing Lu, Xianzhong Xie, Zhaoyuan Shi, Hongjian Lei, Nan Zhao, Jun Cai

Abstract: Rate splitting multiple access (RSMA) is a promising solution to improve spectral efficiency and provide better fairness for the upcoming sixth-generation (6G) networks. In this paper, the outage performance of uplink RSMA transmission with randomly deployed users is investigated, taking both user scheduling schemes and power allocation strategies into consideration. Specifically, the greedy user scheduling (GUS) and cumulative distribution function (CDF) based user scheduling (CUS) schemes are considered, which can maximize the rate performance and guarantee access fairness, respectively. Meanwhile, we re-investigate the cognitive power allocation (CPA) strategy and propose a new rate-fairness oriented power allocation (FPA) strategy to enhance the scheduled users' rate fairness. By employing order statistics and stochastic geometry, an analytical expression of the outage probability is derived for each combination of scheduling scheme and power allocation strategy to characterize the performance. To gain more insight, the achieved diversity order of each scheme is also derived. Theoretical results demonstrate that both the GUS and CUS schemes, applying either the CPA or the FPA strategy, can achieve full diversity orders, and that applying the CPA strategy in RSMA can effectively eliminate the constraint that the primary user imposes on the secondary user's diversity order. Simulation results corroborate the accuracy of the analytical expressions and show that the proposed FPA strategy can achieve excellent rate fairness performance in the high signal-to-noise ratio region.

* 38 pages, 9 figures

The JDDC 2.0 Corpus: A Large-Scale Multimodal Multi-Turn Chinese Dialogue Dataset for E-commerce Customer Service

Sep 27, 2021

Nan Zhao, Haoran Li, Youzheng Wu, Xiaodong He, Bowen Zhou

Abstract: With the development of the Internet, more and more people have become accustomed to online shopping. When communicating with customer service, users may express their requirements by means of text, images, and videos, which creates the need for automatic customer service systems to understand this multimodal information. Images usually act as discriminators for product models or as indicators of product failures, and so play important roles in the E-commerce scenario. On the other hand, the detailed information provided by images is limited, and customer service systems typically cannot understand the intents of users without the input text. Thus, bridging the gap between image and text is crucial for the multimodal dialogue task. To handle this problem, we construct JDDC 2.0, a large-scale multimodal multi-turn dialogue dataset collected from a mainstream Chinese E-commerce platform (JD.com), containing about 246 thousand dialogue sessions, 3 million utterances, and 507 thousand images, along with product knowledge bases and image category annotations. We present the solutions of the top-5 teams participating in the JDDC multimodal dialogue challenge based on this dataset, which provide valuable insights for further research on the multimodal dialogue task.

Robust Federated Learning with Noisy Communication

Nov 01, 2019

Fan Ang, Li Chen, Nan Zhao, Yunfei Chen, Weidong Wang, F. Richard Yu

Abstract: Federated learning is a communication-efficient training process that alternates between local training at the edge devices and averaging the updated local models at the central server. Nevertheless, it is impractical to acquire the local models perfectly over wireless communication because of noise, which in turn seriously affects federated learning. To tackle this challenge, we propose in this paper a robust design for federated learning that alleviates the effects of noise. Considering noise in the two aforementioned steps, we first formulate the training problem as a parallel optimization for each node under the expectation-based model and the worst-case model. Due to the non-convexity of the problem, a regularization for the loss function approximation method is proposed to make it tractable. Regarding the worst-case model, we develop a feasible training scheme which utilizes the sampling-based successive convex approximation algorithm to tackle the unavailable maxima or minima noise condition and the non-convex issue of the objective function. Furthermore, the convergence rates of both new designs are analyzed from a theoretical point of view. Finally, the improvement in prediction accuracy and the reduction of the loss function are demonstrated via simulations for the proposed designs.
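
The problem setting in this abstract (averaging local models that arrive corrupted by noise) can be sketched as follows. This is only an illustration of the noisy aggregation step, not the robust design the authors propose; additive Gaussian noise and the function name `noisy_federated_average` are assumptions made for the example.

```python
import random
from typing import List


def noisy_federated_average(
    local_models: List[List[float]],
    noise_std: float,
    rng: random.Random,
) -> List[float]:
    """Average local model parameters received over a noisy channel.

    Assumption: each parameter is corrupted by independent zero-mean
    Gaussian noise with standard deviation noise_std before averaging.
    """
    # Simulate what the server actually receives from each edge device.
    received = [
        [w + rng.gauss(0.0, noise_std) for w in model]
        for model in local_models
    ]
    n = len(received)
    # Plain federated averaging: element-wise mean across devices.
    return [sum(column) / n for column in zip(*received)]
```

With `noise_std` set to zero the function reduces to ordinary federated averaging; increasing it shows how the aggregated model drifts from the noise-free average, which is the effect the robust design aims to mitigate.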

Sky pixel detection in outdoor imagery using an adaptive algorithm and machine learning

Oct 08, 2019

Kerry A. Nice, Jasper S. Wijnands, Ariane Middel, Jingcheng Wang, Yiming Qiu, Nan Zhao, Jason Thompson, Gideon D. P. A. Aschwanden, Haifeng Zhao, Mark Stevenson

Abstract: Computer vision techniques allow automated detection of sky pixels in outdoor imagery. Multiple applications exist for this information across a large number of research areas. In urban climate research, sky detection is an important first step in gathering information about urban morphology and sky view factors. However, capturing accurate results remains challenging and becomes even more complex when using imagery captured under a variety of lighting and weather conditions. To address this problem, we present a new sky pixel detection system demonstrated to produce accurate results using a wide range of outdoor imagery types. Images are processed using a selection of mean-shift segmentation, K-means clustering, and Sobel filters to mark sky pixels in the scene. The algorithm for a specific image is chosen by a convolutional neural network, trained with 25,000 images from the Skyfinder data set, reaching 82% accuracy with the top three classes. This selection step allows the sky marking to follow an adaptive process and to use different techniques and parameters to best suit a particular image. An evaluation of fourteen different techniques and parameter sets shows that no single technique can perform with high accuracy across varied Skyfinder and Google Street View data sets. However, by using our adaptive process, large increases in accuracy are observed. The resulting system is shown to perform better than other published techniques.

RGB-T Object Tracking: Benchmark and Baseline

May 23, 2018

Chenglong Li, Xinyan Liang, Yijuan Lu, Nan Zhao, Jin Tang

Abstract: RGB-Thermal (RGB-T) object tracking is receiving more and more attention because thermal information strongly complements visible data. However, RGB-T research is limited by the lack of a comprehensive evaluation platform. In this paper, we propose a large-scale video benchmark dataset for RGB-T tracking. It has three major advantages over existing ones: 1) Its size is sufficiently large for large-scale performance evaluation (total frame number: 234K, maximum frames per sequence: 8K). 2) The alignment between RGB-T sequence pairs is highly accurate and does not need pre- or post-processing. 3) The occlusion levels are annotated for occlusion-sensitive performance analysis of different tracking algorithms. Moreover, we propose a novel graph-based approach to learn a robust object representation for RGB-T tracking. In particular, the tracked object is represented by a graph with image patches as nodes. This graph, including the graph structure, node weights and edge weights, is dynamically learned in a unified ADMM (alternating direction method of multipliers)-based optimization framework, in which the modality weights are also incorporated for adaptive fusion of multiple source data. Extensive experiments on the large-scale dataset are executed to demonstrate the effectiveness of the proposed tracker against other state-of-the-art tracking methods. We also provide new insights and potential research directions for the field of RGB-T object tracking.

Recognition of Emotions using Kinects

Aug 04, 2015

Shun Li, Changye Zhu, Liqing Cui, Nan Zhao, Baobin Li, Tingshao Zhu

Abstract: Psychological studies indicate that emotional states are expressed in the way people walk, and human gait has been investigated in terms of its ability to reveal a person's emotional state. Microsoft Kinect is a rapidly developing, inexpensive, portable and markerless motion capture system. This paper presents a new method for emotion recognition that uses Microsoft Kinect for gait pattern analysis, which has not been reported before. 59 subjects were recruited in this study, and their gait patterns were recorded by two Kinect cameras. Significant joint selection, coordinate system transformation, sliding-window Gaussian filtering, differential operation, and data segmentation are used in data preprocessing. Feature extraction is based on the Fourier transform. Using the NaiveBayes, RandomForests, libSVM and SMO classifiers, the recognition rate of natural and unnatural emotions can reach above 70%. It is concluded that the Kinect system can serve as a new method for the recognition of emotions.

* 15 pages, 4 figures
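
The Fourier-based feature extraction mentioned in the abstract can be illustrated with a minimal sketch. It assumes the features are the magnitudes of the first few discrete Fourier transform coefficients of a single joint-coordinate time series; the abstract does not specify the exact features, and the function name `dft_magnitudes` is hypothetical.

```python
import cmath
import math
from typing import List


def dft_magnitudes(signal: List[float], num_coeffs: int) -> List[float]:
    """Return magnitudes of the first num_coeffs DFT coefficients.

    Assumption: one gait segment of one joint coordinate is passed in,
    and the normalised magnitudes are used as a feature vector.
    """
    n = len(signal)
    magnitudes = []
    for k in range(num_coeffs):
        # Direct O(n) evaluation of the k-th DFT coefficient.
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(signal)
        )
        magnitudes.append(abs(coeff) / n)
    return magnitudes
```

For a periodic movement such as walking, most of the energy concentrates in a few low-frequency coefficients, so a short vector of magnitudes per joint coordinate gives a compact input for classifiers like those listed in the abstract.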
