Deep neural networks (DNNs) play key roles in various artificial intelligence applications such as image classification and object recognition. However, a growing number of studies have shown that DNNs are vulnerable to adversarial examples, which are almost imperceptibly different from the original samples but can greatly change the network output. Existing white-box attack algorithms can generate powerful adversarial examples, yet most of them concentrate on how to iteratively make the best use of gradients to improve adversarial performance. In contrast, in this paper, we focus on the properties of the widely used ReLU activation function and discover two phenomena (i.e., wrong blocking and over transmission) that mislead the calculation of gradients in ReLU during backpropagation. Both issues enlarge the difference between the change in the loss function predicted from the gradient and the corresponding actual change, and the misled gradients result in larger perturbations. Therefore, we propose a universal adversarial example generation method, called ADV-ReLU, to enhance the performance of gradient-based white-box attack algorithms. During the backpropagation of the network, our approach calculates the gradient of the loss function with respect to the network input, maps the values to scores, and selects a part of them to update the misleading gradients. Comprehensive experimental results on \emph{ImageNet} demonstrate that ADV-ReLU can be easily integrated into many state-of-the-art gradient-based white-box attack algorithms, as well as transferred to black-box attackers, to further decrease perturbations in the ${\ell _2}$-norm.
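The gap between the predicted and the actual change at a single ReLU unit can be illustrated with a toy sketch. The abstract does not define the two phenomena, so the readings used here are assumptions: "wrong blocking" is taken to mean a zero gradient at an inactive unit that the perturbation then activates, and "over transmission" a full gradient at an active unit that the perturbation then switches off. The function names and numbers are hypothetical, and this is not the ADV-ReLU procedure itself.

```python
def relu(z):
    return max(0.0, z)

def relu_grad(z):
    # Derivative used in backpropagation: 1 for active units, 0 otherwise.
    return 1.0 if z > 0 else 0.0

def predicted_vs_actual(z, delta):
    """Compare the first-order predicted change of relu(z) with the actual change."""
    predicted = relu_grad(z) * delta
    actual = relu(z + delta) - relu(z)
    return predicted, actual

# Assumed reading of "wrong blocking": an inactive unit is pushed across zero,
# so the gradient predicts no change although the output moves.
blocked = predicted_vs_actual(-0.1, 0.5)

# Assumed reading of "over transmission": an active unit is pushed below zero,
# so the gradient predicts the full shift although the output stops at zero.
over = predicted_vs_actual(0.1, -0.5)

print(blocked, over)
```

In both cases the first-order prediction and the true change disagree, which is the kind of mismatch the abstract attributes to ReLU during backpropagation.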
Identifying and locating diseases in chest X-rays is very challenging, due to the low visual contrast between normal and abnormal regions and the distortions caused by overlapping tissues. An interesting phenomenon is that many similar structures exist in the left and right parts of the chest, such as ribs, lung fields and bronchial tubes. According to the experience of board-certified radiologists, such similarities can be used to identify diseases in chest X-rays. To improve the performance of existing detection methods, we propose a deep end-to-end module that exploits contralateral context information to enhance the feature representations of disease proposals. First, under the guidance of the spine line, a spatial transformer network is employed to extract local contralateral patches, which provide valuable context information for disease proposals. Then, we build a specific module, based on both additive and subtractive operations, to fuse the features of the disease proposal and the contralateral patch. Our method can be integrated into both fully and weakly supervised disease detection frameworks. It achieves 33.17 AP50 on a carefully annotated private chest X-ray dataset that contains 31,000 images. Experiments on the NIH chest X-ray dataset indicate that our method achieves state-of-the-art performance in weakly supervised disease localization.
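The contralateral idea can be sketched in a few lines. This is a simplified illustration under stated assumptions: the spine line is treated as a vertical line x = spine_x (the abstract instead uses a spatial transformer network guided by the spine line), and the additive and subtractive results are simply concatenated. The names `mirror_box` and `fuse` are hypothetical.

```python
def mirror_box(box, spine_x):
    """Reflect (x_min, y_min, x_max, y_max) across the vertical line x = spine_x."""
    x_min, y_min, x_max, y_max = box
    return (2 * spine_x - x_max, y_min, 2 * spine_x - x_min, y_max)

def fuse(proposal_feat, contra_feat):
    """Concatenate element-wise sum and difference of two feature vectors.

    Concatenation is an assumption; the abstract only states that the module
    is based on additive and subtractive operations.
    """
    added = [p + c for p, c in zip(proposal_feat, contra_feat)]
    diff = [p - c for p, c in zip(proposal_feat, contra_feat)]
    return added + diff

contra_box = mirror_box((10, 20, 30, 40), spine_x=50)
fused = fuse([1, 2], [3, 5])
print(contra_box)  # (70, 20, 90, 40)
print(fused)       # [4, 7, -2, -3]
```

The difference term highlights asymmetry between the two sides, while the sum term keeps the shared context.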
Recently, many convolutional neural networks (CNNs) have been successfully employed in bitemporal SAR image change detection. However, most of the existing networks are too heavy and occupy a large amount of memory for storage and calculation. Motivated by this, in this paper, we propose a lightweight neural network that reduces the computational and spatial complexity and facilitates change detection on an edge device. In the proposed network, we replace normal convolutional layers with bottleneck layers that keep the same number of channels between input and output. Next, we employ dilated convolutional kernels with only a few non-zero entries, which reduce the running time of the convolutional operators. Compared with the conventional convolutional neural network, our lightweight neural network is more efficient and has fewer parameters. We verify our lightweight neural network on four sets of bitemporal SAR images. The experimental results show that the proposed network obtains better performance than the conventional CNN and generalizes better, especially on the challenging datasets with complex scenes.
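The parameter saving from a bottleneck layer, and the spatial extent of a dilated kernel, can be checked with simple arithmetic. The sketch below assumes a 1x1 reduce, k x k, 1x1 expand bottleneck with a hypothetical reduction factor of 4 and 64 channels; the abstract does not give these values, and biases are ignored.

```python
def conv_params(c_in, c_out, k):
    """Number of weights in a k x k convolution (bias ignored)."""
    return c_in * c_out * k * k

def bottleneck_params(channels, k, reduction):
    """1x1 reduce -> k x k -> 1x1 expand, with equal input and output channels."""
    mid = channels // reduction
    return (conv_params(channels, mid, 1)
            + conv_params(mid, mid, k)
            + conv_params(mid, channels, 1))

def dilated_extent(k, dilation):
    """Side length of the area covered by a k x k kernel with the given dilation."""
    return k + (k - 1) * (dilation - 1)

plain = conv_params(64, 64, 3)       # 36864
slim = bottleneck_params(64, 3, 4)   # 1024 + 2304 + 1024 = 4352
extent = dilated_extent(3, 2)        # 5: a 3 x 3 kernel covers a 5 x 5 area
print(plain, slim, extent)
```

Under these assumed sizes the bottleneck uses roughly one eighth of the weights, and the dilated 3 x 3 kernel covers a 5 x 5 area with only nine non-zero entries.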
Small objects are difficult to detect because of their low resolution and small size. Existing small object detection methods mainly focus on data preprocessing or on narrowing the differences between large and small objects. Inspired by the attention mechanism of human vision, we exploit two feature extraction methods to mine the most useful information about small objects. Both methods are based on multiresolution feature extraction. We first design and explore a soft attention method, but find that it converges slowly. We then present the second method, an attention-based feature interaction method called the MultiResolution Attention Extractor (MRAE), which shows significant improvement as a generic feature extractor in small object detection. After each building block in the vanilla feature extractor, we append a small network that generates attention weights, followed by a weighted-sum operation that produces the final attention maps. Our attention-based feature extractor achieves 2.0 times the AP of the "hard" attention counterpart (plain architecture) on the COCO small object detection benchmark, indicating that MRAE can capture useful location and contextual information through adaptive learning.
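The weighted-sum step described above can be sketched as follows. This is a minimal illustration, not the MRAE architecture: it assumes the per-resolution features have already been brought to a common length, that the small network outputs one scalar score per resolution level, and that the scores are normalized with a softmax. All names are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scalar scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def weighted_sum(features, scores):
    """Fuse equal-length feature vectors, one per resolution level.

    The softmax normalization is an assumption; the abstract only mentions
    attention weights followed by a weighted-sum operation.
    """
    weights = softmax(scores)
    dim = len(features[0])
    fused = [sum(w * f[i] for w, f in zip(weights, features)) for i in range(dim)]
    return fused, weights

features = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
fused, weights = weighted_sum(features, scores=[0.0, 0.0, 0.0])
print(fused, weights)
```

With equal scores the result reduces to a plain average of the levels; learned scores would shift the mix toward the resolutions that matter most for a given input.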
Counterfactuals can be considered as belonging to the domain of discourse structure and semantics, a core area in natural language understanding. In this paper, we introduce an approach to counterfactual detection as well as to the indexing of the antecedents and consequents of counterfactual statements. While transfer learning is already being applied to several NLP tasks, it has the characteristics to excel in a number of novel tasks. We show that detecting counterfactuals is a straightforward binary classification task that can be implemented with minimal adaptation of existing model architectures, thanks to a well-annotated training data set, and we introduce a new end-to-end pipeline that processes antecedents and consequents as an entity recognition task, thus adapting them to token classification.
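Casting antecedent and consequent extraction as token classification requires turning span annotations into per-token labels. The sketch below assumes a BIO tagging scheme with hypothetical tag names ANT and CON and a made-up example sentence; the abstract does not specify the labeling scheme.

```python
def bio_labels(tokens, spans):
    """Convert (start, end, tag) token spans, end exclusive, into BIO labels."""
    labels = ["O"] * len(tokens)
    for start, end, tag in spans:
        labels[start] = "B-" + tag
        for i in range(start + 1, end):
            labels[i] = "I-" + tag
    return labels

# Hypothetical example sentence and span annotations.
tokens = "If it had rained , the match would have been cancelled".split()
spans = [(0, 4, "ANT"), (5, 11, "CON")]
labels = bio_labels(tokens, spans)

for token, label in zip(tokens, labels):
    print(token, label)
```

Each token then carries exactly one label, which is the input format a standard token classification model expects.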
In synthetic aperture radar (SAR) image change detection, it is quite challenging to exploit the change information in the noisy difference image, which is subject to speckle. In this paper, we propose a multi-scale spatial pooling (MSSP) network to exploit the change information in the noisy difference image. Unlike the traditional convolutional network with only mono-scale pooling kernels, the proposed method equips a convolutional network with multi-scale pooling kernels to exploit the spatial context information on changed regions in the difference image. Furthermore, to verify the generalization of the proposed method, we apply it to cross-dataset bitemporal SAR image change detection, where the MSSP network (MSSP-Net) is trained on one dataset and then applied to an unknown testing dataset. We compare the proposed method with other state-of-the-art methods on four challenging datasets of bitemporal SAR images. Experimental results demonstrate that our proposed method obtains results comparable with S-PCA-Net on the YR-A and YR-B datasets and outperforms other state-of-the-art methods, especially on the Sendai-A and Sendai-B datasets with more complex scenes. More importantly, MSSP-Net is more efficient than S-PCA-Net and convolutional neural networks (CNNs), requiring less execution time in both the training and testing phases.
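Pooling one feature map at several kernel sizes and concatenating the results can be sketched in plain Python. The sketch assumes non-overlapping max pooling and the hypothetical kernel sizes 1, 2 and 4; the abstract does not state the pooling type or the scales used in MSSP-Net.

```python
def max_pool(grid, k):
    """Non-overlapping k x k max pooling on a square list-of-lists grid."""
    n = len(grid)
    return [
        [max(grid[r + i][c + j] for i in range(k) for j in range(k))
         for c in range(0, n - k + 1, k)]
        for r in range(0, n - k + 1, k)
    ]

def multi_scale_pool(grid, kernel_sizes=(1, 2, 4)):
    """Pool the same grid at several scales and concatenate the flattened results."""
    features = []
    for k in kernel_sizes:
        pooled = max_pool(grid, k)
        features.extend(v for row in pooled for v in row)
    return features

grid = [[0, 1, 2, 3],
        [4, 5, 6, 7],
        [8, 9, 10, 11],
        [12, 13, 14, 15]]
features = multi_scale_pool(grid)
print(max_pool(grid, 2))  # [[5, 7], [13, 15]]
print(len(features))      # 16 + 4 + 1 = 21
```

The concatenated vector mixes fine detail from the small kernels with coarse context from the large ones, which is the intuition behind using several pooling scales on a noisy difference image.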
Polarimetric SAR offers all-weather, all-time imaging, among other advantages, and its data are widely used in many fields. However, the amount of annotated data is relatively small, which hinders research. In this paper, we collect five open polarimetric SAR images of the San Francisco area. These five images come from different satellites and were acquired at different times, which gives them great scientific research value. We annotate the collected images at the pixel level for image classification and segmentation. For the convenience of researchers, the annotated data are openly available at https://github.com/liuxuvip/PolSF.
In this paper, we propose a hierarchical feature-aware tracking framework for efficient visual tracking. In recent years, ensembled trackers, which combine multiple component trackers, have achieved impressive performance. In ensembled trackers, the decision on the result is usually a post-event process, i.e., the tracking result of each tracker is first obtained and then the most suitable one is selected according to the result ensemble. In this paper, we propose a pre-event method. We construct an expert pool in which each expert is one set of features. For each frame, several experts are first selected from the pool according to their past performance and are then used to predict the object. The selection rate of each expert in the pool is then updated, and the tracking result is obtained according to the result ensemble. We propose a novel pre-known expert-adaptive selection strategy. Since the process is more efficient, more experts can be constructed by fusing more types of features, which leads to greater robustness. Moreover, with the novel expert selection strategy, the overfitting caused by using fixed experts for each frame can be mitigated. Experiments on several publicly available datasets demonstrate the superiority of the proposed method and its state-of-the-art performance among ensembled trackers.
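The select-then-update loop over the expert pool can be sketched as below. The abstract gives neither the selection rule nor the update rule, so both are assumptions here: the top-k experts by current selection rate are chosen, and each chosen expert's rate is blended with a hypothetical per-frame reward through a moving average.

```python
def select_experts(rates, k):
    """Return the indices of the k experts with the highest selection rate."""
    order = sorted(range(len(rates)), key=lambda i: rates[i], reverse=True)
    return sorted(order[:k])

def update_rates(rates, chosen, rewards, momentum=0.9):
    """Blend each chosen expert's rate with its reward on the current frame.

    The moving-average form and the reward signal are assumptions made for
    illustration; unchosen experts keep their previous rate.
    """
    new_rates = list(rates)
    for idx, reward in zip(chosen, rewards):
        new_rates[idx] = momentum * rates[idx] + (1.0 - momentum) * reward
    return new_rates

rates = [0.2, 0.9, 0.5, 0.7]
chosen = select_experts(rates, k=2)
new_rates = update_rates(rates, chosen, rewards=[0.0, 1.0], momentum=0.5)
print(chosen, new_rates)
```

Because only the selected experts are evaluated on a frame, the per-frame cost depends on k rather than on the size of the pool.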
In this paper, we propose a new first-order gradient-based algorithm to train deep neural networks. We first introduce the sign operation on stochastic gradients (as in sign-based methods, e.g., SIGN-SGD) into ADAM, and call the resulting algorithm signADAM. Moreover, to bring the fitting rates of different features closer together, we define a confidence function that distinguishes different components of the gradients and apply it to our algorithm. It can generate sparser gradients than existing algorithms do. We call this new algorithm signADAM++. Both of our algorithms are easy to implement and can speed up the training of various deep neural networks. The motivation of signADAM++ is to preferentially learn features from the most different samples by updating large and useful gradients while disregarding useless information in stochastic gradients. We also establish theoretical convergence guarantees for our algorithms. Empirical results on various datasets and models show that our algorithms yield much better performance than many state-of-the-art algorithms, including SIGN-SGD, SIGNUM and ADAM. We also analyze the performance from multiple perspectives, including the loss landscape, and develop an adaptive method to further improve generalization. The source code is available at https://github.com/DongWanginxdu/signADAM-Learn-by-Confidence.
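A sign-based momentum update can be sketched as follows. The abstract does not give the update rules of signADAM or signADAM++, so this is an assumed form rather than the authors' algorithm: the first moment accumulates the sign of each gradient component, and a hypothetical magnitude `threshold` stands in for a confidence rule that zeroes small components to make the update sparser.

```python
def sign(x):
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)

def sign_momentum_step(params, grads, moments, lr=0.01, beta=0.9, threshold=0.0):
    """One update that uses the sign of each gradient component.

    threshold is a hypothetical stand-in for a confidence rule: components
    with |g| <= threshold contribute zero to the moment estimate.
    """
    new_params, new_moments = [], []
    for p, g, m in zip(params, grads, moments):
        s = sign(g) if abs(g) > threshold else 0
        m = beta * m + (1.0 - beta) * s
        new_params.append(p - lr * m)
        new_moments.append(m)
    return new_params, new_moments

params, moments = sign_momentum_step(
    params=[1.0, -1.0, 0.5],
    grads=[0.3, -2.0, 0.01],
    moments=[0.0, 0.0, 0.0],
    lr=0.1, beta=0.5, threshold=0.05,
)
print(params, moments)
```

In this toy step the first two parameters move by the same amount despite very different gradient magnitudes, and the third is left untouched because its gradient falls below the assumed threshold.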