Real world traffic sign recognition is an important step towards building autonomous vehicles, most of which highly dependent on Deep Neural Networks (DNNs). Recent studies demonstrated that DNNs are surprisingly susceptible to adversarial examples. Many attack methods have been proposed to understand and generate adversarial examples, such as gradient based attack, score based attack, decision based attack, and transfer based attacks. However, most of these algorithms are ineffective in real-world road sign attack, because (1) iteratively learning perturbations for each frame is not realistic for a fast moving car and (2) most optimization algorithms traverse all pixels equally without considering their diverse contribution. To alleviate these problems, this paper proposes the targeted attention attack (TAA) method for real world road sign attack. Specifically, we have made the following contributions: (1) we leverage the soft attention map to highlight those important pixels and skip those zero-contributed areas - this also helps to generate natural perturbations, (2) we design an efficient universal attack that optimizes a single perturbation/noise based on a set of training images under the guidance of the pre-trained attention map, (3) we design a simple objective function that can be easily optimized, (4) we evaluate the effectiveness of TAA on real world data sets. Experimental results validate that the TAA method improves the attack successful rate (nearly 10%) and reduces the perturbation loss (about a quarter) compared with the popular RP2 method. Additionally, our TAA also provides good properties, e.g., transferability and generalization capability. We provide code and data to ensure the reproducibility: https://github.com/AdvAttack/RoadSignAttack.
Domain adaptation aims to generalize a model from a source domain to tackle tasks in a related but different target domain. Traditional domain adaptation algorithms assume that enough labeled data, which are treated as the prior knowledge are available in the source domain. However, these algorithms will be infeasible when only a few labeled data exist in the source domain, and thus the performance decreases significantly. To address this challenge, we propose a Domain-invariant Graph Learning (DGL) approach for domain adaptation with only a few labeled source samples. Firstly, DGL introduces the Nystrom method to construct a plastic graph that shares similar geometric property as the target domain. And then, DGL flexibly employs the Nystrom approximation error to measure the divergence between plastic graph and source graph to formalize the distribution mismatch from the geometric perspective. Through minimizing the approximation error, DGL learns a domain-invariant geometric graph to bridge source and target domains. Finally, we integrate the learned domain-invariant graph with the semi-supervised learning and further propose an adaptive semi-supervised model to handle the cross-domain problems. The results of extensive experiments on popular datasets verify the superiority of DGL, especially when only a few labeled source samples are available.
Annotating a large-scale image dataset is very tedious, yet necessary for training person re-identification models. To alleviate such a problem, we present an active hard sample mining framework via training an effective re-ID model with the least labeling efforts. Considering that hard samples can provide informative patterns, we first formulate an uncertainty estimation to actively select hard samples to iteratively train a re-ID model from scratch. Then, intra-diversity estimation is designed to reduce the redundant hard samples by maximizing their diversity. Moreover, we propose a computer-assisted identity recommendation module embedded in the active hard sample mining framework to help human annotators to rapidly and accurately label the selected samples. Extensive experiments were carried out to demonstrate the effectiveness of our method on several public datasets. Experimental results indicate that our method can reduce 57%, 63%, and 49% annotation efforts on the Market1501, MSMT17, and CUHK03, respectively, while maximizing the performance of the re-ID model.
The rapid development of computer hardware and Internet technology makes large scale data dependent models computationally tractable, and opens a bright avenue for annotating images through innovative machine learning algorithms. Semi-supervised learning (SSL) has consequently received intensive attention in recent years and has been successfully deployed in image annotation. One representative work in SSL is Laplacian regularization (LR), which smoothes the conditional distribution for classification along the manifold encoded in the graph Laplacian, however, it has been observed that LR biases the classification function towards a constant function which possibly results in poor generalization. In addition, LR is developed to handle uniformly distributed data (or single view data), although instances or objects, such as images and videos, are usually represented by multiview features, such as color, shape and texture. In this paper, we present multiview Hessian regularization (mHR) to address the above two problems in LR-based image annotation. In particular, mHR optimally combines multiple Hessian regularizations, each of which is obtained from a particular view of instances, and steers the classification function which varies linearly along the data manifold. We apply mHR to kernel least squares and support vector machines as two examples for image annotation. Extensive experiments on the PASCAL VOC'07 dataset validate the effectiveness of mHR by comparing it with baseline algorithms, including LR and HR.
Recently, label consistent k-svd (LC-KSVD) algorithm has been successfully applied in image classification. The objective function of LC-KSVD is consisted of reconstruction error, classification error and discriminative sparse codes error with L0-norm sparse regularization term. The L0-norm, however, leads to NP-hard problem. Despite some methods such as orthogonal matching pursuit can help solve this problem to some extent, it is quite difficult to find the optimum sparse solution. To overcome this limitation, we propose a label embedded dictionary learning (LEDL) method to utilise the L1-norm as the sparse regularization term so that we can avoid the hard-to-optimize problem by solving the convex optimization problem. Alternating direction method of multipliers and blockwise coordinate descent algorithm are then exploited to optimize the corresponding objective function. Extensive experimental results on six benchmark datasets illustrate that the proposed algorithm has achieved superior performance compared to some conventional classification algorithms.
Recently, manifold regularized semi-supervised learning (MRSSL) received considerable attention because it successfully exploits the geometry of the intrinsic data probability distribution including both labeled and unlabeled samples to leverage the performance of a learning model. As a natural nonlinear generalization of graph Laplacian, p-Laplacian has been proved having the rich theoretical foundations to better preserve the local structure. However, it is difficult to determine the fitting graph p-Lapalcian i.e. the parameter which is a critical factor for the performance of graph p-Laplacian. Therefore, we develop an ensemble p-Laplacian regularization (EpLapR) to fully approximate the intrinsic manifold of the data distribution. EpLapR incorporates multiple graphs into a regularization term in order to sufficiently explore the complementation of graph p-Laplacian. Specifically, we construct a fused graph by introducing an optimization approach to assign suitable weights on different p-value graphs. And then, we conduct semi-supervised learning framework on the fused graph. Extensive experiments on UC-Merced data set demonstrate the effectiveness and efficiency of the proposed method.
It is of great importance to preserve locality and similarity information in semi-supervised learning (SSL) based applications. Graph based SSL and manifold regularization based SSL including Laplacian regularization (LapR) and Hypergraph Laplacian regularization (HLapR) are representative SSL methods and have achieved prominent performance by exploiting the relationship of sample distribution. However, it is still a great challenge to exactly explore and exploit the local structure of the data distribution. In this paper, we present an effect and effective approximation algorithm of Hypergraph p-Laplacian and then propose Hypergraph p-Laplacian regularization (HpLapR) to preserve the geometry of the probability distribution. In particular, p-Laplacian is a nonlinear generalization of the standard graph Laplacian and Hypergraph is a generalization of a standard graph. Therefore, the proposed HpLapR provides more potential to exploiting the local structure preserving. We apply HpLapR to logistic regression and conduct the implementations for remote sensing image recognition. We compare the proposed HpLapR to several popular manifold regularization based SSL methods including LapR, HLapR and HpLapR on UC-Merced dataset. The experimental results demonstrate the superiority of the proposed HpLapR.
Image enhancement is an important image processing technique that processes images suitably for a specific application e.g. image editing. The conventional solutions of image enhancement are grouped into two categories which are spatial domain processing method and transform domain processing method such as contrast manipulation, histogram equalization, homomorphic filtering. This paper proposes a new image enhance method based on dictionary learning. Particularly, the proposed method adjusts the image by manipulating the rarity of dictionary atoms. Firstly, learn the dictionary through sparse coding algorithms on divided sub-image blocks. Secondly, compute the rarity of dictionary atoms on statistics of the corresponding sparse coefficients. Thirdly, adjust the rarity according to specific application and form a new dictionary. Finally, reconstruct the image using the updated dictionary and sparse coefficients. Compared with the traditional techniques, the proposed method enhances image based on the image content not on distribution of pixel grey value or frequency. The advantages of the proposed method lie in that it is in better correspondence with the response of the human visual system and more suitable for salient objects extraction. The experimental results demonstrate the effectiveness of the proposed image enhance method.
The ever-growing popularity of Kinect and inertial sensors has prompted intensive research efforts on human action recognition. Since human actions can be characterized by multiple feature representations extracted from Kinect and inertial sensors, multiview features must be encoded into a unified space optimal for human action recognition. In this paper, we propose a new unsupervised feature fusion method termed Multiview Cauchy Estimator Feature Embedding (MCEFE) for human action recognition. By minimizing empirical risk, MCEFE integrates the encoded complementary information in multiple views to find the unified data representation and the projection matrices. To enhance robustness to outliers, the Cauchy estimator is imposed on the reconstruction error. Furthermore, ensemble manifold regularization is enforced on the projection matrices to encode the correlations between different views and avoid overfitting. Experiments are conducted on the new Chinese Academy of Sciences - Yunnan University - Multimodal Human Action Database (CAS-YNU-MHAD) to demonstrate the effectiveness and robustness of MCEFE for human action recognition.