Zero-shot learning (ZSL) aims to recognize unseen objects (test classes) given some other seen objects (training classes), by sharing information of attributes between different objects. Attributes are artificially annotated for objects and are treated equally in recent ZSL tasks. However, some inferior attributes with poor predictability or poor discriminability may have negative impact on the ZSL system performance. This paper first derives a generalization error bound for ZSL tasks. Our theoretical analysis verifies that selecting key attributes set can improve the generalization performance of the original ZSL model which uses all the attributes. Unfortunately, previous attribute selection methods are conducted based on the seen data, their selected attributes have poor generalization capability to the unseen data, which is unavailable in training stage for ZSL tasks. Inspired by learning from pseudo relevance feedback, this paper introduces the out-of-the-box data, which is pseudo data generated by an attribute-guided generative model, to mimic the unseen data. After that, we present an iterative attribute selection (IAS) strategy which iteratively selects key attributes based on the out-of-the-box data. Since the distribution of the generated out-of-the-box data is similar to the test data, the key attributes selected by IAS can be effectively generalized to test data. Extensive experiments demonstrate that IAS can significantly improve existing attribute-based ZSL methods and achieve state-of-the-art performance.
Graph structured data provide two-fold information: graph structures and node attributes. Numerous graph-based algorithms rely on both information to achieve success in supervised tasks, such as node classification and link prediction. However, node attributes could be missing or incomplete, which significantly deteriorates the performance. The task of node attribute generation aims to generate attributes for those nodes whose attributes are completely unobserved. This task benefits many real-world problems like profiling, node classification and graph data augmentation. To tackle this task, we propose a deep adversarial learning based method to generate node attributes; called node attribute neural generator (NANG). NANG learns a unifying latent representation which is shared by both node attributes and graph structures and can be translated to different modalities. We thus use this latent representation as a bridge to convert information from one modality to another. We further introduce practical applications to quantify the performance of node attribute generation. Extensive experiments are conducted on four real-world datasets and the empirical results show that node attributes generated by the proposed method are high-qualitative and beneficial to other applications. The datasets and codes are available online.
Deep Neural Networks (DNNs) have recently achieved great success in many tasks, which encourages DNNs to be widely used as a machine learning service in model sharing scenarios. However, attackers can easily generate adversarial examples with a small perturbation to fool the DNN models to predict wrong labels. To improve the robustness of shared DNN models against adversarial attacks, we propose a novel method called Latent Adversarial Defence (LAD). The proposed LAD method improves the robustness of a DNN model through adversarial training on generated adversarial examples. Different from popular attack methods which are carried in the input space and only generate adversarial examples of repeating patterns, LAD generates myriad of adversarial examples through adding perturbations to latent features along the normal of the decision boundary which is constructed by an SVM with an attention mechanism. Once adversarial examples are generated, we adversarially train the model through augmenting the training data with generated adversarial examples. Extensive experiments on the MNIST, SVHN, and CelebA dataset demonstrate the effectiveness of our model in defending against different types of adversarial attacks.
In rank aggregation, preferences from different users are summarized into a total order under the homogeneous data assumption. Thus, model misspecification arises and rank aggregation methods take some noise models into account. However, they all rely on certain noise model assumptions and cannot handle agnostic noises in the real world. In this paper, we propose CoarsenRank, which rectifies the underlying data distribution directly and aligns it to the homogeneous data assumption without involving any noise model. To this end, we define a neighborhood of the data distribution over which Bayesian inference of CoarsenRank is performed, and therefore the resultant posterior enjoys robustness against model misspecification. Further, we derive a tractable closed-form solution for CoarsenRank making it computationally efficient. Experiments on real-world datasets show that CoarsenRank is fast and robust, achieving consistent improvement over baseline methods.
Generalization is vital important for many deep network models. It becomes more challenging when high robustness is required for learning with noisy labels. The 0-1 loss has monotonic relationship between empirical adversary (reweighted) risk, and it is robust to outliers. However, it is also difficult to optimize. To efficiently optimize 0-1 loss while keeping its robust properties, we propose a very simple and efficient loss, i.e. curriculum loss (CL). Our CL is a tighter upper bound of the 0-1 loss compared with conventional summation based surrogate losses. Moreover, CL can adaptively select samples for training as a curriculum learning. To handle large rate of noisy label corruption, we extend our curriculum loss to a more general form that can automatically prune the estimated noisy samples during training. Experimental results on noisy MNIST, CIFAR10 and CIFAR100 dataset validate the robustness of the proposed loss.
As a kind of semantic representation of visual object descriptions, attributes are widely used in various computer vision tasks. In most of existing attribute-based research, class-specific attributes (CSA), which are class-level annotations, are usually adopted due to its low annotation cost for each class instead of each individual image. However, class-specific attributes are usually noisy because of annotation errors and diversity of individual images. Therefore, it is desirable to obtain image-specific attributes (ISA), which are image-level annotations, from the original class-specific attributes. In this paper, we propose to learn image-specific attributes by graph-based attribute propagation. Considering the intrinsic property of hyperbolic geometry that its distance expands exponentially, hyperbolic neighborhood graph (HNG) is constructed to characterize the relationship between samples. Based on HNG, we define neighborhood consistency for each sample to identify inconsistent samples. Subsequently, inconsistent samples are refined based on their neighbors in HNG. Extensive experiments on five benchmark datasets demonstrate the significant superiority of the learned image-specific attributes over the original class-specific attributes in the zero-shot object classification task.
In this work, we investigate black-box optimization from the perspective of frequentist kernel methods. We propose a novel batch optimization algorithm to jointly maximize the acquisition function and select points from a whole batch in a holistic way. Theoretically, we derive regret bounds for both the noise-free and perturbation settings. Moreover, we analyze the property of the adversarial regret that is required by robust initialization for Bayesian Optimization (BO), and prove that the adversarial regret bounds decrease with the decrease of covering radius, which provides a criterion for generating (initialization point set) to minimize the bound. We then propose fast searching algorithms to generate a point set with a small covering radius for the robust initialization. Experimental results on both synthetic benchmark problems and real-world problems show the effectiveness of the proposed algorithms.
In weakly-supervised temporal action localization, previous works have failed to locate dense and integral regions for each entire action due to the overestimation of the most salient regions. To alleviate this issue, we propose a marginalized average attentional network (MAAN) to suppress the dominant response of the most salient regions in a principled manner. The MAAN employs a novel marginalized average aggregation (MAA) module and learns a set of latent discriminative probabilities in an end-to-end fashion. MAA samples multiple subsets from the video snippet features according to a set of latent discriminative probabilities and takes the expectation over all the averaged subset features. Theoretically, we prove that the MAA module with learned latent discriminative probabilities successfully reduces the difference in responses between the most salient regions and the others. Therefore, MAAN is able to generate better class activation sequences and identify dense and integral action regions in the videos. Moreover, we propose a fast algorithm to reduce the complexity of constructing MAA from O($2^T$) to O($T^2$). Extensive experiments on two large-scale video datasets show that our MAAN achieves superior performance on weakly-supervised temporal action localization
Learning with noisy labels, which aims to reduce expensive labors on accurate annotations, has become imperative in the Big Data era. Previous noise transition based method has achieved promising results and presented a theoretical guarantee on performance in the case of class-conditional noise. However, this type of approaches critically depend on an accurate pre-estimation of the noise transition, which is usually impractical. Subsequent improvement adapts the pre-estimation along with the training progress via a Softmax layer. However, the parameters in the Softmax layer are highly tweaked for the fragile performance due to the ill-posed stochastic approximation. To address these issues, we propose a Latent Class-Conditional Noise model (LCCN) that naturally embeds the noise transition under a Bayesian framework. By projecting the noise transition into a Dirichlet-distributed space, the learning is constrained on a simplex based on the whole dataset, instead of some ad-hoc parametric space. We then deduce a dynamic label regression method for LCCN to iteratively infer the latent labels, to stochastically train the classifier and to model the noise. Our approach safeguards the bounded update of the noise transition, which avoids previous arbitrarily tuning via a batch of samples. We further generalize LCCN for open-set noisy labels and the semi-supervised setting. We perform extensive experiments with the controllable noise data sets, CIFAR-10 and CIFAR-100, and the agnostic noise data sets, Clothing1M and WebVision17. The experimental results have demonstrated that the proposed model outperforms several state-of-the-art methods.