Jun Zhu

Mixup Inference: Better Exploiting Mixup to Defend Adversarial Attacks

Sep 25, 2019
Tianyu Pang, Kun Xu, Jun Zhu

It has been widely recognized that adversarial examples can be easily crafted to fool deep networks, a vulnerability that mainly stems from the locally non-linear behavior of the networks near input examples. Applying mixup in training provides an effective mechanism to improve generalization performance and model robustness against adversarial perturbations by introducing globally linear behavior in-between training examples. However, in previous work, mixup-trained models only passively defend against adversarial attacks at inference time by directly classifying the inputs, so the induced global linearity is not well exploited. Namely, because adversarial perturbations are local, it would be more efficient to actively break this locality via the globality of the model predictions. Inspired by simple geometric intuition, we develop an inference principle, named mixup inference (MI), for mixup-trained models. MI mixes the input with other random clean samples, which can shrink and transfer the equivalent perturbation if the input is adversarial. Our experiments on CIFAR-10 and CIFAR-100 demonstrate that MI can further improve the adversarial robustness of models trained by mixup and its variants.
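
The mixing step described in the abstract above can be illustrated with a minimal sketch. It assumes a hypothetical `predict` callable, a pool of clean samples, and a fixed mixing weight `lam`; none of these names come from the paper, and the paper's actual sampling and label-handling rules are not given in the abstract. The toy model is linear so that the shrinking of the perturbation is easy to see.

```python
import numpy as np

def mixup_inference(predict, x, clean_pool, lam=0.6, n_samples=8, rng=None):
    """Average predictions over mixes of x with random clean samples.

    predict, clean_pool, lam and n_samples are illustrative assumptions.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    # Draw random clean samples to mix with the (possibly adversarial) input.
    idx = rng.integers(0, len(clean_pool), size=n_samples)
    mixed = lam * x[None, :] + (1.0 - lam) * clean_pool[idx]
    # Average the model outputs over all mixed inputs.
    return predict(mixed).mean(axis=0)

# Toy linear "model" standing in for a mixup-trained network.
W = np.array([[1.0, -1.0], [0.5, 2.0]])
predict = lambda batch: batch @ W.T

pool = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, -1.0]])
x_clean = np.array([1.0, 2.0])
delta = np.array([0.3, -0.2])  # stand-in for an adversarial perturbation

out_clean = mixup_inference(predict, x_clean, pool, rng=np.random.default_rng(1))
out_adv = mixup_inference(predict, x_clean + delta, pool, rng=np.random.default_rng(1))
shift = out_adv - out_clean  # effect of the perturbation after mixing
```

For this linear toy model the perturbation's effect on the output is scaled by `lam`, which is one way to read the "shrink" in the abstract; a real network is only approximately linear between samples.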

A Simple yet Effective Baseline for Robust Deep Learning with Noisy Labels

Sep 20, 2019
Yucen Luo, Jun Zhu, Tomas Pfister

Recently, deep neural networks have shown their capacity to memorize training data, even with noisy labels, which hurts generalization performance. To mitigate this issue, we propose a simple but effective baseline that is robust to noisy labels, even with severe noise. Our objective involves a variance regularization term that implicitly penalizes the Jacobian norm of the neural network on the whole training set (including the noisy-labeled data), which encourages generalization and prevents overfitting to the corrupted labels. Experiments on both synthetically generated incorrect labels and realistic large-scale noisy datasets demonstrate that our approach achieves state-of-the-art performance with a high tolerance to severe noise.
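
One generic way a variance term can relate to the Jacobian norm is sketched below. This is an assumption for illustration, not the paper's objective: for small Gaussian input noise with standard deviation `sigma`, the summed output variance is approximately `sigma**2` times the squared Frobenius norm of the Jacobian, and the relation is exact for a linear map. The function name and the toy model are hypothetical.

```python
import numpy as np

def output_variance(f, x, sigma=0.01, n_draws=20000, rng=None):
    """Monte Carlo estimate of the summed output variance under input noise."""
    rng = np.random.default_rng(0) if rng is None else rng
    noise = rng.normal(0.0, sigma, size=(n_draws, x.shape[0]))
    outputs = f(x[None, :] + noise)
    # Variance of each output coordinate across draws, summed over coordinates.
    return outputs.var(axis=0).sum()

# Linear toy model whose Jacobian is simply W.
W = np.array([[1.0, 2.0, 0.0], [0.5, -1.0, 3.0]])
f = lambda batch: batch @ W.T
x = np.array([0.2, -0.4, 1.0])

estimate = output_variance(f, x)
exact = 0.01 ** 2 * np.sum(W ** 2)  # sigma^2 * ||J||_F^2
```

Penalizing `estimate` during training would therefore discourage large Jacobians without computing them explicitly, which matches the word "implicitly" in the abstract only under the assumptions stated here.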

Cross-Lingual Contextual Word Embeddings Mapping With Multi-Sense Words In Mind

Sep 18, 2019
Zheng Zhang, Ruiqing Yin, Jun Zhu, Pierre Zweigenbaum

Recent work in cross-lingual contextual word embedding learning cannot handle multi-sense words well. In this work, we explore the characteristics of contextual word embeddings and show the link between contextual word embeddings and word senses. We propose two solutions: treating contextual multi-sense word embeddings as noise (removal), and generating cluster-level average anchor embeddings for contextual multi-sense word embeddings (replacement). Experiments show that our solutions can improve supervised contextual word embedding alignment for multi-sense words from a microscopic perspective without hurting the macroscopic performance on the bilingual lexicon induction task. For unsupervised alignment, our methods significantly improve the performance on the bilingual lexicon induction task by more than 10 points.
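
The "replacement" idea, cluster-level average anchors, can be sketched as follows. The sketch assumes cluster assignments are already available (the abstract does not say how clusters are obtained), and `cluster_anchors` is a hypothetical helper name.

```python
import numpy as np

def cluster_anchors(embeddings, cluster_ids):
    """Return one mean embedding (anchor) per cluster of a word's contexts."""
    anchors = {}
    for cid in np.unique(cluster_ids):
        # Average all contextual embeddings assigned to this cluster.
        anchors[int(cid)] = embeddings[cluster_ids == cid].mean(axis=0)
    return anchors

# Three contextual embeddings of one word, grouped into two assumed senses.
embeddings = np.array([[1.0, 0.0], [3.0, 0.0], [0.0, 2.0]])
cluster_ids = np.array([0, 0, 1])
anchors = cluster_anchors(embeddings, cluster_ids)
```

Each anchor could then stand in for the single word-level anchor during alignment; the "removal" variant would instead drop such words from the anchor set.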

* 12 pages

DashNet: A Hybrid Artificial and Spiking Neural Network for High-speed Object Tracking

Sep 15, 2019
Zheyu Yang, Yujie Wu, Guanrui Wang, Yukuan Yang, Guoqi Li, Lei Deng, Jun Zhu, Luping Shi

Computer-science-oriented artificial neural networks (ANNs) have achieved tremendous success in a variety of scenarios via powerful feature extraction and high-precision data operations. It is well known, however, that ANNs usually suffer from expensive processing resource requirements and costs. In contrast, neuroscience-oriented spiking neural networks (SNNs) are promising for energy-efficient information processing, benefiting from event-driven spike activities, but they have yet to demonstrate impressive effectiveness on real, complicated tasks. How to combine the advantages of these two model families is an open question of great interest. Two significant challenges need to be addressed: (1) the lack of benchmark datasets including both ANN-oriented (frames) and SNN-oriented (spikes) signal resources; (2) the difficulty in jointly processing the synchronous activation from ANNs and event-driven spikes from SNNs. In this work, we propose a hybrid paradigm, named DashNet, to demonstrate the advantages of combining ANNs and SNNs in a single model. A simulator and benchmark dataset NFS-DAVIS is built, and a temporal complementary filter (TCF) and attention module are designed to address the two mentioned challenges, respectively. In this way, it is shown that DashNet achieves a record-breaking speed of 2083 FPS on neuromorphic chips and the best tracking performance on the NFS-DAVIS and PRED18 datasets. To the best of our knowledge, DashNet is the first framework that can integrate and process ANNs and SNNs in a hybrid paradigm, which provides a novel solution to achieve both effectiveness and efficiency for high-speed object tracking.

Improving Black-box Adversarial Attacks with a Transfer-based Prior

Jun 17, 2019
Shuyu Cheng, Yinpeng Dong, Tianyu Pang, Hang Su, Jun Zhu

We consider the black-box adversarial setting, where the adversary has to generate adversarial perturbations without access to the target models to compute gradients. Previous methods tried to approximate the gradient either by using a transfer gradient of a surrogate white-box model, or based on the query feedback. However, these methods often suffer from low attack success rates or poor query efficiency since it is non-trivial to estimate the gradient in a high-dimensional space with limited information. To address these problems, we propose a prior-guided random gradient-free (P-RGF) method to improve black-box adversarial attacks, which takes advantage of a transfer-based prior and the query information simultaneously. The transfer-based prior given by the gradient of a surrogate model is appropriately integrated into our algorithm by an optimal coefficient derived by a theoretical analysis. Extensive experiments demonstrate that our method requires far fewer queries to attack black-box models with higher success rates compared with the alternative state-of-the-art methods.
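
A rough sketch of how a transfer prior and query feedback might be combined in a random gradient-free estimator is given below. It biases each random search direction toward the normalized prior with a fixed weight `lam`; the paper derives an optimal coefficient, which is not reproduced here, so `lam`, `q`, `sigma` and the function name are all assumptions for illustration.

```python
import numpy as np

def prior_guided_rgf(f, x, prior, lam=0.5, q=20, sigma=1e-3, rng=None):
    """Finite-difference gradient estimate with directions biased to a prior.

    Uses q + 1 queries of the black-box function f. lam is a hypothetical
    fixed weight, not the coefficient derived in the paper.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    v = prior / np.linalg.norm(prior)
    fx = f(x)
    grad = np.zeros_like(x)
    for _ in range(q):
        xi = rng.normal(size=x.shape)
        xi -= (xi @ v) * v           # keep only the part orthogonal to the prior
        xi /= np.linalg.norm(xi)
        u = np.sqrt(lam) * v + np.sqrt(1.0 - lam) * xi  # unit-norm direction
        grad += (f(x + sigma * u) - fx) / sigma * u     # directional difference
    return grad / q

# Toy black-box objective with a known gradient g, and an imperfect prior.
g = np.array([1.0, -2.0, 0.5, 3.0])
f = lambda z: float(g @ z)
x = np.zeros(4)
prior = np.array([1.0, -1.0, 0.0, 2.0])

est = prior_guided_rgf(f, x, prior)
cosine = est @ g / (np.linalg.norm(est) * np.linalg.norm(g))
```

Setting `lam` to 0 would ignore the prior and fall back to purely random directions orthogonal to it, while `lam` equal to 1 would query only along the prior.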

Multi-objects Generation with Amortized Structural Regularization

Jun 10, 2019
Kun Xu, Chongxuan Li, Jun Zhu, Bo Zhang

Deep generative models (DGMs) have shown promise in image generation. However, most existing work learns the model by simply optimizing a divergence between the marginal distributions of the model and the data, and often fails to capture the rich structures and relations in multi-object images. Human knowledge is critical for DGMs to successfully infer these structures. In this paper, we propose the amortized structural regularization (ASR) framework, which adopts posterior regularization (PR) to embed human knowledge into DGMs via a set of structural constraints. We derive a lower bound of the regularized log-likelihood, which can be jointly optimized with respect to the generative model and recognition model efficiently. Empirical results show that ASR significantly outperforms the DGM baselines in terms of inference accuracy and sample quality.

Scalable Training of Inference Networks for Gaussian-Process Models

May 27, 2019
Jiaxin Shi, Mohammad Emtiyaz Khan, Jun Zhu

Inference in Gaussian process (GP) models is computationally challenging for large data, and often difficult to approximate with a small number of inducing points. We explore an alternative approximation that employs stochastic inference networks for flexible inference. Unfortunately, for such networks, minibatch training makes it difficult to learn meaningful correlations over function outputs for a large dataset. We propose an algorithm that enables such training by tracking a stochastic, functional mirror-descent algorithm. At each iteration, this only requires considering a finite number of input locations, resulting in a scalable and easy-to-implement algorithm. Empirical results show comparable and, sometimes, superior performance to existing sparse variational GP methods.

* ICML 2019. Updated results added in the camera-ready version

Rethinking Softmax Cross-Entropy Loss for Adversarial Robustness

May 25, 2019
Tianyu Pang, Kun Xu, Yinpeng Dong, Chao Du, Ning Chen, Jun Zhu

Previous work shows that adversarially robust generalization requires larger sample complexity, and the same dataset, e.g., CIFAR-10, which enables good standard accuracy may not suffice to train robust models. Since collecting new training data could be costly, we instead focus on inducing a locally dense sample distribution, i.e., high sample density in the feature space, which could lead to locally sufficient samples for robust learning. We first formally show that the softmax cross-entropy (SCE) loss and its variants induce inappropriate sample density distributions in the feature space, which inspires us to design appropriate training objectives. Specifically, we propose the Max-Mahalanobis center (MMC) loss to create high-density regions for better robustness. It encourages the learned features to gather around the preset class centers with optimal inter-class dispersion. Compared to the SCE loss and its variants, we empirically demonstrate that applying the MMC loss can significantly improve robustness even under strong adaptive attacks, while keeping state-of-the-art accuracy on clean inputs with little extra computation.
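
The idea of pulling features toward preset class centers can be sketched with a squared-distance loss. This is a minimal sketch under assumptions: the squared-distance form is assumed rather than quoted from the abstract, and the centers below are placeholder scaled one-hot vectors, not the Max-Mahalanobis centers that the paper constructs.

```python
import numpy as np

def center_loss(features, labels, centers):
    """Mean of 0.5 * squared distance from each feature to its class center."""
    diff = features - centers[labels]
    return 0.5 * np.mean(np.sum(diff ** 2, axis=1))

# Placeholder centers (hypothetical, for illustration only).
centers = 10.0 * np.eye(3)
labels = np.array([0, 2, 1])

features_at_centers = centers[labels]
loss_zero = center_loss(features_at_centers, labels, centers)

# Shift every feature by one unit along the first coordinate.
features_shifted = features_at_centers + np.array([1.0, 0.0, 0.0])
loss_shifted = center_loss(features_shifted, labels, centers)
```

Unlike a softmax-based loss, this objective has no dependence on the other classes' logits, so the centers fully determine where each class's features concentrate.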

D$\textbf{S}^3$L: Deep Self-Semi-Supervised Learning for Image Recognition

May 23, 2019
Tsung Wei Tsai, Chongxuan Li, Jun Zhu

Despite the recent progress in deep semi-supervised learning (Semi-SL), the amount of labels still plays a dominant role. The success of self-supervised learning (Self-SL) hints at a promising direction to exploit the vast unlabeled data by leveraging an additional set of deterministic labels. In this paper, we propose Deep Self-Semi-Supervised learning (D$S^3$L), a flexible multi-task framework with shared parameters that integrates the rotation task in Self-SL with the consistency-based methods in deep Semi-SL. Our method is easy to implement and is complementary to all consistency-based approaches. The experiments demonstrate that our method significantly improves over the published state-of-the-art methods on several standard benchmarks, especially when fewer labels are available.
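
The "deterministic labels" of a rotation task can be generated without any human annotation, as the sketch below illustrates. It assumes the common four-way setup (0, 90, 180 and 270 degrees) and square single-channel images; the abstract does not state these details, and `rotation_batch` is a hypothetical helper name.

```python
import numpy as np

def rotation_batch(images):
    """Return four rotated copies of each image and the rotation index labels.

    images has shape (N, H, W) with H == W (assumed square).
    """
    # Rotate the whole batch in the image plane by k * 90 degrees.
    rotated = [np.rot90(images, k=k, axes=(1, 2)) for k in range(4)]
    labels = np.repeat(np.arange(4), len(images))
    return np.concatenate(rotated, axis=0), labels

images = np.arange(2 * 3 * 3, dtype=float).reshape(2, 3, 3)
rot_images, rot_labels = rotation_batch(images)
```

In a multi-task setup with shared parameters, a classification head trained on `rot_labels` would sit alongside the semi-supervised consistency objective, with both losses applied to the same backbone.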

Boosting Generative Models by Leveraging Cascaded Meta-Models

May 11, 2019
Fan Bao, Hang Su, Jun Zhu

Deep generative models are effective methods of modeling data. However, it is not easy for a single generative model to faithfully capture the distributions of complex data such as images. In this paper, we propose an approach for boosting generative models, which cascades meta-models together to produce a stronger model. Any hidden variable meta-model (e.g., RBM and VAE) which supports likelihood evaluation can be leveraged. We derive a decomposable variational lower bound of the boosted model, which allows each meta-model to be trained separately and greedily. Besides, our framework can be extended to semi-supervised boosting, where the boosted model learns a joint distribution of data and labels. Finally, we combine our boosting framework with the multiplicative boosting framework, which further improves the learning power of generative models.
