Jun Yu

Understanding Robust Overfitting of Adversarial Training and Beyond

Jun 22, 2022
Chaojian Yu, Bo Han, Li Shen, Jun Yu, Chen Gong, Mingming Gong, Tongliang Liu

Robust overfitting widely exists in adversarial training of deep networks. The exact underlying reasons for this are still not completely understood. Here, we explore the causes of robust overfitting by comparing the data distribution of non-overfit (weak adversary) and overfitted (strong adversary) adversarial training, and observe that the adversarial data generated by a weak adversary mainly contain small-loss data. In contrast, the adversarial data generated by a strong adversary are more diversely distributed across large-loss and small-loss data. Given these observations, we further design data ablation adversarial training and identify that some small-loss data, which are not worthy of the adversary strength, cause robust overfitting in the strong adversary mode. To relieve this issue, we propose minimum loss constrained adversarial training (MLCAT): in a minibatch, we learn large-loss data as usual, and adopt additional measures to increase the loss of the small-loss data. Technically, MLCAT hinders the fitting of data once they become easy to learn, which prevents robust overfitting; philosophically, MLCAT reflects the spirit of turning waste into treasure and making the best use of all adversarial data; algorithmically, we design two realizations of MLCAT, and extensive experiments demonstrate that MLCAT can eliminate robust overfitting and further boost adversarial robustness.
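
The minibatch rule described in the abstract (learn large-loss data as usual, raise the loss of small-loss data) can be illustrated with a small sketch. This is not the authors' implementation: the names `split_by_loss`, `constrained_batch_loss`, and `min_loss`, and the `raise_loss` callback standing in for the unspecified "additional measures", are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence, Tuple


def split_by_loss(losses: Sequence[float], min_loss: float) -> Tuple[List[int], List[int]]:
    """Partition example indices into large-loss and small-loss groups."""
    large = [i for i, value in enumerate(losses) if value >= min_loss]
    small = [i for i, value in enumerate(losses) if value < min_loss]
    return large, small


def constrained_batch_loss(
    losses: Sequence[float],
    min_loss: float,
    raise_loss: Callable[[float], float],
) -> float:
    """Mean batch loss after applying `raise_loss` to small-loss examples only.

    `raise_loss` is a hypothetical placeholder for whatever measure is used
    to increase the loss of examples that have become easy to fit.
    """
    _, small = split_by_loss(losses, min_loss)
    adjusted = list(losses)
    for i in small:
        adjusted[i] = raise_loss(losses[i])
    return sum(adjusted) / len(adjusted)


if __name__ == "__main__":
    batch = [2.0, 0.1, 1.5, 0.3]  # made-up per-example adversarial losses
    print(split_by_loss(batch, min_loss=1.0))
    print(constrained_batch_loss(batch, 1.0, lambda value: max(value, 1.0)))
```

The lower-clamp passed as `raise_loss` in the usage lines is only a numeric stand-in; in real training a clamp would give zero gradient, so an actual realization would need a different mechanism.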

* ICML 2022

Hilbert Curve Projection Distance for Distribution Comparison

Jun 09, 2022
Tao Li, Cheng Meng, Jun Yu, Hongteng Xu

Distribution comparison plays a central role in many machine learning tasks like data classification and generative modeling. In this study, we propose a novel metric, called Hilbert curve projection (HCP) distance, to measure the distance between two probability distributions with high robustness and low complexity. In particular, we first project two high-dimensional probability densities using the Hilbert curve to obtain a coupling between them, and then calculate the transport distance between these two densities in the original space according to the coupling. We show that HCP distance is a proper metric and is well-defined for absolutely continuous probability measures. Furthermore, we demonstrate that the empirical HCP distance converges to its population counterpart at a rate of no more than $O(n^{-1/2d})$ under regularity conditions. To mitigate the curse of dimensionality, we also develop two variants of the HCP distance using (learnable) subspace projections. Experiments on both synthetic and real-world data show that our HCP distance works as an effective surrogate of the Wasserstein distance with low complexity and overcomes the drawbacks of the sliced Wasserstein distance.
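
The two steps in the abstract (order points along a Hilbert curve to obtain a coupling, then measure transport cost in the original space) can be sketched for a toy case. The sketch below assumes two equal-size samples in the unit square and a grid of resolution `2**order`; the function names and those restrictions are illustrative assumptions, not the paper's general construction.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def hilbert_index(n: int, x: int, y: int) -> int:
    """Position of grid cell (x, y) along the Hilbert curve on an n x n grid (n a power of 2)."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:  # rotate/flip the quadrant so the curve stays continuous
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d


def sort_along_curve(points: Sequence[Point], order: int) -> List[Point]:
    """Sort points in [0, 1]^2 by the Hilbert index of their grid cell."""
    n = 2 ** order

    def key(p: Point) -> int:
        cx = min(int(p[0] * n), n - 1)
        cy = min(int(p[1] * n), n - 1)
        return hilbert_index(n, cx, cy)

    return sorted(points, key=key)


def hcp_distance(xs: Sequence[Point], ys: Sequence[Point], order: int = 4, p: float = 2.0) -> float:
    """Couple the i-th point of each sorted sample, then average the cost in the original space."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    paired = zip(sort_along_curve(xs, order), sort_along_curve(ys, order))
    total = sum(math.dist(a, b) ** p for a, b in paired)
    return (total / len(xs)) ** (1.0 / p)
```

Sorting dominates the cost, so the coupling is obtained in O(n log n) time for n points, which is consistent with the low complexity the abstract emphasizes.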

* 28 pages, 19 figures, add some references

An optimal transport approach for selecting a representative subsample with application in efficient kernel density estimation

May 31, 2022
Jingyi Zhang, Cheng Meng, Jun Yu, Mengrui Zhang, Wenxuan Zhong, Ping Ma

Subsampling methods aim to select a subsample as a surrogate for the observed sample. Such methods have been used pervasively in large-scale data analytics, active learning, and privacy-preserving analysis in recent decades. Instead of model-based methods, in this paper, we study model-free subsampling methods, which aim to identify a subsample that is not confined by model assumptions. Existing model-free subsampling methods are usually built upon clustering techniques or kernel tricks. Most of these methods suffer from either a large computational burden or a theoretical weakness. In particular, the theoretical weakness is that the empirical distribution of the selected subsample may not necessarily converge to the population distribution. Such computational and theoretical limitations hinder the broad applicability of model-free subsampling methods in practice. We propose a novel model-free subsampling method by utilizing optimal transport techniques. Moreover, we develop an efficient subsampling algorithm that is adaptive to the unknown probability density function. Theoretically, we show the selected subsample can be used for efficient density estimation by deriving the convergence rate for the proposed subsample kernel density estimator. We also provide the optimal bandwidth for the proposed estimator. Numerical studies on synthetic and real-world datasets demonstrate the performance of the proposed method is superior.

Efficient Approximation of Gromov-Wasserstein Distance using Importance Sparsification

May 26, 2022
Mengyu Li, Jun Yu, Hongteng Xu, Cheng Meng

As a valid metric for metric-measure spaces, Gromov-Wasserstein (GW) distance has shown potential for matching problems on structured data like point clouds and graphs. However, its application in practice is limited due to its high computational complexity. To overcome this challenge, we propose a novel importance sparsification method, called Spar-GW, to approximate GW distance efficiently. In particular, instead of considering a dense coupling matrix, our method leverages a simple but effective sampling strategy to construct a sparse coupling matrix and update it with few computations. We demonstrate that the proposed Spar-GW method is applicable to the GW distance with arbitrary ground cost, and it reduces the complexity from $\mathcal{O}(n^4)$ to $\mathcal{O}(n^{2+\delta})$ for an arbitrarily small $\delta>0$. In addition, this method can be extended to approximate the variants of GW distance, including the entropic GW distance, the fused GW distance, and the unbalanced GW distance. Experiments show the superiority of our Spar-GW to state-of-the-art methods in both synthetic and real-world tasks.
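
To see why a sparse coupling matrix helps, the sketch below evaluates the GW objective twice: once over all index quadruples of a dense coupling, and once over only the nonzero entries of the coupling. It does not reproduce Spar-GW's sampling strategy or update rule; the function names, the squared-difference ground cost, and the dictionary representation of the sparse coupling are assumptions chosen for illustration.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def gw_objective_dense(c1: Matrix, c2: Matrix, t: Matrix) -> float:
    """Naive GW objective: sum over all (i, j, k, l), i.e. O(n^4) terms for n x n inputs."""
    total = 0.0
    for i in range(len(c1)):
        for j in range(len(c2)):
            for k in range(len(c1)):
                for l in range(len(c2)):
                    total += (c1[i][k] - c2[j][l]) ** 2 * t[i][j] * t[k][l]
    return total


def gw_objective_sparse(c1: Matrix, c2: Matrix, t: Dict[Tuple[int, int], float]) -> float:
    """Same objective, visiting only pairs of nonzero coupling entries: O(s^2) terms for s nonzeros."""
    total = 0.0
    for (i, j), t_ij in t.items():
        for (k, l), t_kl in t.items():
            total += (c1[i][k] - c2[j][l]) ** 2 * t_ij * t_kl
    return total


if __name__ == "__main__":
    c1 = [[0.0, 1.0], [1.0, 0.0]]  # toy intra-space distances
    c2 = [[0.0, 2.0], [2.0, 0.0]]
    dense_t = [[0.5, 0.0], [0.0, 0.5]]
    sparse_t = {(0, 0): 0.5, (1, 1): 0.5}
    print(gw_objective_dense(c1, c2, dense_t), gw_objective_sparse(c1, c2, sparse_t))
```

Both functions return the same value whenever the dictionary holds exactly the nonzero entries of the dense coupling, so the saving comes purely from skipping zero terms.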

* 24 pages, 7 figures

Scene Clustering Based Pseudo-labeling Strategy for Multi-modal Aerial View Object Classification

May 19, 2022
Jun Yu, Hao Chang, Keda Lu, Liwen Zhang, Shenshen Du, Zhong Zhang

Multi-modal aerial view object classification (MAVOC) in automatic target recognition (ATR), although an important and challenging problem, has been understudied. This paper first finds that fine-grained data, class imbalance, and varied shooting conditions limit the representational ability of general image classification. Moreover, the MAVOC dataset has scene aggregation characteristics. By exploiting these properties, we propose the Scene Clustering Based Pseudo-labeling Strategy (SCP-Label), a simple yet effective post-processing method. SCP-Label improves accuracy by assigning the same label to objects within the same scene, while also mitigating bias and confusion with model ensembles. Its performance surpasses the official baseline by a large margin of +20.57% accuracy on Track 1 (SAR) and +31.86% accuracy on Track 2 (SAR+EO), demonstrating the potential of SCP-Label as post-processing. Finally, we won first place on both Track 1 and Track 2 of the CVPR 2022 Perception Beyond the Visible Spectrum (PBVS) Workshop MAVOC Challenge. Our code is available at https://github.com/HowieChangchn/SCP-Label.
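
The idea of assigning the same label to objects within the same scene can be sketched as a post-processing step. The sketch assumes scene cluster ids are already available and uses a per-scene majority vote; both the function name `scene_pseudo_labels` and the majority-vote rule are illustrative assumptions, and the released SCP-Label code may work differently.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Sequence


def scene_pseudo_labels(pred_labels: Sequence[str], scene_ids: Sequence[Hashable]) -> List[str]:
    """Replace each prediction with the most common prediction in its scene cluster."""
    by_scene: Dict[Hashable, List[str]] = defaultdict(list)
    for label, scene in zip(pred_labels, scene_ids):
        by_scene[scene].append(label)
    # One label per scene: the majority vote of the model's raw predictions.
    majority = {scene: Counter(labels).most_common(1)[0][0] for scene, labels in by_scene.items()}
    return [majority[scene] for scene in scene_ids]


if __name__ == "__main__":
    preds = ["car", "truck", "car", "bus", "bus"]  # made-up raw predictions
    scenes = [0, 0, 0, 1, 1]                       # made-up scene cluster ids
    print(scene_pseudo_labels(preds, scenes))
```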

Multi-model Ensemble Learning Method for Human Expression Recognition

Mar 28, 2022
Jun Yu, Zhongpeng Cai, Peng He, Guocheng Xie, Qiang Ling

Analysis of human affect plays a vital role in human-computer interaction (HCI) systems. Due to the difficulty of capturing large amounts of real-life data, most current methods have focused on controlled environments, which limits their application scenarios. To tackle this problem, we propose a solution based on ensemble learning. Specifically, we formulate the problem as a classification task, and then train several expression classification models with different types of backbones: ResNet, EfficientNet, and InceptionNet. After that, the outputs of these models are fused via a model ensemble method to predict the final results. Moreover, we introduce a multi-fold ensemble method to train and ensemble several models with the same architecture but different data distributions to enhance the performance of our solution. We conduct many experiments on the AffWild2 dataset of the ABAW2022 Challenge, and the results demonstrate the effectiveness of our solution.
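
The fusion step can be illustrated with soft voting, one common way to combine classifier outputs. The abstract does not say which fusion rule is used, so the averaging below, the function name `soft_vote`, and the example probabilities are assumptions for illustration only.

```python
from typing import List, Sequence, Tuple


def soft_vote(model_probs: Sequence[Sequence[float]]) -> Tuple[int, List[float]]:
    """Average class probabilities across models and return (predicted class, averaged probabilities)."""
    n_models = len(model_probs)
    n_classes = len(model_probs[0])
    averaged = [sum(probs[c] for probs in model_probs) / n_models for c in range(n_classes)]
    predicted = max(range(n_classes), key=averaged.__getitem__)
    return predicted, averaged


if __name__ == "__main__":
    # Made-up outputs of three backbones for one image with two expression classes.
    outputs = [[0.6, 0.4], [0.2, 0.8], [0.3, 0.7]]
    print(soft_vote(outputs))
```

The same function applies to a multi-fold ensemble: each entry of `model_probs` would then come from a model of the same architecture trained on a different fold.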

Towards Efficient and Elastic Visual Question Answering with Doubly Slimmable Transformer

Mar 24, 2022
Zhou Yu, Zitian Jin, Jun Yu, Mingliang Xu, Jianping Fan

Transformer-based approaches have shown great success in visual question answering (VQA). However, they usually require deep and wide models to guarantee good performance, making them difficult to deploy on capacity-restricted platforms. It is a challenging yet valuable task to design an elastic VQA model that supports adaptive pruning at runtime to meet the efficiency constraints of diverse platforms. In this paper, we present the Doubly Slimmable Transformer (DST), a general framework that can be seamlessly integrated into arbitrary Transformer-based VQA models to train a single model once and obtain various slimmed submodels of different widths and depths. Taking two typical Transformer-based VQA approaches, i.e., MCAN and UNITER, as the reference models, the obtained slimmable MCAN_DST and UNITER_DST models outperform the state-of-the-art methods trained independently on two benchmark datasets. In particular, one slimmed MCAN_DST submodel achieves a comparable accuracy on VQA-v2, while being 0.38x smaller in model size and having 0.27x fewer FLOPs than the reference MCAN model. The smallest MCAN_DST submodel has 9M parameters and 0.16G FLOPs in the inference stage, making it deployable on edge devices.
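
Obtaining submodels of different widths and depths from one trained model can be pictured as slicing shared weights. The sketch below uses plain square linear layers rather than Transformer blocks, keeps the leading `width` units and the first `depth` layers, and is only an assumption about how slimming might look; it is not the DST training or slicing procedure.

```python
from typing import List, Sequence, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weight rows, bias)


def slim_layer(layer: Layer, width: int) -> Layer:
    """Keep the leading `width` input and output units of a square linear layer."""
    weight, bias = layer
    return [row[:width] for row in weight[:width]], bias[:width]


def slim_model(layers: Sequence[Layer], width: int, depth: int) -> List[Layer]:
    """Slim in both directions: first `depth` layers, each cut to `width` units."""
    return [slim_layer(layer, width) for layer in layers[:depth]]


def forward(layers: Sequence[Layer], x: Sequence[float]) -> List[float]:
    """Apply each linear layer in turn (no nonlinearity, to keep the sketch short)."""
    out = list(x)
    for weight, bias in layers:
        out = [sum(w * v for w, v in zip(row, out)) + b for row, b in zip(weight, bias)]
    return out


def count_params(layers: Sequence[Layer]) -> int:
    """Total number of weights and biases."""
    return sum(len(weight) * len(weight[0]) + len(bias) for weight, bias in layers)


if __name__ == "__main__":
    identity = [[1.0 if r == c else 0.0 for c in range(4)] for r in range(4)]
    full = [(identity, [1.0] * 4) for _ in range(3)]  # toy 3-layer, width-4 model
    sub = slim_model(full, width=2, depth=2)
    print(count_params(full), count_params(sub), forward(sub, [1.0, 2.0]))
```

Because the submodel reuses slices of the full model's weights, halving the width cuts each weight matrix to a quarter of its size, which is why width and depth slimming together can reduce parameters and FLOPs quickly.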

* 11 pages, 7 figures
