Cho-Jui Hsieh

Extreme Zero-Shot Learning for Extreme Text Classification

Dec 16, 2021

Yuanhao Xiong, Wei-Cheng Chang, Cho-Jui Hsieh, Hsiang-Fu Yu, Inderjit Dhillon

Abstract: The eXtreme Multi-label text Classification (XMC) problem concerns finding the most relevant labels for an input text instance from a large label set. However, the XMC setup faces two challenges: (1) it does not generalize to predicting unseen labels in dynamic environments, and (2) it requires a large number of supervised (instance, label) pairs, which can be difficult to obtain for emerging domains. Recently, the generalized zero-shot XMC (GZ-XMC) setup has been studied and ZestXML was proposed to handle unseen labels, but it still requires a large number of annotated (instance, label) pairs. In this paper, we consider a more practical scenario called Extreme Zero-Shot XMC (EZ-XMC), in which no supervision is needed and only the raw text of instances and labels is accessible. Few-Shot XMC (FS-XMC), an extension of EZ-XMC with limited supervision, is also investigated. To learn the semantic embeddings of instances and labels from raw text, we propose to pre-train Transformer-based encoders with self-supervised contrastive losses. Specifically, we develop a pre-training method, MACLR, which thoroughly leverages the raw text with techniques including Multi-scale Adaptive Clustering, Label Regularization, and self-training with pseudo positive pairs. Experimental results on four public EZ-XMC datasets demonstrate that MACLR achieves superior performance compared to all other leading baseline methods, with approximately 5-10% improvement in precision and recall on average. Moreover, we show that our pre-trained encoder can be further improved on FS-XMC when a limited number of ground-truth positive pairs are available for training. After fine-tuning the encoder on such a few-shot subset, MACLR still outperforms other extreme classifiers significantly.

* Our code is available at https://github.com/amzn/pecos/tree/mainline/examples/MACLR
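
The abstract mentions pre-training encoders with self-supervised contrastive losses. As a rough illustration only, the sketch below computes a generic in-batch contrastive (InfoNCE-style) loss between instance and label embeddings. It is not MACLR's actual objective; the function names, the temperature value, and the toy embeddings are all assumptions made for this example.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def info_nce_loss(instance_embs, label_embs, temperature=0.1):
    """Generic in-batch contrastive loss (hypothetical helper, not from the paper).

    Row i of instance_embs is treated as a positive pair with row i of
    label_embs; every other label in the batch acts as a negative.
    """
    a = [normalize(v) for v in instance_embs]
    b = [normalize(v) for v in label_embs]
    total = 0.0
    for i, ai in enumerate(a):
        logits = [sum(x * y for x, y in zip(ai, bj)) / temperature for bj in b]
        # Numerically stable log-sum-exp over all labels in the batch.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        # Cross-entropy with the matching label as the target class.
        total += log_denom - logits[i]
    return total / len(a)


# Toy embeddings: matched pairs should give a much lower loss than swapped pairs.
instances = [[1.0, 0.0], [0.0, 1.0]]
aligned = info_nce_loss(instances, [[1.0, 0.1], [0.1, 1.0]])
swapped = info_nce_loss(instances, [[0.1, 1.0], [1.0, 0.1]])
```

In this toy setup the loss is near zero when each instance embedding points toward its own label embedding and large when the pairing is swapped, which is the behavior a contrastive objective is meant to reward.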

Temporal Shuffling for Defending Deep Action Recognition Models against Adversarial Attacks

Dec 15, 2021

Jaehui Hwang, Huan Zhang, Jun-Ho Choi, Cho-Jui Hsieh, Jong-Seok Lee

Abstract: Recently, video-based action recognition methods using convolutional neural networks (CNNs) have achieved remarkable recognition performance. However, there is still a lack of understanding of the generalization mechanism of action recognition models. In this paper, we suggest that action recognition models rely on motion information less than expected, and thus they are robust to randomization of frame order. Based on this observation, we develop a novel defense method that uses temporal shuffling of input videos against adversarial attacks on action recognition models. Another observation enabling our defense method is that adversarial perturbations on videos are sensitive to temporal destruction. To the best of our knowledge, this is the first attempt to design a defense method specific to video-based action recognition models.
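
The core idea described above, randomizing frame order before classification, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: `temporally_shuffle`, `defended_predict`, and the toy order-insensitive classifier are hypothetical names invented for the example, and a real video would be a tensor of frames rather than a list of numbers.

```python
import random


def temporally_shuffle(frames, seed=None):
    """Return a copy of the frame sequence in random order (input is not modified)."""
    rng = random.Random(seed)
    shuffled = list(frames)
    rng.shuffle(shuffled)
    return shuffled


def defended_predict(model, frames, seed=None):
    """Classify a temporally shuffled copy of the video instead of the original."""
    return model(temporally_shuffle(frames, seed))


def toy_model(frames):
    """Stand-in classifier that ignores frame order: thresholds the mean frame value."""
    return int(sum(frames) / len(frames) > 0.5)


# Each "frame" is a single number here purely to keep the example small.
video = [0.2, 0.9, 0.8, 0.7, 0.6, 0.1]
plain_prediction = toy_model(video)
shuffled_prediction = defended_predict(toy_model, video, seed=0)
```

Because the toy classifier does not depend on frame order, its prediction is unchanged by shuffling; the abstract's claim is that real action recognition models behave similarly on clean videos while adversarial perturbations do not survive the reordering.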

A Review of Adversarial Attack and Defense for Classification Methods

Nov 18, 2021

Yao Li, Minhao Cheng, Cho-Jui Hsieh, Thomas C. M. Lee

Abstract: Despite the efficiency and scalability of machine learning systems, recent studies have demonstrated that many classification methods, especially deep neural networks (DNNs), are vulnerable to adversarial examples, i.e., examples that are carefully crafted to fool a well-trained classification model while being indistinguishable from natural data to humans. This makes it potentially unsafe to apply DNNs or related methods in security-critical areas. Since this issue was first identified by Biggio et al. (2013) and Szegedy et al. (2014), much work has been done in this field, including the development of attack methods to generate adversarial examples and the construction of defense techniques to guard against such examples. This paper aims to introduce this topic and its latest developments to the statistical community, primarily focusing on the generation of and defense against adversarial examples. Code (in Python and R) used in the numerical experiments is publicly available for readers to explore the surveyed methods. It is the hope of the authors that this paper will encourage more statisticians to work in this important and exciting field of generating and defending against adversarial examples.

* The American Statistician. 0 (2021) 1-44
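
To make the notion of an adversarial example concrete, the sketch below applies the fast gradient sign method (FGSM), one standard attack, to a tiny logistic regression classifier. It is an illustrative assumption-laden example and not the code released with the review; the weights, input, and step size `eps` are made up.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w, b, x):
    """Probability of class 1 under a logistic regression model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def log_loss(w, b, x, y):
    """Binary cross-entropy of the model on a single example (y is 0 or 1)."""
    p = predict_proba(w, b, x)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def fgsm(w, b, x, y, eps):
    """Perturb x by eps in the sign direction of the loss gradient w.r.t. x.

    For logistic regression the input gradient is (p - y) * w.
    """
    p = predict_proba(w, b, x)
    grad = [(p - y) * wi for wi in w]
    sign = [(g > 0) - (g < 0) for g in grad]
    return [xi + eps * s for xi, s in zip(x, sign)]


# Hypothetical model and input chosen only for illustration.
w, b = [2.0, -1.0], 0.0
x, y = [1.0, 0.5], 1
x_adv = fgsm(w, b, x, y, eps=0.1)
clean_loss = log_loss(w, b, x, y)
adv_loss = log_loss(w, b, x_adv, y)
```

Each coordinate of `x_adv` differs from `x` by at most `eps`, yet the loss on the true label increases; with a larger `eps` or a less confident model the predicted class itself would flip.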

Can Vision Transformers Perform Convolution?

Nov 03, 2021

Shanda Li, Xiangning Chen, Di He, Cho-Jui Hsieh

Abstract: Several recent studies have demonstrated that attention-based networks, such as Vision Transformer (ViT), can outperform Convolutional Neural Networks (CNNs) on several computer vision tasks without using convolutional layers. This naturally leads to the following question: Can a self-attention layer of ViT express any convolution operation? In this work, we prove constructively that a single ViT layer with image patches as the input can perform any convolution operation, where the multi-head attention mechanism and the relative positional encoding play essential roles. We further provide a lower bound on the number of heads for Vision Transformers to express CNNs. Consistent with our analysis, experimental results show that the construction in our proof can help inject convolutional bias into Transformers and significantly improve the performance of ViT in low-data regimes.

Node Feature Extraction by Self-Supervised Multi-scale Neighborhood Prediction

Oct 29, 2021

Eli Chien, Wei-Cheng Chang, Cho-Jui Hsieh, Hsiang-Fu Yu, Jiong Zhang, Olgica Milenkovic, Inderjit S Dhillon

Abstract: Learning on graphs has attracted significant attention in the learning community due to numerous real-world applications. In particular, graph neural networks (GNNs), which take numerical node features and graph structure as inputs, have been shown to achieve state-of-the-art performance on various graph-related learning tasks. Recent works exploring the correlation between numerical node features and graph structure via self-supervised learning have paved the way for further performance improvements of GNNs. However, methods used for extracting numerical node features from raw data are still graph-agnostic within standard GNN pipelines. This practice is sub-optimal as it prevents one from fully utilizing potential correlations between graph topology and node attributes. To mitigate this issue, we propose a new self-supervised learning framework, Graph Information Aided Node feature exTraction (GIANT). GIANT makes use of the eXtreme Multi-label Classification (XMC) formalism, which is crucial for fine-tuning the language model based on graph information, and scales to large datasets. We also provide a theoretical analysis that justifies the use of XMC over link prediction and motivates integrating XR-Transformers, a powerful method for solving XMC problems, into the GIANT framework. We demonstrate the superior performance of GIANT over the standard GNN pipeline on Open Graph Benchmark datasets: for example, we improve the accuracy of the top-ranked method GAMLP from 68.25% to 69.67%, SGC from 63.29% to 66.10% and MLP from 47.24% to 61.10% on the ogbn-papers100M dataset by leveraging GIANT.

How and When Adversarial Robustness Transfers in Knowledge Distillation?

Oct 22, 2021

Rulin Shao, Jinfeng Yi, Pin-Yu Chen, Cho-Jui Hsieh

Abstract: Knowledge distillation (KD) has been widely used in teacher-student training, with applications to model compression in resource-constrained deep learning. Current works mainly focus on preserving the accuracy of the teacher model. However, other important model properties, such as adversarial robustness, can be lost during distillation. This paper studies how and when adversarial robustness can be transferred from a teacher model to a student model in KD. We show that standard KD training fails to preserve adversarial robustness, and we propose KD with input gradient alignment (KDIGA) as a remedy. Under certain assumptions, we prove that a student model trained with our proposed KDIGA can achieve at least the same certified robustness as the teacher model. Our KD experiments cover a diverse set of teacher and student models with varying network architectures and sizes, including residual neural networks (ResNets) and vision transformers (ViTs), evaluated on the ImageNet and CIFAR-10 datasets. Our comprehensive analysis yields several novel insights: (1) with KDIGA, students can preserve or even exceed the adversarial robustness of the teacher model, even when their models have fundamentally different architectures; (2) KDIGA enables robustness to transfer to pre-trained students, such as KD from an adversarially trained ResNet to a pre-trained ViT, without loss of clean accuracy; and (3) our derived local linearity bounds for characterizing adversarial robustness in KD are consistent with the empirical results.

Adversarial Attack across Datasets

Oct 13, 2021

Yunxiao Qin, Yuanhao Xiong, Jinfeng Yi, Cho-Jui Hsieh

Abstract: It has been observed that Deep Neural Networks (DNNs) are vulnerable to transfer attacks in the query-free black-box setting. However, all previous studies on transfer attacks assume that the white-box surrogate models possessed by the attacker and the black-box victim models are trained on the same dataset, which means the attacker implicitly knows the label set and the input size of the victim model. This assumption is usually unrealistic, as the attacker may not know the dataset used by the victim model and, further, may need to attack randomly encountered images that do not come from the same dataset. Therefore, in this paper we define a new Generalized Transferable Attack (GTA) problem in which the attacker has a set of surrogate models trained on different datasets (with different label sets and image sizes), none of which is the dataset used by the victim model. We then propose a novel method called Image Classification Eraser (ICE) to erase classification information for any encountered image from an arbitrary dataset. Extensive experiments on Cifar-10, Cifar-100, and TieredImageNet demonstrate the effectiveness of the proposed ICE on the GTA problem. Furthermore, we show that existing transfer attack methods can be modified to tackle the GTA problem, but with significantly worse performance compared with ICE.

Training Meta-Surrogate Model for Transferable Adversarial Attack

Sep 07, 2021

Yunxiao Qin, Yuanhao Xiong, Jinfeng Yi, Cho-Jui Hsieh

Abstract: We consider adversarial attacks on a black-box model when no queries are allowed. In this setting, many methods directly attack surrogate models and transfer the obtained adversarial examples to fool the target model. Many previous works have investigated what kinds of attacks on the surrogate model generate more transferable adversarial examples, but their performance is still limited by the mismatch between surrogate models and the target model. In this paper, we tackle this problem from a novel angle: instead of using the original surrogate models, can we obtain a Meta-Surrogate Model (MSM) such that attacks on this model transfer more easily to other models? We show that this goal can be mathematically formulated as a well-posed (bi-level-like) optimization problem and design a differentiable attacker to make training feasible. Given one or a set of surrogate models, our method can thus obtain an MSM such that adversarial examples generated on the MSM enjoy excellent transferability. Comprehensive experiments on Cifar-10 and ImageNet demonstrate that by attacking the MSM, we can obtain stronger transferable adversarial examples that fool black-box models, including adversarially trained ones, with much higher success rates than existing methods. The proposed method reveals significant security challenges of deep models and is a promising candidate to serve as a state-of-the-art benchmark for evaluating the robustness of deep models in the black-box setting.

* 15 pages, 7 figures

Searching for an Effective Defender: Benchmarking Defense against Adversarial Word Substitution

Aug 29, 2021

Zongyi Li, Jianhan Xu, Jiehang Zeng, Linyang Li, Xiaoqing Zheng, Qi Zhang, Kai-Wei Chang, Cho-Jui Hsieh

Abstract: Recent studies have shown that deep neural networks are vulnerable to intentionally crafted adversarial examples, and various methods have been proposed to defend against adversarial word-substitution attacks for neural NLP models. However, there is a lack of systematic study comparing different defense approaches under the same attack setting. In this paper, we seek to fill this gap through a comprehensive study of the behavior of neural text classifiers trained with various defense methods under representative adversarial attacks. In addition, we propose an effective method to further improve the robustness of neural text classifiers against such attacks, which achieves the highest accuracy on both clean and adversarial examples on the AGNEWS and IMDB datasets by a significant margin.

* Accepted by EMNLP2021 main conference

RANK-NOSH: Efficient Predictor-Based Architecture Search via Non-Uniform Successive Halving

Aug 18, 2021

Ruochen Wang, Xiangning Chen, Minhao Cheng, Xiaocheng Tang, Cho-Jui Hsieh

Abstract: Predictor-based algorithms have achieved remarkable performance in Neural Architecture Search (NAS) tasks. However, these methods suffer from high computation costs, as training the performance predictor usually requires training and evaluating hundreds of architectures from scratch. Previous works along this line mainly focus on reducing the number of architectures required to fit the predictor. In this work, we tackle this challenge from a different perspective: improving search efficiency by cutting down the computation budget of architecture training. We propose NOn-uniform Successive Halving (NOSH), a hierarchical scheduling algorithm that terminates the training of underperforming architectures early to avoid wasting budget. To effectively leverage the non-uniform supervision signals produced by NOSH, we formulate predictor-based architecture search as learning to rank with pairwise comparisons. The resulting method, RANK-NOSH, reduces the search budget by ~5x while achieving competitive or even better performance than previous state-of-the-art predictor-based methods on various spaces and datasets.

* To Appear in ICCV2021. The code will be released shortly at https://github.com/ruocwang
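
For readers unfamiliar with the scheduling idea that NOSH builds on, here is a minimal sketch of classic, uniform successive halving. It is not the non-uniform variant proposed in the paper, and it omits the ranking predictor entirely; `successive_halving`, `toy_evaluate`, and the toy "architectures" (plain quality scores) are hypothetical and exist only for this example.

```python
def successive_halving(candidates, evaluate, min_budget=1, eta=2):
    """Keep the top 1/eta candidates each round while multiplying the budget by eta.

    evaluate(candidate, budget) must return a score where higher is better.
    """
    survivors = list(candidates)
    budget = min_budget
    while len(survivors) > 1:
        # Score every surviving candidate with the current training budget.
        ranked = sorted(survivors, key=lambda c: evaluate(c, budget), reverse=True)
        # Stop training the weaker candidates early; keep at least one.
        survivors = ranked[: max(1, len(ranked) // eta)]
        budget *= eta
    return survivors[0]


def toy_evaluate(quality, budget):
    """Stand-in for partial training: the score approaches `quality` as budget grows."""
    return quality * (1.0 - 0.5 ** budget)


# Each candidate is represented only by its (normally unknown) final quality.
architectures = [0.3, 0.9, 0.5, 0.7]
best = successive_halving(architectures, toy_evaluate)
```

With four candidates and `eta=2`, the loop runs two rounds (four candidates at budget 1, then two at budget 2), so most of the compute goes to the candidates that looked promising early.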
