Sijia Liu

N3C Natural Language Processing

A Theoretical Understanding of Shallow Vision Transformers: Learning, Generalization, and Sample Complexity

Feb 12, 2023

Hongkang Li, Meng Wang, Sijia Liu, Pin-yu Chen

Abstract: Vision Transformers (ViTs) with self-attention modules have recently achieved great empirical success in many vision tasks. Due to non-convex interactions across layers, however, theoretical learning and generalization analysis is mostly elusive. Based on a data model characterizing both label-relevant and label-irrelevant tokens, this paper provides the first theoretical analysis of training a shallow ViT, i.e., one self-attention layer followed by a two-layer perceptron, for a classification task. We characterize the sample complexity to achieve a zero generalization error. Our sample complexity bound is positively correlated with the inverse of the fraction of label-relevant tokens, the token noise level, and the initial model error. We also prove that a training process using stochastic gradient descent (SGD) leads to a sparse attention map, which is a formal verification of the general intuition about the success of attention. Moreover, this paper indicates that a proper token sparsification can improve the test performance by removing label-irrelevant and/or noisy tokens, including spurious correlations. Empirical experiments on synthetic data and the CIFAR-10 dataset justify our theoretical results and generalize to deeper ViTs.
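To make the "one self-attention layer followed by a two-layer perceptron" architecture concrete, here is a minimal forward-pass sketch in plain Python. It is an illustration under stated assumptions, not the authors' code: the function names, the dimensions, the random initialization, and the choice to average the per-token outputs into a single binary logit are all hypothetical.

```python
import math
import random

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_map(tokens, W_Q, W_K):
    """Row i holds the attention weights of query token i over all tokens."""
    queries = [matvec(W_Q, x) for x in tokens]
    keys = [matvec(W_K, x) for x in tokens]
    return [softmax([sum(q * k for q, k in zip(qi, kj)) for kj in keys])
            for qi in queries]

def shallow_vit(tokens, W_Q, W_K, W_V, W_O, a):
    """Hypothetical shallow ViT: self-attention, then ReLU hidden layer and
    linear readout, averaged over tokens (the averaging is an assumption)."""
    attn = attention_map(tokens, W_Q, W_K)
    values = [matvec(W_V, x) for x in tokens]
    d_v = len(values[0])
    logit = 0.0
    for row in attn:
        # Attention-weighted mixture of the value vectors for this query token.
        mixed = [sum(w * v[k] for w, v in zip(row, values)) for k in range(d_v)]
        hidden = [max(0.0, h) for h in matvec(W_O, mixed)]  # ReLU hidden layer
        logit += sum(ai * hi for ai, hi in zip(a, hidden))  # linear output layer
    return logit / len(tokens)

def random_matrix(rows, cols, rng, scale=0.1):
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]

# Illustrative sizes only: token dimension d, hidden width m, L tokens per input.
rng = random.Random(0)
d, m, L = 4, 8, 5
tokens = random_matrix(L, d, rng, scale=1.0)
W_Q, W_K, W_V = (random_matrix(d, d, rng) for _ in range(3))
W_O = random_matrix(m, d, rng)
a = [rng.choice([-1.0, 1.0]) / m for _ in range(m)]

logit = shallow_vit(tokens, W_Q, W_K, W_V, W_O, a)
label = 1 if logit >= 0 else -1  # binary prediction from the sign of the logit
```

Inspecting the rows returned by `attention_map` before and after training is one way to look for the sparse attention pattern the abstract describes; this sketch performs no training.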

Joint Edge-Model Sparse Learning is Provably Efficient for Graph Neural Networks

Feb 06, 2023

Shuai Zhang, Meng Wang, Pin-Yu Chen, Sijia Liu, Songtao Lu, Miao Liu

Abstract: Due to the significant computational challenge of training large-scale graph neural networks (GNNs), various sparse learning techniques have been exploited to reduce memory and storage costs. Examples include graph sparsification, which samples a subgraph to reduce the amount of data aggregation, and model sparsification, which prunes the neural network to reduce the number of trainable weights. Despite the empirical successes in reducing the training cost while maintaining the test accuracy, the theoretical generalization analysis of sparse learning for GNNs remains elusive. To the best of our knowledge, this paper provides the first theoretical characterization of joint edge-model sparse learning from the perspective of sample complexity and convergence rate in achieving zero generalization error. It proves analytically that both sampling important nodes and pruning neurons with the lowest magnitude can reduce the sample complexity and improve convergence without compromising the test accuracy. Although the analysis is centered on two-layer GNNs with structural constraints on data, the insights are applicable to more general setups and justified by both synthetic and practical citation datasets.

* The Eleventh International Conference on Learning Representations, 2023
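As a rough illustration of the model-sparsification half of the abstract above ("pruning neurons with the lowest magnitude"), the sketch below zeroes out the hidden neurons whose weight vectors have the smallest L2 norm. The function name, the `keep_ratio` parameter, and the use of the L2 norm as the magnitude measure are assumptions made for this example; the paper may define magnitude and the pruning schedule differently.

```python
import math

def prune_lowest_magnitude(neurons, keep_ratio):
    """Zero out the rows (neurons) with the smallest L2 norm.

    neurons: list of weight vectors, one per hidden neuron (hypothetical layout).
    keep_ratio: fraction of neurons to keep, in (0, 1].
    Returns the pruned weights and a boolean keep-mask.
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    n_keep = max(1, math.ceil(keep_ratio * len(neurons)))
    norms = [math.sqrt(sum(w * w for w in row)) for row in neurons]
    # Rank neuron indices from largest to smallest norm and keep the top n_keep.
    ranked = sorted(range(len(neurons)), key=lambda i: norms[i], reverse=True)
    keep = set(ranked[:n_keep])
    mask = [i in keep for i in range(len(neurons))]
    pruned = [row[:] if kept else [0.0] * len(row)
              for row, kept in zip(neurons, mask)]
    return pruned, mask

# Toy weights: neurons 1 and 3 have much smaller norms than neurons 0 and 2.
weights = [[0.9, -0.8], [0.01, 0.02], [-0.5, 0.4], [0.0, 0.03]]
pruned, mask = prune_lowest_magnitude(weights, keep_ratio=0.5)
# mask == [True, False, True, False]
```

The edge-sparsification half (sampling important nodes) is not shown here, since the abstract does not say how importance is measured.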

Certified Interpretability Robustness for Class Activation Mapping

Jan 26, 2023

Alex Gu, Tsui-Wei Weng, Pin-Yu Chen, Sijia Liu, Luca Daniel

Abstract: Interpreting machine learning models is challenging but crucial for ensuring the safety of deep networks in autonomous driving systems. Due to the prevalence of deep-learning-based perception models in autonomous vehicles, accurately interpreting their predictions is crucial. While a variety of such methods have been proposed, most are shown to lack robustness. Yet, little has been done to provide certificates for interpretability robustness. Taking a step in this direction, we present CORGI, short for Certifiably prOvable Robustness Guarantees for Interpretability mapping. CORGI is an algorithm that takes in an input image and gives a certifiable lower bound for the robustness of the top-k pixels of its CAM interpretability map. We show the effectiveness of CORGI via a case study on traffic sign data, certifying lower bounds on the minimum adversarial perturbation not far from (4-5x) state-of-the-art attack methods.

* 13 pages, 5 figures. Accepted to Machine Learning for Autonomous Driving Workshop at NeurIPS 2020

Towards Understanding How Self-training Tolerates Data Backdoor Poisoning

Jan 20, 2023

Soumyadeep Pal, Ren Wang, Yuguang Yao, Sijia Liu

Abstract: Recent studies on backdoor attacks in model training have shown that polluting a small portion of training data is sufficient to produce incorrect manipulated predictions on poisoned test-time data while maintaining high clean accuracy in downstream tasks. The stealthiness of backdoor attacks has imposed tremendous defense challenges in today's machine learning paradigm. In this paper, we explore the potential of self-training via additional unlabeled data for mitigating backdoor attacks. We begin with a pilot study showing that vanilla self-training is not effective in backdoor mitigation. Spurred by that, we propose to defend against backdoor attacks by leveraging strong but proper data augmentations in the self-training pseudo-labeling stage. We find that the new self-training regime helps defend against backdoor attacks to a great extent. Its effectiveness is demonstrated through experiments for different backdoor triggers on CIFAR-10 and a combination of CIFAR-10 with an additional unlabeled 500K TinyImages dataset. Finally, we explore the direction of combining self-supervised representation learning with self-training for further improvement in backdoor defense.

* Accepted at SafeAI 2023: AAAI's Workshop on Artificial Intelligence Safety

Adaptively Integrated Knowledge Distillation and Prediction Uncertainty for Continual Learning

Jan 18, 2023

Kanghao Chen, Sijia Liu, Ruixuan Wang, Wei-Shi Zheng

Abstract: Current deep learning models often suffer from catastrophic forgetting of old knowledge when continually learning new knowledge. Existing strategies to alleviate this issue often fix the trade-off between keeping old knowledge (stability) and learning new knowledge (plasticity). However, the stability-plasticity trade-off during continual learning may need to be dynamically changed for better model performance. In this paper, we propose two novel ways to adaptively balance model stability and plasticity. The first is to adaptively integrate multiple levels of old knowledge and transfer it to each block level in the new model. The second uses the prediction uncertainty of old knowledge to naturally tune the importance of learning new knowledge during model training. To the best of our knowledge, this is the first work to connect model prediction uncertainty and knowledge distillation for continual learning. In addition, this paper applies a modified CutMix specifically to augment the data for old knowledge, further alleviating the catastrophic forgetting issue. Extensive evaluations on the CIFAR100 and ImageNet datasets confirm the effectiveness of the proposed method for continual learning.

DialGuide: Aligning Dialogue Model Behavior with Developer Guidelines

Dec 20, 2022

Prakhar Gupta, Yang Liu, Di Jin, Behnam Hedayatnia, Spandana Gella, Sijia Liu, Patrick Lange, Julia Hirschberg, Dilek Hakkani-Tur

Abstract: Dialogue models are able to generate coherent and fluent responses, but they can still be challenging to control and may produce non-engaging, unsafe results. This unpredictability diminishes user trust and can hinder the use of the models in the real world. To address this, we introduce DialGuide, a novel framework for controlling dialogue model behavior using natural language rules, or guidelines. These guidelines provide information about the context they are applicable to and what should be included in the response, allowing the models to generate responses that are more closely aligned with the developer's expectations and intent. We evaluate DialGuide on three tasks in open-domain dialogue response generation: guideline selection, response generation, and response entailment verification. Our dataset contains 10,737 positive and 15,467 negative dialogue context-response-guideline triplets across two domains: chit-chat and safety. We provide baseline models for the tasks and benchmark their performance. We also demonstrate that DialGuide is effective in the dialogue safety domain, producing safe and engaging responses that follow developer guidelines.

Stochastic Inexact Augmented Lagrangian Method for Nonconvex Expectation Constrained Optimization

Dec 19, 2022

Zichong Li, Pin-Yu Chen, Sijia Liu, Songtao Lu, Yangyang Xu

Abstract: Many real-world problems not only have complicated nonconvex functional constraints but also use a large number of data points. This motivates the design of efficient stochastic methods on finite-sum or expectation constrained problems. In this paper, we design and analyze stochastic inexact augmented Lagrangian methods (Stoc-iALM) to solve problems involving a nonconvex composite (i.e., smooth + nonsmooth) objective and nonconvex smooth functional constraints. We adopt the standard iALM framework and design a subroutine by using the momentum-based variance-reduced proximal stochastic gradient method (PStorm) and a postprocessing step. Under certain regularity conditions (assumed also in existing works), to reach an $\varepsilon$-KKT point in expectation, we establish an oracle complexity result of $O(\varepsilon^{-5})$, which is better than the best-known $O(\varepsilon^{-6})$ result. Numerical experiments on the fairness constrained problem and the Neyman-Pearson classification problem with real data demonstrate that our proposed method outperforms an existing method with the previously best-known complexity result.

TextGrad: Advancing Robustness Evaluation in NLP by Gradient-Driven Optimization

Dec 19, 2022

Bairu Hou, Jinghan Jia, Yihua Zhang, Guanhua Zhang, Yang Zhang, Sijia Liu, Shiyu Chang

Abstract: Robustness evaluation against adversarial examples has become increasingly important to unveil the trustworthiness of the prevailing deep models in natural language processing (NLP). However, in contrast to the computer vision domain, where first-order projected gradient descent (PGD) is used as the benchmark approach to generate adversarial examples for robustness evaluation, NLP lacks a principled first-order gradient-based robustness evaluation framework. The emerging optimization challenges lie in 1) the discrete nature of textual inputs together with the strong coupling between the perturbation location and the actual content, and 2) the additional constraint that the perturbed text should be fluent and achieve a low perplexity under a language model. These challenges make the development of PGD-like NLP attacks difficult. To bridge the gap, we propose TextGrad, a new attack generator using gradient-driven optimization, supporting high-accuracy and high-quality assessment of adversarial robustness in NLP. Specifically, we address the aforementioned challenges in a unified optimization framework: we develop an effective convex relaxation method to co-optimize the continuously relaxed site selection and perturbation variables, and we leverage an effective sampling method to establish an accurate mapping from the continuous optimization variables to the discrete textual perturbations. Moreover, as a first-order attack generation method, TextGrad can be baked into adversarial training to further improve the robustness of NLP models. Extensive experiments are provided to demonstrate the effectiveness of TextGrad not only in attack generation for robustness evaluation but also in adversarial defense.

* 18 pages, 2 figures

CLAWSAT: Towards Both Robust and Accurate Code Models

Nov 22, 2022

Jinghan Jia, Shashank Srikant, Tamara Mitrovska, Chuang Gan, Shiyu Chang, Sijia Liu, Una-May O'Reilly

Abstract: We integrate contrastive learning (CL) with adversarial learning to co-optimize the robustness and accuracy of code models. Different from existing works, we show that code obfuscation, a standard code transformation operation, provides novel means to generate complementary 'views' of a code that enable us to achieve both robust and accurate code models. To the best of our knowledge, this is the first systematic study to explore and exploit the robustness and accuracy benefits of (multi-view) code obfuscations in code models. Specifically, we first adopt adversarial codes as robustness-promoting views in CL at the self-supervised pre-training phase. This yields improved robustness and transferability for downstream tasks. Next, at the supervised fine-tuning stage, we show that adversarial training with a proper temporally-staggered schedule of adversarial code generation can further improve robustness and accuracy of the pre-trained code model. Built on the above two modules, we develop CLAWSAT, a novel self-supervised learning (SSL) framework for code by integrating CL with adversarial views (CLAW) with staggered adversarial training (SAT). On evaluating three downstream tasks across Python and Java, we show that CLAWSAT consistently yields the best robustness and accuracy (e.g., 11% in robustness and 6% in accuracy on the code summarization task in Python). We additionally demonstrate the effectiveness of adversarial learning in CLAW by analyzing the characteristics of the loss landscape and interpretability of the pre-trained models.

On the Robustness of deep learning-based MRI Reconstruction to image transformations

Nov 21, 2022

Jinghan Jia, Mingyi Hong, Yimeng Zhang, Mehmet Akçakaya, Sijia Liu

Abstract: Although deep learning (DL) has received much attention in accelerated magnetic resonance imaging (MRI), recent studies show that tiny input perturbations may lead to instabilities of DL-based MRI reconstruction models. However, approaches for robustifying these models are underdeveloped. Compared to image classification, it could be much more challenging to achieve a robust MRI image reconstruction network considering its regression-based learning objective, limited amount of training data, and lack of efficient robustness metrics. To circumvent the above limitations, our work revisits the problem of DL-based image reconstruction through the lens of robust machine learning. We find a new instability source of MRI image reconstruction, i.e., the lack of reconstruction robustness against spatial transformations of an input, e.g., rotation and cutout. Inspired by this new robustness metric, we develop a robustness-aware image reconstruction method that can defend against both pixel-wise adversarial perturbations and spatial transformations. Extensive experiments are also conducted to demonstrate the effectiveness of our proposed approaches.

* Accepted as TSRML'22 Paper
