Ananth Balashankar

Improving Few-shot Generalization of Safety Classifiers via Data Augmented Parameter-Efficient Fine-Tuning

Oct 25, 2023
Ananth Balashankar, Xiao Ma, Aradhana Sinha, Ahmad Beirami, Yao Qin, Jilin Chen, Alex Beutel

As large language models (LLMs) are widely adopted, new safety issues and policies emerge to which existing safety classifiers do not generalize well. If we have only observed a few examples of violations of a new safety rule, how can we build a classifier to detect violations? In this paper, we study the novel setting of domain-generalized few-shot learning for LLM-based text safety classifiers. Unlike prior few-shot work, these new safety issues can be hard to uncover, and we do not get to choose the few examples. We demonstrate that existing few-shot techniques do not perform well in this setting; instead, we propose parameter-efficient fine-tuning (PEFT) combined with augmenting the training data with similar examples from existing rules. We empirically show that our approach of similarity-based data augmentation plus prompt-tuning (DAPT) consistently outperforms baselines that forgo either data augmentation or PEFT, by 7-17% F1 score on the Social Chemistry moral judgement task and 9-13% AUC on the toxicity detection task, even when the new rule is only loosely correlated with existing ones.
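
To make the recipe concrete, here is a minimal Python sketch of the similarity-based augmentation step, assuming an off-the-shelf sentence embedder and a generic prompt-tuning routine; the names embed and prompt_tune and the choice of cosine similarity are illustrative assumptions, not the paper's implementation.

import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def augment_few_shot(few_shot, prior_rule_examples, embed, k=8):
    # few_shot and prior_rule_examples are lists of (text, label) pairs;
    # embed maps a string to a fixed-length vector (hypothetical encoder).
    augmented = list(few_shot)
    for text, _ in few_shot:
        query = embed(text)
        # Rank labeled examples from existing rules by similarity to the
        # new-rule violation and pull in the k nearest neighbours.
        ranked = sorted(prior_rule_examples,
                        key=lambda ex: cosine(query, embed(ex[0])),
                        reverse=True)
        augmented.extend(ranked[:k])
    return augmented

# The augmented set would then feed a parameter-efficient fine-tuning step,
# e.g. tuning soft prompt tokens while the base LLM stays frozen:
# safety_classifier = prompt_tune(frozen_llm,
#     augment_few_shot(few_shot, prior_rule_examples, embed))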

Break it, Imitate it, Fix it: Robustness by Generating Human-Like Attacks

Oct 25, 2023
Aradhana Sinha, Ananth Balashankar, Ahmad Beirami, Thi Avrahami, Jilin Chen, Alex Beutel

Real-world natural language processing systems need to be robust to human adversaries. Collecting examples of human adversaries for training is an effective but expensive solution. On the other hand, training on synthetic attacks with small perturbations, such as word substitutions, does not actually improve robustness to human adversaries. In this paper, we propose an adversarial training framework that uses a limited set of human adversarial examples to generate more useful adversarial examples at scale. We demonstrate the advantages of this system on the ANLI and hate speech detection benchmark datasets, both collected via an iterative, adversarial human-and-model-in-the-loop procedure. Compared to training only on observed human attacks, additionally training on our synthetic adversarial examples improves model robustness to future rounds. On ANLI, we see accuracy gains on the current set of attacks (44.1%$\,\to\,$50.1%) and on two future, unseen rounds of human-generated attacks (32.5%$\,\to\,$43.4% and 29.4%$\,\to\,$40.2%). On hate speech detection, we see AUC gains on current attacks (0.76 $\to$ 0.84) and on a future round (0.77 $\to$ 0.79). Attacks from methods that do not learn the distribution of existing human adversaries, meanwhile, degrade robustness.
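
A rough Python sketch of one round of this loop follows; model, fit_generator, and generator.perturb are hypothetical placeholders standing in for the trained components described in the paper.

def adversarial_training_round(model, train_data, human_attacks, fit_generator):
    # train_data and human_attacks are lists of (text, label) pairs.
    # 1. "Imitate": learn the distribution of observed human attacks.
    generator = fit_generator(human_attacks)

    # 2. "Break": generate candidate attacks at scale and keep the
    #    label-preserving ones that flip the current model's prediction.
    synthetic = []
    for text, label in train_data:
        for candidate in generator.perturb(text):
            if model.predict(candidate) != label:
                synthetic.append((candidate, label))

    # 3. "Fix": retrain on the original data plus human and synthetic attacks.
    model.fit(train_data + human_attacks + synthetic)
    return model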

Improving Classifier Robustness through Active Generation of Pairwise Counterfactuals

May 22, 2023
Ananth Balashankar, Xuezhi Wang, Yao Qin, Ben Packer, Nithum Thain, Jilin Chen, Ed H. Chi, Alex Beutel

Counterfactual Data Augmentation (CDA) is a commonly used technique for improving robustness in natural language classifiers. However, one fundamental challenge is how to discover meaningful counterfactuals and label them efficiently, with minimal human labeling cost. Most existing methods either rely entirely on human-annotated labels, an expensive process that limits the scale of counterfactual data, or implicitly assume label invariance, which may mislead the model with incorrect labels. In this paper, we present a novel framework that uses counterfactual generative models to generate a large number of diverse counterfactuals by actively sampling from regions of uncertainty, and then automatically labels them with a learned pairwise classifier. Our key insight is that we can label the generated counterfactuals more correctly by training a pairwise classifier that interpolates the relationship between the original example and the counterfactual. We demonstrate that with a small amount of human-annotated counterfactual data (10%), we can generate a counterfactual augmentation dataset with learned labels that provides an 18-20% improvement in robustness and a 14-21% reduction in errors on 6 out-of-domain datasets, comparable to a fully human-annotated counterfactual dataset, for both sentiment classification and question paraphrase tasks.
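
The pipeline can be sketched roughly as follows; classifier.uncertainty, generator.counterfactuals, and pairwise_clf.predict are hypothetical interfaces used only to illustrate the flow of active sampling followed by pairwise labeling.

def build_counterfactual_set(classifier, generator, pairwise_clf,
                             unlabeled_pool, budget=1000):
    # Rank the pool by the main classifier's predictive uncertainty
    # (e.g. entropy of its output distribution) and keep the top `budget`.
    uncertain = sorted(unlabeled_pool,
                       key=lambda ex: classifier.uncertainty(ex.text),
                       reverse=True)[:budget]

    augmented = []
    for ex in uncertain:
        for cf_text in generator.counterfactuals(ex.text):
            # The pairwise classifier inspects the (original, counterfactual)
            # pair and predicts the counterfactual's label, rather than
            # assuming label invariance.
            cf_label = pairwise_clf.predict(ex.text, cf_text, ex.label)
            augmented.append((cf_text, cf_label))
    return augmented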

Effective Robustness against Natural Distribution Shifts for Models with Different Training Data

Feb 02, 2023
Zhouxing Shi, Nicholas Carlini, Ananth Balashankar, Ludwig Schmidt, Cho-Jui Hsieh, Alex Beutel, Yao Qin

"Effective robustness" measures the extra out-of-distribution (OOD) robustness beyond what can be predicted from the in-distribution (ID) performance. Existing effective robustness evaluations typically use a single test set such as ImageNet to evaluate ID accuracy. This becomes problematic when evaluating models trained on different data distributions, e.g., comparing models trained on ImageNet with zero-shot language-image pre-trained models trained on LAION. In this paper, we propose a new effective robustness evaluation metric to compare the effective robustness of models trained on different data distributions. To do this, we control for accuracy on multiple ID test sets that cover the training distributions of all the evaluated models. Our new evaluation metric provides a better estimate of effective robustness and explains the surprising effective robustness gains that zero-shot CLIP-like models exhibit when only one ID dataset is considered; these gains diminish under our evaluation.
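
As a rough numerical illustration, the sketch below fits OOD accuracy against accuracies on several ID test sets over a collection of reference models and reports the residual of the model under test as its effective robustness. The linear fit on logit-transformed accuracies is an assumption made here for simplicity, not necessarily the paper's exact functional form.

import numpy as np

def logit(p):
    p = np.clip(p, 1e-6, 1 - 1e-6)
    return np.log(p / (1 - p))

def effective_robustness(ref_id_accs, ref_ood_accs, model_id_accs, model_ood_acc):
    # ref_id_accs: (n_models, n_id_sets) accuracies of reference models on
    # the ID test sets; ref_ood_accs: (n_models,) their OOD accuracies;
    # model_id_accs / model_ood_acc: the model being evaluated.
    X = np.column_stack([np.ones(len(ref_ood_accs)),
                         logit(np.asarray(ref_id_accs))])
    beta, *_ = np.linalg.lstsq(X, logit(np.asarray(ref_ood_accs)), rcond=None)
    x = np.concatenate([[1.0], logit(np.asarray(model_id_accs))])
    predicted_ood = 1.0 / (1.0 + np.exp(-(x @ beta)))  # back to accuracy scale
    return model_ood_acc - predicted_ood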

Fine-grained prediction of food insecurity using news streams

Nov 17, 2021
Ananth Balashankar, Lakshminarayanan Subramanian, Samuel P. Fraiberger

Anticipating the outbreak of a food crisis is crucial to efficiently allocate emergency relief and reduce human suffering. However, existing food insecurity early warning systems rely on risk measures that are often delayed, outdated, or incomplete. Here, we leverage recent advances in deep learning to extract high-frequency precursors to food crises from the text of a large corpus of news articles about fragile states published between 1980 and 2020. Our text features are causally grounded, interpretable, validated by existing data, and allow us to predict 32% more food crises than existing models up to three months ahead of time at the district level across 15 fragile states. These results could have profound implications for how humanitarian aid is allocated and open new avenues for machine learning to improve decision making in data-scarce environments.
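
A toy Python sketch of the setup: count mentions of candidate crisis precursors in a district-month's news coverage and combine them with traditional risk indicators in a simple classifier. The term list, the frequency features, and the choice of logistic regression are illustrative assumptions; the paper's causally grounded text features are considerably richer.

from collections import Counter
from sklearn.linear_model import LogisticRegression

def news_features(articles, precursor_terms):
    # Count mentions of candidate precursors (e.g. "drought", "conflict",
    # "price shock") across a district-month's news articles.
    counts = Counter()
    for article in articles:
        lowered = article.lower()
        for term in precursor_terms:
            counts[term] += lowered.count(term)
    total = max(sum(counts.values()), 1)
    return [counts[t] / total for t in precursor_terms]

def fit_forecaster(rows, precursor_terms):
    # rows: iterable of (articles, traditional_indicators, crisis_label) per
    # district-month, with the label taken three months ahead of the features.
    X = [news_features(a, precursor_terms) + list(ind) for a, ind, _ in rows]
    y = [label for _, _, label in rows]
    return LogisticRegression(max_iter=1000).fit(X, y)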

Beyond The Text: Analysis of Privacy Statements through Syntactic and Semantic Role Labeling

Oct 01, 2020
Yan Shvartzshnaider, Ananth Balashankar, Vikas Patidar, Thomas Wies, Lakshminarayanan Subramanian

This paper formulates a new task of extracting privacy parameters from a privacy policy through the lens of Contextual Integrity, an established social theory framework for reasoning about privacy norms. Privacy policies, written by lawyers, are lengthy and often comprise incomplete and vague statements. In this paper, we show that traditional NLP tasks, including the recently proposed question-answering based solutions, are insufficient to address the privacy parameter extraction problem and yield poor precision and recall. We describe four conventional methods that can be partially adapted to the parameter extraction task with varying degrees of success: Hidden Markov Models, fine-tuned BERT models, Dependency Parsing (DP), and Semantic Role Labeling (SRL). Based on a detailed evaluation across 36 real-world privacy policies of major enterprises, we demonstrate that a solution combining syntactic DP with type-specific SRL tasks provides the highest accuracy for retrieving contextual privacy parameters from privacy statements. We also observe that incorporating domain-specific knowledge is critical to achieving high precision and recall, thus inspiring new NLP research to address this important problem in the privacy domain.

* 11 pages, 4 figures 
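
For illustration, a minimal Python sketch of mapping semantic-role spans onto Contextual Integrity parameters; the srl callable and the role-to-parameter table below are hypothetical placeholders, and the approach evaluated in the paper additionally leans on dependency parsing and domain-specific knowledge.

# Illustrative mapping from PropBank-style roles to Contextual Integrity
# parameters (an assumption for this sketch, not the paper's exact mapping).
ROLE_TO_CI_PARAM = {
    "ARG0": "sender",                      # who shares the information
    "ARG1": "information_type",            # what is shared
    "ARG2": "recipient",                   # with whom it is shared
    "ARGM-PRP": "transmission_principle",  # purpose or condition of sharing
}

def extract_ci_parameters(statement, srl):
    # statement: one sentence from a privacy policy.
    # srl: a callable returning, per predicate, a dict of {role: text span}.
    parameters = {}
    for frame in srl(statement):
        for role, span in frame.items():
            param = ROLE_TO_CI_PARAM.get(role)
            if param and param not in parameters:
                parameters[param] = span
    return parameters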

What is Fair? Exploring Pareto-Efficiency for Fairness Constrained Classifiers

Oct 30, 2019
Ananth Balashankar, Alyssa Lees, Chris Welty, Lakshminarayanan Subramanian

The potential for learned models to amplify existing societal biases has been broadly recognized. Fairness-aware classifier constraints, which apply equality metrics of performance across subgroups defined on sensitive attributes such as race and gender, seek to rectify inequity but can yield non-uniform degradation in performance for skewed datasets. In certain domains, imbalanced degradation of performance can yield another form of unintentional bias. In the spirit of constructing fairness-aware algorithms as a societal imperative, we explore an alternative: Pareto-Efficient Fairness (PEF). Theoretically, we prove that PEF identifies the operating point on the Pareto curve of subgroup performances closest to the fairness hyperplane, maximizing accuracy across multiple subgroups. Empirically, we demonstrate that PEF outperforms strict fairness constraints on several UCI datasets, achieving Pareto-efficient accuracy levels for all subgroups.
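
In symbols, a minimal sketch of the selection rule, with notation chosen here for illustration rather than taken from the paper: writing $a = (a_1, \dots, a_k)$ for a vector of jointly achievable subgroup accuracies,

\[
  \mathcal{H} = \{\, a \in [0,1]^k : a_1 = a_2 = \dots = a_k \,\}, \qquad
  a^{\mathrm{PEF}} = \arg\min_{a \in \mathcal{P}} \operatorname{dist}(a, \mathcal{H}),
\]

where $\mathcal{P}$ is the Pareto frontier of achievable subgroup accuracies and $\operatorname{dist}$ denotes distance to the fairness hyperplane $\mathcal{H}$.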

Fairness Sample Complexity and the Case for Human Intervention

Oct 24, 2019
Ananth Balashankar, Alyssa Lees

With the aim of building machine learning systems that incorporate standards of fairness and accountability, we explore explicit subgroup sample complexity bounds. The work is motivated by the observation that classifier predictions for real-world datasets often exhibit drastically different metrics, such as accuracy, when subdivided by specific sensitive variable subgroups. The reasons for these discrepancies are varied, including the influence of mitigating variables, institutional bias, underlying population distributions, and sampling bias. Among the numerous definitions of fairness that exist, we argue that, at a minimum, principled ML practices should ensure that classification predictions mirror the underlying sub-population distributions. However, as the number of sensitive variables increases, populations at the intersection of these variables may simply not exist or may not be large enough to provide accurate samples for classification. In these increasingly likely scenarios, we make the case for human intervention and for applying situational and individual definitions of fairness. In this paper, we present lower bounds on subgroup sample complexity for metric-fair learning based on the theory of Probably Approximately Metric Fair Learning. We demonstrate that for a classifier to approach a definition of fairness in terms of specific sensitive variables, adequate subgroup population samples need to exist and the model dimensionality has to be aligned with subgroup population distributions. In cases where this is not feasible, we propose an approach using individual fairness definitions to achieve alignment. We examine two commonly explored UCI datasets under this lens and suggest human interventions for data collection for specific subgroups to achieve approximate individual fairness for linear hypotheses.

* Where is the Human? Bridging the Gap Between AI and HCI, CHI Workshop 2019 
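
To convey the flavor of such per-subgroup requirements (a generic PAC-style form used here only for illustration, not the paper's metric-fair bound): for a hypothesis class of dimension $d$ and $k$ subgroups, estimating each subgroup's accuracy to within $\epsilon$ with probability $1-\delta$ requires on the order of

\[
  m_g \;\gtrsim\; \frac{d + \log(k/\delta)}{\epsilon^2}
\]

samples from each subgroup $g$, so subgroups that are rare at the intersection of several sensitive variables may simply not supply enough data, which is where the proposed human intervention comes in.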