Vivek Srikumar

A Closer Look at How Fine-tuning Changes BERT

Jun 27, 2021
Yichu Zhou, Vivek Srikumar

Given the prevalence of pre-trained contextualized representations in today's NLP, there have been several efforts to understand what information such representations contain. A common strategy to use such representations is to fine-tune them for an end task. However, how fine-tuning for a task changes the underlying space is less studied. In this work, we study the English BERT family and use two probing techniques to analyze how fine-tuning changes the space. Our experiments reveal that fine-tuning improves performance because it pushes points associated with a label away from other labels. By comparing the representations before and after fine-tuning, we also discover that fine-tuning does not change the representations arbitrarily; instead, it adjusts the representations to downstream tasks while preserving the original structure. Finally, using carefully constructed experiments, we show that fine-tuning can encode training sets in a representation, suggesting an overfitting problem of a new kind.
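
The claim that fine-tuning pushes points with one label away from points with other labels can be illustrated with a toy measurement. The sketch below is not the probing technique used in the paper; it assumes made-up 2-D vectors and a hypothetical helper, `centroid_distance`, that compares the mean points of two label groups before and after a pretend fine-tuning step.

```python
import math

def centroid(points):
    """Mean of a non-empty list of equal-length vectors."""
    dim = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dim)]

def centroid_distance(points_a, points_b):
    """Euclidean distance between the centroids of two groups of points."""
    return math.dist(centroid(points_a), centroid(points_b))

# Toy data (an assumption for illustration, not taken from the paper):
# two labels, each with two 2-D "representations".
before = {"pos": [(0.0, 0.0), (0.0, 2.0)], "neg": [(2.0, 0.0), (2.0, 2.0)]}
after = {"pos": [(0.0, 0.0), (0.0, 2.0)], "neg": [(4.0, 0.0), (4.0, 2.0)]}

gap_before = centroid_distance(before["pos"], before["neg"])
gap_after = centroid_distance(after["pos"], after["neg"])
print(gap_before, gap_after)
```

A larger gap after fine-tuning would be consistent with the separation the abstract describes, although centroid distance is only a coarse proxy for how label clusters are arranged.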

X-FACT: A New Benchmark Dataset for Multilingual Fact Checking

Jun 17, 2021
Ashim Gupta, Vivek Srikumar

In this work, we introduce X-FACT: the largest publicly available multilingual dataset for factual verification of naturally existing real-world claims. The dataset contains short statements in 25 languages and is labeled for veracity by expert fact-checkers. The dataset includes a multilingual evaluation benchmark that measures both out-of-domain generalization and zero-shot capabilities of multilingual models. Using state-of-the-art multilingual transformer-based models, we develop several automated fact-checking models that, along with textual claims, make use of additional metadata and evidence from news stories retrieved using a search engine. Empirically, our best model attains an F-score of around 40%, suggesting that our dataset is a challenging benchmark for evaluation of multilingual fact-checking models.

* ACL 2021; For data and code, see https://github.com/utahnlp/x-fact/

Database Workload Characterization with Query Plan Encoders

May 26, 2021
Debjyoti Paul, Jie Cao, Feifei Li, Vivek Srikumar

Smart databases are adopting artificial intelligence (AI) technologies to achieve instance optimality, and in the future, databases will come with prepackaged AI models within their core components. The reason is that every database runs on different workloads and demands specific resources and settings to achieve optimal performance. This makes it necessary to comprehensively understand the workloads running in the system along with their features, which we dub workload characterization. To address this workload characterization problem, we propose query plan encoders that learn essential features and their correlations from query plans. Our pretrained encoders capture the structural and the computational performance characteristics of queries independently. We show that our pretrained encoders are adaptable to new workloads, which expedites the transfer learning process. We performed independent assessments of the structural encoder and the performance encoders with multiple downstream tasks. For the overall evaluation of our query plan encoders, we design two downstream tasks: (i) query latency prediction and (ii) query classification. These tasks show the importance of feature-based workload characterization. We also performed extensive experiments on individual encoders to verify the effectiveness of representation learning and domain adaptability.

DirectProbe: Studying Representations without Classifiers

Apr 13, 2021
Yichu Zhou, Vivek Srikumar

Understanding how linguistic structures are encoded in contextualized embeddings could help explain their impressive performance across NLP. Existing approaches for probing them usually call for training classifiers and use the accuracy, mutual information, or complexity as a proxy for the representation's goodness. In this work, we argue that doing so can be unreliable because different representations may need different classifiers. We develop a heuristic, DirectProbe, that directly studies the geometry of a representation by building upon the notion of a version space for a task. Experiments with several linguistic tasks and contextualized embeddings show that, even without training classifiers, DirectProbe can shed light on how an embedding space represents labels, and can also anticipate classifier performance for the representation.

* NAACL 2021

Incorporating External Knowledge to Enhance Tabular Reasoning

Apr 09, 2021
J. Neeraja, Vivek Gupta, Vivek Srikumar

Reasoning about tabular information presents unique challenges to modern NLP approaches, which largely rely on pre-trained contextualized embeddings of text. In this paper, we study these challenges through the problem of tabular natural language inference. We propose easy and effective modifications to how information is presented to a model for this task. We show via systematic experiments that these strategies substantially improve tabular inference performance.

* 11 pages, 1 Figure, 14 tables, To appear in NAACL 2021 (Short paper)

VERB: Visualizing and Interpreting Bias Mitigation Techniques for Word Representations

Apr 06, 2021
Archit Rathore, Sunipa Dev, Jeff M. Phillips, Vivek Srikumar, Yan Zheng, Chin-Chia Michael Yeh, Junpeng Wang, Wei Zhang, Bei Wang

Word vector embeddings have been shown to contain and amplify biases in data they are extracted from. Consequently, many techniques have been proposed to identify, mitigate, and attenuate these biases in word representations. In this paper, we utilize interactive visualization to increase the interpretability and accessibility of a collection of state-of-the-art debiasing techniques. To aid this, we present Visualization of Embedding Representations for deBiasing system ("VERB"), an open-source web-based visualization tool that helps the users gain a technical understanding and visual intuition of the inner workings of debiasing techniques, with a focus on their geometric properties. In particular, VERB offers easy-to-follow use cases in exploring the effects of these debiasing techniques on the geometry of high-dimensional word vectors. To help understand how various debiasing techniques change the underlying geometry, VERB decomposes each technique into interpretable sequences of primitive transformations and highlights their effect on the word vectors using dimensionality reduction and interactive visual exploration. VERB is designed to target natural language processing (NLP) practitioners who are designing decision-making systems on top of word embeddings, and also researchers working with fairness and ethics of machine learning systems in NLP. It can also serve as a visual medium for education, which helps an NLP novice to understand and mitigate biases in word embeddings.

* 11 pages

BERT & Family Eat Word Salad: Experiments with Text Understanding

Jan 10, 2021
Ashim Gupta, Giorgi Kvernadze, Vivek Srikumar

In this paper, we study the response of large models from the BERT family to incoherent inputs that should confuse any model that claims to understand natural language. We define simple heuristics to construct such examples. Our experiments show that state-of-the-art models consistently fail to recognize them as ill-formed, and instead produce high confidence predictions on them. Finally, we show that if models are explicitly trained to recognize invalid inputs, they can be robust to such attacks without a drop in performance.

* Accepted at AAAI 2021
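
The abstract above mentions simple heuristics for constructing incoherent inputs but does not spell them out. As a rough sketch only, two plausible heuristics are shown here: randomly shuffling the words of a sentence and sorting them alphabetically. The function names `shuffle_words` and `sort_words` are hypothetical, and these heuristics are assumptions for illustration, not necessarily the ones used in the paper.

```python
import random

def shuffle_words(sentence, seed=0):
    """Return the sentence with its whitespace-separated words in random order."""
    words = sentence.split()
    rng = random.Random(seed)  # fixed seed so the output is reproducible
    rng.shuffle(words)
    return " ".join(words)

def sort_words(sentence):
    """Return the sentence with its words sorted alphabetically."""
    return " ".join(sorted(sentence.split()))

original = "the cat sat on the mat"
print(shuffle_words(original))
print(sort_words(original))
```

Both transformations keep the bag of words intact while destroying word order, so a model that still predicts a label with high confidence on the output is likely relying on lexical cues rather than sentence structure.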

Supertagging the Long Tail with Tree-Structured Decoding of Complex Categories

Dec 11, 2020
Jakob Prange, Nathan Schneider, Vivek Srikumar

Although current CCG supertaggers achieve high accuracy on the standard WSJ test set, few systems make use of the categories' internal structure that will drive the syntactic derivation during parsing. The tagset is traditionally truncated, discarding the many rare and complex category types in the long tail. However, supertags are themselves trees. Rather than give up on rare tags, we investigate constructive models that account for their internal structure, including novel methods for tree-structured prediction. Our best tagger is capable of recovering a sizeable fraction of the long-tail supertags and even generates CCG categories that have never been seen in training, while approximating the prior state of the art in overall tag accuracy with fewer parameters. We further investigate how well different approaches generalize to out-of-domain evaluation sets.

* Accepted to appear in TACL; Authors' final version, pre-MIT Press publication

UnQovering Stereotyping Biases via Underspecified Questions

Oct 10, 2020
Tao Li, Tushar Khot, Daniel Khashabi, Ashish Sabharwal, Vivek Srikumar

While language embeddings have been shown to have stereotyping biases, how these biases affect downstream question answering (QA) models remains unexplored. We present UNQOVER, a general framework to probe and quantify biases through underspecified questions. We show that a naive use of model scores can lead to incorrect bias estimates due to two forms of reasoning errors: positional dependence and question independence. We design a formalism that isolates the aforementioned errors. As case studies, we use this metric to analyze four important classes of stereotypes: gender, nationality, ethnicity, and religion. We probe five transformer-based QA models trained on two QA datasets, along with their underlying language models. Our broad study reveals that (1) all these models, with and without fine-tuning, have notable stereotyping biases in these classes; (2) larger models often have higher bias; and (3) the effect of fine-tuning on bias varies strongly with the dataset and the model size.

* Accepted at Findings of EMNLP 2020
