Get our free extension to see links to code for papers anywhere online!

Chrome logo Add to Chrome

Firefox logo Add to Firefox

"Sentiment": models, code, and papers

Scaling Federated Learning for Fine-tuning of Large Language Models

Feb 01, 2021
Agrin Hilmkil, Sebastian Callh, Matteo Barbieri, Leon René Sütfeld, Edvin Listo Zec, Olof Mogren

Federated learning (FL) is a promising approach to distributed compute, as well as distributed data, and provides a level of privacy and compliance to legal frameworks. This makes FL attractive for both consumer and healthcare applications. While the area is actively being explored, few studies have examined FL in the context of larger language models and there is a lack of comprehensive reviews of robustness across tasks, architectures, numbers of clients, and other relevant factors. In this paper, we explore the fine-tuning of Transformer-based language models in a federated learning setting. We evaluate three popular BERT-variants of different sizes (BERT, ALBERT, and DistilBERT) on a number of text classification tasks such as sentiment analysis and author identification. We perform an extensive sweep over the number of clients, ranging up to 32, to evaluate the impact of distributed compute on task performance in the federated averaging setting. While our findings suggest that the large sizes of the evaluated models are not generally prohibitive to federated training, we found that the different models handle federated averaging to a varying degree. Most notably, DistilBERT converges significantly slower with larger numbers of clients, and under some circumstances, even collapses to chance level performance. Investigating this issue presents an interesting perspective for future research.


  Access Paper or Ask Questions

LEAN-LIFE: A Label-Efficient Annotation Framework Towards Learning from Explanation

Apr 16, 2020
Dong-Ho Lee, Rahul Khanna, Bill Yuchen Lin, Jamin Chen, Seyeon Lee, Qinyuan Ye, Elizabeth Boschee, Leonardo Neves, Xiang Ren

Successfully training a deep neural network demands a huge corpus of labeled data. However, each label only provides limited information to learn from and collecting the requisite number of labels involves massive human effort. In this work, we introduce LEAN-LIFE, a web-based, Label-Efficient AnnotatioN framework for sequence labeling and classification tasks, with an easy-to-use UI that not only allows an annotator to provide the needed labels for a task, but also enables LearnIng From Explanations for each labeling decision. Such explanations enable us to generate useful additional labeled data from unlabeled instances, bolstering the pool of available training data. On three popular NLP tasks (named entity recognition, relation extraction, sentiment analysis), we find that using this enhanced supervision allows our models to surpass competitive baseline F1 scores by more than 5-10 percentage points, while using 2X times fewer labeled instances. Our framework is the first to utilize this enhanced supervision technique and does so for three important tasks -- thus providing improved annotation recommendations to users and an ability to build datasets of (data, label, explanation) triples instead of the regular (data, label) pair.

* Accepted to the ACL 2020 (demo). The first two authors contributed equally. Project page: http://inklab.usc.edu/leanlife/ 

  Access Paper or Ask Questions

Evaluating NLP Models via Contrast Sets

Apr 06, 2020
Matt Gardner, Yoav Artzi, Victoria Basmova, Jonathan Berant, Ben Bogin, Sihao Chen, Pradeep Dasigi, Dheeru Dua, Yanai Elazar, Ananth Gottumukkala, Nitish Gupta, Hanna Hajishirzi, Gabriel Ilharco, Daniel Khashabi, Kevin Lin, Jiangming Liu, Nelson F. Liu, Phoebe Mulcaire, Qiang Ning, Sameer Singh, Noah A. Smith, Sanjay Subramanian, Reut Tsarfaty, Eric Wallace, Ally Zhang, Ben Zhou

Standard test sets for supervised learning evaluate in-distribution generalization. Unfortunately, when a dataset has systematic gaps (e.g., annotation artifacts), these evaluations are misleading: a model can learn simple decision rules that perform well on the test set but do not capture a dataset's intended capabilities. We propose a new annotation paradigm for NLP that helps to close systematic gaps in the test data. In particular, after a dataset is constructed, we recommend that the dataset authors manually perturb the test instances in small but meaningful ways that (typically) change the gold label, creating contrast sets. Contrast sets provide a local view of a model's decision boundary, which can be used to more accurately evaluate a model's true linguistic capabilities. We demonstrate the efficacy of contrast sets by creating them for 10 diverse NLP datasets (e.g., DROP reading comprehension, UD parsing, IMDb sentiment analysis). Although our contrast sets are not explicitly adversarial, model performance is significantly lower on them than on the original test sets---up to 25\% in some cases. We release our contrast sets as new evaluation benchmarks and encourage future dataset construction efforts to follow similar annotation processes.


  Access Paper or Ask Questions

Artificial mental phenomena: Psychophysics as a framework to detect perception biases in AI models

Dec 15, 2019
Lizhen Liang, Daniel E. Acuna

Detecting biases in artificial intelligence has become difficult because of the impenetrable nature of deep learning. The central difficulty is in relating unobservable phenomena deep inside models with observable, outside quantities that we can measure from inputs and outputs. For example, can we detect gendered perceptions of occupations (e.g., female librarian, male electrician) using questions to and answers from a word embedding-based system? Current techniques for detecting biases are often customized for a task, dataset, or method, affecting their generalization. In this work, we draw from Psychophysics in Experimental Psychology---meant to relate quantities from the real world (i.e., "Physics") into subjective measures in the mind (i.e., "Psyche")---to propose an intellectually coherent and generalizable framework to detect biases in AI. Specifically, we adapt the two-alternative forced choice task (2AFC) to estimate potential biases and the strength of those biases in black-box models. We successfully reproduce previously-known biased perceptions in word embeddings and sentiment analysis predictions. We discuss how concepts in experimental psychology can be naturally applied to understanding artificial mental phenomena, and how psychophysics can form a useful methodological foundation to study fairness in AI.

* FAT Conference 2020 

  Access Paper or Ask Questions

Learning to Annotate: Modularizing Data Augmentation for Text Classifiers with Natural Language Explanations

Nov 07, 2019
Ziqi Wang, Yujia Qin, Wenxuan Zhou, Jun Yan, Qinyuan Ye, Leonardo Neves, Zhiyuan Liu, Xiang Ren

Deep neural networks usually require massive labeled data, which restricts their applications in scenarios where data annotation is expensive. Natural language (NL) explanations have been demonstrated very useful additional supervision, which can provide sufficient domain knowledge for generating more labeled data over new instances, while the annotation time only doubles. However, directly applying them for augmenting model learning encounters two challenges: (1) NL explanations are unstructured and inherently compositional. (2) NL explanations often have large numbers of linguistic variants, resulting in low recall and limited generalization ability. In this paper, we propose a novel Neural EXecution Tree (NEXT) framework to augment training data for text classification using NL explanations. After transforming NL explanations into executable logical forms by semantic parsing, NEXT generalizes different types of actions specified by the logical forms for labeling data instances, which substantially increases the coverage of each NL explanation. Experiments on two NLP tasks (relation extraction and sentiment analysis) demonstrate its superiority over baseline methods. Its extension to multi-hop question answering achieves performance gain with light annotation effort.

* 16 pages, 7 figures, 13 tables 

  Access Paper or Ask Questions

Learning to Annotate: Modularizing Data Augmentation for TextClassifiers with Natural Language Explanations

Nov 04, 2019
Ziqi Wang, Yujia Qin, Wenxuan Zhou, Jun Yan, Qinyuan Ye, Leonardo Neves, Zhiyuan Liu, Xiang Ren

Deep neural networks usually require massive labeled data, which restricts their applications in scenarios where data annotation is expensive. Natural language (NL) explanations have been demonstrated very useful additional supervision, which can provide sufficient domain knowledge for generating more labeled data over new instances, while the annotation time only doubles. However, directly applying them for augmenting model learning encounters two challenges: (1) NL explanations are unstructured and inherently compositional. (2) NL explanations often have large numbers of linguistic variants, resulting in low recall and limited generalization ability. In this paper, we propose a novel Neural EXecution Tree (NEXT) framework to augment training data for text classification using NL explanations. After transforming NL explanations into executable logical forms by semantic parsing, NEXT generalizes different types of actions specified by the logical forms for labeling data instances, which substantially increases the coverage of each NL explanation. Experiments on two NLP tasks (relation extraction and sentiment analysis) demonstrate its superiority over baseline methods. Its extension to multi-hop question answering achieves performance gain with light annotation effort.

* 16 pages, 7 figures, 13 tables 

  Access Paper or Ask Questions

Adapting Language Models for Non-Parallel Author-Stylized Rewriting

Sep 22, 2019
Bakhtiyar Syed, Gaurav Verma, Balaji Vasan Srinivasan, Anandhavelu N, Vasudeva Varma

Given the recent progress in language modeling using Transformer-based neural models and an active interest in generating stylized text, we present an approach to leverage the generalization capabilities of a language model to rewrite an input text in a target author's style. Our proposed approach adapts a pre-trained language model to generate author-stylized text by fine-tuning on the author-specific corpus using a denoising autoencoder (DAE) loss in a cascaded encoder-decoder framework. Optimizing over DAE loss allows our model to learn the nuances of an author's style without relying on parallel data, which has been a severe limitation of the previous related works in this space. To evaluate the efficacy of our approach, we propose a linguistically-motivated framework to quantify stylistic alignment of the generated text to the target author at lexical, syntactic and surface levels. The evaluation framework is both interpretable as it leads to several insights about the model, and self-contained as it does not rely on external classifiers, e.g. sentiment or formality classifiers. Qualitative and quantitative assessment indicates that the proposed approach rewrites the input text with better alignment to the target style while preserving the original content better than state-of-the-art baselines.

* 8 pages, 2 figures 

  Access Paper or Ask Questions

Exploring Fine-Tuned Embeddings that Model Intensifiers for Emotion Analysis

Apr 05, 2019
Laura Bostan, Roman Klinger

Adjective phrases like "a little bit surprised", "completely shocked", or "not stunned at all" are not handled properly by currently published state-of-the-art emotion classification and intensity prediction systems which use pre-dominantly non-contextualized word embeddings as input. Based on this finding, we analyze differences between embeddings used by these systems in regard to their capability of handling such cases. Furthermore, we argue that intensifiers in context of emotion words need special treatment, as is established for sentiment polarity classification, but not for more fine-grained emotion prediction. To resolve this issue, we analyze different aspects of a post-processing pipeline which enriches the word representations of such phrases. This includes expansion of semantic spaces at the phrase level and sub-word level followed by retrofitting to emotion lexica. We evaluate the impact of these steps with A La Carte and Bag-of-Substrings extensions based on pretrained GloVe, Word2vec, and fastText embeddings against a crowd-sourced corpus of intensity annotations for tweets containing our focus phrases. We show that the fastText-based models do not gain from handling these specific phrases under inspection. For Word2vec embeddings, we show that our post-processing pipeline improves the results by up to 8% on a novel dataset densely populated with intensifiers.

* Accepted at WASSA at NAACL 

  Access Paper or Ask Questions

Analyzing and learning the language for different types of harassment

Nov 01, 2018
Mohammadreza Rezvan, Saeedeh Shekarpour, Krishnaprasad Thirunarayan, Valerie L. Shalin, Amit Sheth

The presence of a significant amount of harassment in user-generated content and its negative impact calls for robust automatic detection approaches. This requires that we can identify different forms or types of harassment. Earlier work has classified harassing language in terms of hurtfulness, abusiveness, sentiment, and profanity. However, to identify and understand harassment more accurately, it is essential to determine the context that represents the interrelated conditions in which they occur. In this paper, we introduce the notion of contextual type to harassment involving five categories: (i) sexual, (ii) racial, (iii) appearance-related, (iv) intellectual and (v) political. We utilize an annotated corpus from Twitter distinguishing these types of harassment. To study the context for each type that sheds light on the linguistic meaning, interpretation, and distribution, we conduct two lines of investigation: an extensive linguistic analysis, and a statistical distribution of unigrams. We then build type-ware classifiers to automate the identification of type-specific harassment. Our experiments demonstrate that these classifiers provide competitive accuracy for identifying and analyzing harassment on social media. We present extensive discussion and major observations about the effectiveness of type-aware classifiers using a detailed comparison setup providing insight into the role of type-dependent features.

* Submitted for PLOS ONE Journal 17 pages, 14 Figures 

  Access Paper or Ask Questions

Decision Stream: Cultivating Deep Decision Trees

Sep 03, 2017
Dmitry Ignatov, Andrey Ignatov

Various modifications of decision trees have been extensively used during the past years due to their high efficiency and interpretability. Tree node splitting based on relevant feature selection is a key step of decision tree learning, at the same time being their major shortcoming: the recursive nodes partitioning leads to geometric reduction of data quantity in the leaf nodes, which causes an excessive model complexity and data overfitting. In this paper, we present a novel architecture - a Decision Stream, - aimed to overcome this problem. Instead of building a tree structure during the learning process, we propose merging nodes from different branches based on their similarity that is estimated with two-sample test statistics, which leads to generation of a deep directed acyclic graph of decision rules that can consist of hundreds of levels. To evaluate the proposed solution, we test it on several common machine learning problems - credit scoring, twitter sentiment analysis, aircraft flight control, MNIST and CIFAR image classification, synthetic data classification and regression. Our experimental results reveal that the proposed approach significantly outperforms the standard decision tree learning methods on both regression and classification tasks, yielding a prediction error decrease up to 35%.


  Access Paper or Ask Questions

<<
227
228
229
230
231
232
233
234
235
236
237
238
239
>>