Hamed Firooz

ER-TEST: Evaluating Explanation Regularization Methods for NLP Models

May 25, 2022

Brihi Joshi, Aaron Chan, Ziyi Liu, Shaoliang Nie, Maziar Sanjabi, Hamed Firooz, Xiang Ren

Abstract:Neural language models' (NLMs') reasoning processes are notoriously hard to explain. Recently, there has been much progress in automatically generating machine rationales of NLM behavior, but less in utilizing the rationales to improve NLM behavior. For the latter, explanation regularization (ER) aims to improve NLM generalization by pushing the machine rationales to align with human rationales. Whereas prior works primarily evaluate such ER models via in-distribution (ID) generalization, ER's impact on out-of-distribution (OOD) is largely underexplored. Plus, little is understood about how ER model performance is affected by the choice of ER criteria or by the number/choice of training instances with human rationales. In light of this, we propose ER-TEST, a protocol for evaluating ER models' OOD generalization along three dimensions: (1) unseen datasets, (2) contrast set tests, and (3) functional tests. Using ER-TEST, we study three key questions: (A) Which ER criteria are most effective for the given OOD setting? (B) How is ER affected by the number/choice of training instances with human rationales? (C) Is ER effective with distantly supervised human rationales? ER-TEST enables comprehensive analysis of these questions by considering a diverse range of tasks and datasets. Through ER-TEST, we show that ER has little impact on ID performance, but can yield large gains on OOD performance w.r.t. (1)-(3). Also, we find that the best ER criterion is task-dependent, while ER can improve OOD performance even with limited and distantly-supervised human rationales.

* 19 pages, 10 figures
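
The idea of explanation regularization described in the abstract can be illustrated with a minimal sketch: a task loss plus a weighted penalty that pushes the machine rationale (per-token attribution scores) toward a human rationale (a binary token mask). The function names, the `er_weight` parameter, and the choice of mean squared error as the ER criterion are assumptions made for illustration; the abstract does not specify which criteria the paper compares.

```python
import math

def cross_entropy(probs, label):
    """Task loss: negative log-probability of the gold label."""
    return -math.log(probs[label])

def mse_criterion(machine_rationale, human_rationale):
    """One possible ER criterion (an assumption): mean squared error per token."""
    pairs = zip(machine_rationale, human_rationale)
    return sum((m - h) ** 2 for m, h in pairs) / len(human_rationale)

def er_loss(probs, label, machine_rationale, human_rationale=None, er_weight=1.0):
    """Task loss plus a weighted rationale-alignment penalty.

    Instances without a human rationale contribute only the task loss, which
    mirrors training with a limited number of annotated instances.
    """
    task = cross_entropy(probs, label)
    if human_rationale is None:
        return task
    return task + er_weight * mse_criterion(machine_rationale, human_rationale)

# Hypothetical two-token instance: the model attends to the wrong token.
aligned = er_loss([0.5, 0.5], 0, [1.0, 0.0], [1.0, 0.0])
misaligned = er_loss([0.5, 0.5], 0, [0.0, 1.0], [1.0, 0.0])
print(aligned < misaligned)
```

In this sketch the penalty is zero when the attribution scores match the human mask, so the two calls differ only by the alignment term.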

Detecting and Understanding Harmful Memes: A Survey

May 09, 2022

Shivam Sharma, Firoj Alam, Md. Shad Akhtar, Dimitar Dimitrov, Giovanni Da San Martino, Hamed Firooz, Alon Halevy, Fabrizio Silvestri, Preslav Nakov, Tanmoy Chakraborty

Abstract:The automatic identification of harmful content online is of major concern for social media platforms, policymakers, and society. Researchers have studied textual, visual, and audio content, but typically in isolation. Yet, harmful content often combines multiple modalities, as in the case of memes, which are of particular interest due to their viral nature. With this in mind, here we offer a comprehensive survey with a focus on harmful memes. Based on a systematic analysis of recent literature, we first propose a new typology of harmful memes, and then we highlight and summarize the relevant state of the art. One interesting finding is that many types of harmful memes are not really studied, e.g., those featuring self-harm and extremism, partly due to the lack of suitable datasets. We further find that existing datasets mostly capture multi-class scenarios, which are not inclusive of the affective spectrum that memes can represent. Another observation is that memes can propagate globally through repackaging in different languages and that they can also be multilingual, blending different cultures. We conclude by highlighting several challenges related to multimodal semiotics, technological constraints, and non-trivial social engagement, and we present several open-ended aspects such as delineating online harm and empirically examining related frameworks and assistive interventions, which we believe will motivate and drive future research.

* Accepted at IJCAI-ECAI 2022 (Survey Track)

Detection, Disambiguation, Re-ranking: Autoregressive Entity Linking as a Multi-Task Problem

Apr 12, 2022

Khalil Mrini, Shaoliang Nie, Jiatao Gu, Sinong Wang, Maziar Sanjabi, Hamed Firooz

Abstract:We propose an autoregressive entity linking model that is trained with two auxiliary tasks and learns to re-rank generated samples at inference time. Our proposed novelties address two weaknesses in the literature. First, a recent method proposes to learn mention detection and then entity candidate selection, but relies on predefined sets of candidates. We use encoder-decoder autoregressive entity linking in order to bypass this need, and propose to train mention detection as an auxiliary task instead. Second, previous work suggests that re-ranking could help correct prediction errors. We add a new auxiliary task, match prediction, to learn re-ranking. Without the use of a knowledge base or candidate sets, our model sets a new state of the art on two benchmark datasets for entity linking: COMETA in the biomedical domain, and AIDA-CoNLL in the news domain. We show through ablation studies that each of the two auxiliary tasks increases performance, and that re-ranking is an important factor in the increase. Finally, our low-resource experimental results suggest that performance on the main task benefits from the knowledge learned by the auxiliary tasks, and not just from the additional training data.

* Long paper accepted to ACL 2022 Findings

Understanding Failure Modes of Self-Supervised Learning

Mar 03, 2022

Neha Mukund Kalibhat, Kanika Narang, Liang Tan, Hamed Firooz, Maziar Sanjabi, Soheil Feizi

Abstract:Self-supervised learning methods have shown impressive results in downstream classification tasks. However, there is limited work in understanding their failure modes and interpreting the learned representations of these models. In this paper, we tackle these issues and study the representation space of self-supervised models by understanding the underlying reasons for misclassifications in a downstream task. Over several state-of-the-art self-supervised models including SimCLR, SwAV, MoCo V2 and BYOL, we observe that representations of correctly classified samples have few discriminative features with highly deviated values compared to other features. This is in clear contrast with representations of misclassified samples. We also observe that noisy features in the representation space often correspond to spurious attributes in images, making the models less interpretable. Building on these observations, we propose a sample-wise Self-Supervised Representation Quality Score (or, Q-Score) that, without access to any label information, is able to predict if a given sample is likely to be misclassified in the downstream task, achieving an AUPRC of up to 0.90. Q-Score can also be used as a regularization to remedy low-quality representations, leading to a 3.26% relative improvement in accuracy of SimCLR on ImageNet-100. Moreover, we show that Q-Score regularization increases representation sparsity, thus reducing noise and improving interpretability through gradient heatmaps.

A Fistful of Words: Learning Transferable Visual Models from Bag-of-Words Supervision

Jan 06, 2022

Ajinkya Tejankar, Maziar Sanjabi, Bichen Wu, Saining Xie, Madian Khabsa, Hamed Pirsiavash, Hamed Firooz

Abstract:Using natural language as supervision for training visual recognition models holds great promise. Recent works have shown that if such supervision is used in the form of alignment between images and captions in large training datasets, then the resulting aligned models perform well on zero-shot classification as downstream tasks. In this paper, we focus on teasing out what parts of the language supervision are essential for training zero-shot image classification models. Through extensive and careful experiments, we show that: 1) A simple Bag-of-Words (BoW) caption could be used as a replacement for most of the image captions in the dataset. Surprisingly, we observe that this approach improves the zero-shot classification performance when combined with word balancing. 2) Using a BoW pretrained model, we can obtain more training data by generating pseudo-BoW captions on images that do not have a caption. Models trained on images with real and pseudo-BoW captions achieve stronger zero-shot performance. On ImageNet-1k zero-shot evaluation, our best model, which uses only 3M image-caption pairs, performs on par with a CLIP model trained on 15M image-caption pairs (31.5% vs 31.3%).
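
As a rough illustration of what replacing a caption with a bag of words could look like, the sketch below reduces a caption to its distinct content words. The tokenization rule, the tiny `STOPWORDS` set, and the function name are assumptions for illustration only; the abstract does not describe the paper's preprocessing or its word balancing scheme, and neither is reproduced here.

```python
import re

# Hypothetical stopword list; the paper's actual vocabulary handling is not given.
STOPWORDS = {"a", "an", "the", "of", "on", "in", "and", "is", "with"}

def caption_to_bow(caption):
    """Return the distinct non-stopword tokens of a caption, in first-seen order."""
    tokens = re.findall(r"[a-z]+", caption.lower())
    bag = []
    for token in tokens:
        if token not in STOPWORDS and token not in bag:
            bag.append(token)
    return bag

print(caption_to_bow("A dog on the beach with a red ball"))
# -> ['dog', 'beach', 'red', 'ball']
```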

BARACK: Partially Supervised Group Robustness With Guarantees

Dec 31, 2021

Nimit Sohoni, Maziar Sanjabi, Nicolas Ballas, Aditya Grover, Shaoliang Nie, Hamed Firooz, Christopher Ré

Abstract:While neural networks have shown remarkable success on classification tasks in terms of average-case performance, they often fail to perform well on certain groups of the data. Such group information may be expensive to obtain; thus, recent works in robustness and fairness have proposed ways to improve worst-group performance even when group labels are unavailable for the training data. However, these methods generally underperform methods that utilize group information at training time. In this work, we assume access to a small number of group labels alongside a larger dataset without group labels. We propose BARACK, a simple two-step framework to utilize this partial group information to improve worst-group performance: train a model to predict the missing group labels for the training data, and then use these predicted group labels in a robust optimization objective. Theoretically, we provide generalization bounds for our approach in terms of the worst-group performance, showing how the generalization error scales with respect to both the total number of training points and the number of training points with group labels. Empirically, our method outperforms the baselines that do not use group information, even when only 1-33% of points have group labels. We provide ablation studies to support the robustness and extensibility of our framework.

* 26 pages
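
The two-step framework in the abstract can be sketched on toy data: first fill in missing group labels with a predictor, then evaluate a worst-group objective over the completed labels. The nearest-centroid predictor on one-dimensional features and the "maximum of per-group mean losses" objective are stand-ins chosen for brevity; they are assumptions, not the paper's model or its exact robust optimization objective.

```python
def fill_group_labels(features, groups):
    """Step 1 (stand-in): predict missing group labels (None) by nearest centroid."""
    sums, counts = {}, {}
    for x, g in zip(features, groups):
        if g is not None:
            sums[g] = sums.get(g, 0.0) + x
            counts[g] = counts.get(g, 0) + 1
    centroids = {g: sums[g] / counts[g] for g in sums}
    filled = []
    for x, g in zip(features, groups):
        if g is None:
            g = min(centroids, key=lambda c: abs(x - centroids[c]))
        filled.append(g)
    return filled

def worst_group_loss(losses, groups):
    """Step 2 (stand-in): the largest mean loss over groups."""
    totals, counts = {}, {}
    for loss, g in zip(losses, groups):
        totals[g] = totals.get(g, 0.0) + loss
        counts[g] = counts.get(g, 0) + 1
    return max(totals[g] / counts[g] for g in totals)

# Hypothetical data: only two of four points carry a group label.
features = [0.0, 0.2, 1.0, 0.9]
partial_groups = [0, None, 1, None]
per_example_losses = [0.1, 0.3, 0.9, 0.7]

groups = fill_group_labels(features, partial_groups)
print(groups, worst_group_loss(per_example_losses, groups))
```

In a training loop, the quantity returned by `worst_group_loss` would be the value minimized instead of the average loss, so the group that currently performs worst drives the update.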

UniREx: A Unified Learning Framework for Language Model Rationale Extraction

Dec 16, 2021

Aaron Chan, Maziar Sanjabi, Lambert Mathias, Liang Tan, Shaoliang Nie, Xiaochang Peng, Xiang Ren, Hamed Firooz

Abstract:An extractive rationale explains a language model's (LM's) prediction on a given task instance by highlighting the text inputs that most influenced the output. Ideally, rationale extraction should be faithful (reflects LM's behavior), plausible (makes sense to humans), data-efficient, and fast, without sacrificing the LM's task performance. Prior rationale extraction works consist of specialized approaches for addressing various subsets of these desiderata -- but never all five. Narrowly focusing on certain desiderata typically comes at the expense of ignored ones, so existing rationale extractors are often impractical in real-world applications. To tackle this challenge, we propose UniREx, a unified and highly flexible learning framework for rationale extraction, which allows users to easily account for all five factors. UniREx enables end-to-end customization of the rationale extractor training process, supporting arbitrary: (1) heuristic/learned rationale extractors, (2) combinations of faithfulness and/or plausibility objectives, and (3) amounts of gold rationale supervision. Across three text classification datasets, our best UniREx configurations achieve a superior balance of the five desiderata, when compared to strong baselines. Furthermore, UniREx-trained rationale extractors can even generalize to unseen datasets and tasks.

* 14 pages, 6 figures

SemEval-2021 Task 6: Detection of Persuasion Techniques in Texts and Images

Apr 25, 2021

Dimitar Dimitrov, Bishr Bin Ali, Shaden Shaar, Firoj Alam, Fabrizio Silvestri, Hamed Firooz, Preslav Nakov, Giovanni Da San Martino

Abstract:We describe SemEval-2021 task 6 on Detection of Persuasion Techniques in Texts and Images: the data, the annotation guidelines, the evaluation setup, the results, and the participating systems. The task focused on memes and had three subtasks: (i) detecting the techniques in the text, (ii) detecting the text spans where the techniques are used, and (iii) detecting techniques in the entire meme, i.e., both in the text and in the image. It was a popular task, attracting 71 registrations, and 22 teams that eventually made an official submission on the test set. The evaluation results for the third subtask confirmed the importance of both modalities, the text and the image. Moreover, some teams reported benefits when not just combining the two modalities, e.g., by using early or late fusion, but rather modeling the interaction between them in a joint model.

* SemEval-2021
* propaganda, disinformation, misinformation, fake news, memes, multimodality

A Survey on Multimodal Disinformation Detection

Mar 13, 2021

Firoj Alam, Stefano Cresci, Tanmoy Chakraborty, Fabrizio Silvestri, Dimiter Dimitrov, Giovanni Da San Martino, Shaden Shaar, Hamed Firooz, Preslav Nakov

Abstract:Recent years have witnessed the proliferation of fake news, propaganda, misinformation, and disinformation online. While initially this was mostly about textual content, over time images and videos gained popularity, as they are much easier to consume, attract much more attention, and spread further than simple text. As a result, researchers started targeting different modalities and combinations thereof. As different modalities are studied in different research communities, with insufficient interaction, here we offer a survey that explores the state-of-the-art on multimodal disinformation detection covering various combinations of modalities: text, images, audio, video, network structure, and temporal information. Moreover, while some studies focused on factuality, others investigated how harmful the content is. While these two components in the definition of disinformation -- (i) factuality and (ii) harmfulness, are equally important, they are typically studied in isolation. Thus, we argue for the need to tackle disinformation detection by taking into account multiple modalities as well as both factuality and harmfulness, in the same framework. Finally, we discuss current challenges and future research directions.

* disinformation, misinformation, factuality, harmfulness, fake news, propaganda, multimodality, text, images, videos, network structure, temporality

Modality-specific Distillation

Jan 06, 2021

Woojeong Jin, Maziar Sanjabi, Shaoliang Nie, Liang Tan, Xiang Ren, Hamed Firooz

Abstract:Large neural networks are impractical to deploy on mobile devices due to their heavy computational cost and slow inference. Knowledge distillation (KD) is a technique to reduce the model size while retaining performance by transferring knowledge from a large "teacher" model to a smaller "student" model. However, KD on multimodal datasets such as vision-language datasets is relatively unexplored, and digesting such multimodal information is challenging since different modalities present different types of information. In this paper, we propose modality-specific distillation (MSD) to effectively transfer knowledge from a teacher on multimodal datasets. Existing KD approaches can be applied to a multimodal setup, but a student doesn't have access to modality-specific predictions. Our idea aims at mimicking a teacher's modality-specific predictions by introducing an auxiliary loss term for each modality. Because each modality has different importance for predictions, we also propose weighting approaches for the auxiliary losses, including a meta-learning approach to learn the optimal weights on these loss terms. In our experiments, we demonstrate the effectiveness of our MSD and the weighting scheme and show that it achieves better performance than KD.

* Preprint
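
A minimal sketch of the loss structure the abstract describes: a standard distillation term on the full multimodal input, plus one weighted auxiliary term per modality that compares teacher and student predictions on that modality alone. The use of KL divergence, the dictionary layout, and the fixed `weights` are assumptions for illustration; the student's own task loss and the paper's learned weighting are omitted.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions given as lists of probabilities."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def msd_loss(teacher, student, weights):
    """Distillation on the full input plus weighted per-modality auxiliary terms.

    teacher, student: dicts mapping an input view ("full", or a modality name)
    to a predicted distribution. weights: dict mapping modality name to weight.
    """
    loss = kl_divergence(teacher["full"], student["full"])
    for modality, weight in weights.items():
        loss += weight * kl_divergence(teacher[modality], student[modality])
    return loss

# Hypothetical predictions: the student matches the teacher on the full input
# and on text alone, but not on the image alone.
teacher = {"full": [0.5, 0.5], "image": [1.0, 0.0], "text": [0.5, 0.5]}
student = {"full": [0.5, 0.5], "image": [0.5, 0.5], "text": [0.5, 0.5]}
print(msd_loss(teacher, student, {"image": 2.0, "text": 1.0}))
```

Here plain KD on the full input would report zero loss, while the image term exposes the mismatch, which is the kind of signal the auxiliary losses are meant to add.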
