Get our free extension to see links to code for papers anywhere online!

Chrome logo Add to Chrome

Firefox logo Add to Firefox

"Text": models, code, and papers

Catching Out-of-Context Misinformation with Self-supervised Learning

Jan 27, 2021
Shivangi Aneja, Christoph Bregler, Matthias Nießner

Despite the recent attention to DeepFakes and other forms of image manipulations, one of the most prevalent ways to mislead audiences is the use of unaltered images in a new but false context. To address these challenges and support fact-checkers, we propose a new method that automatically detects out-of-context image and text pairs. Our core idea is a self-supervised training strategy where we only need images with matching (and non-matching) captions from different sources. At train time, our method learns to selectively align individual objects in an image with textual claims, without explicit supervision. At test time, we check for a given text pair if both texts correspond to same object(s) in the image but semantically convey different descriptions, which allows us to make fairly accurate out-of-context predictions. Our method achieves 82% out-of-context detection accuracy. To facilitate training our method, we created a large-scale dataset of 200K images which we match with 450K textual captions from a variety of news websites, blogs, and social media posts; i.e., for each image, we obtained several captions.

* Video : 

  Access Paper or Ask Questions

Towards Zero-shot Cross-lingual Image Retrieval

Nov 24, 2020
Pranav Aggarwal, Ajinkya Kale

There has been a recent spike in interest in multi-modal Language and Vision problems. On the language side, most of these models primarily focus on English since most multi-modal datasets are monolingual. We try to bridge this gap with a zero-shot approach for learning multi-modal representations using cross-lingual pre-training on the text side. We present a simple yet practical approach for building a cross-lingual image retrieval model which trains on a monolingual training dataset but can be used in a zero-shot cross-lingual fashion during inference. We also introduce a new objective function which tightens the text embedding clusters by pushing dissimilar texts from each other. Finally, we introduce a new 1K multi-lingual MSCOCO2014 caption test dataset (XTD10) in 7 languages that we collected using a crowdsourcing platform. We use this as the test set for evaluating zero-shot model performance across languages. XTD10 dataset is made publicly available here:

  Access Paper or Ask Questions

Generating Diverse and Consistent QA pairs from Contexts with Information-Maximizing Hierarchical Conditional VAEs

Jun 15, 2020
Dong Bok Lee, Seanie Lee, Woo Tae Jeong, Donghwan Kim, Sung Ju Hwang

One of the most crucial challenges in question answering (QA) is the scarcity of labeled data, since it is costly to obtain question-answer (QA) pairs for a target text domain with human annotation. An alternative approach to tackle the problem is to use automatically generated QA pairs from either the problem context or from large amount of unstructured texts (e.g. Wikipedia). In this work, we propose a hierarchical conditional variational autoencoder (HCVAE) for generating QA pairs given unstructured texts as contexts, while maximizing the mutual information between generated QA pairs to ensure their consistency. We validate our Information Maximizing Hierarchical Conditional Variational AutoEncoder (Info-HCVAE) on several benchmark datasets by evaluating the performance of the QA model (BERT-base) using only the generated QA pairs (QA-based evaluation) or by using both the generated and human-labeled pairs (semi-supervised learning) for training, against state-of-the-art baseline models. The results show that our model obtains impressive performance gains over all baselines on both tasks, using only a fraction of data for training.

* ACL 2020 

  Access Paper or Ask Questions

Towards Controllable Biases in Language Generation

May 01, 2020
Emily Sheng, Kai-Wei Chang, Premkumar Natarajan, Nanyun Peng

We present a general approach towards controllable societal biases in natural language generation (NLG). Building upon the idea of adversarial triggers, we develop a method to induce or avoid biases in generated text containing mentions of specified demographic groups. We then analyze two scenarios: 1) inducing biases for one demographic and avoiding biases for another, and 2) mitigating biases between demographic pairs (e.g., man and woman). The former scenario gives us a tool for detecting the types of biases present in the model, and the latter is useful for mitigating biases in downstream applications (e.g., dialogue generation). Specifically, our approach facilitates more explainable biases by allowing us to 1) use the relative effectiveness of inducing biases for different demographics as a new dimension for bias evaluation, and 2) discover topics that correspond to demographic inequalities in generated text. Furthermore, our mitigation experiments exemplify our technique's effectiveness at equalizing the amount of biases across demographics while simultaneously generating less negatively biased text overall.

* 9 pages 

  Access Paper or Ask Questions

Contextualized Representations Using Textual Encyclopedic Knowledge

Apr 24, 2020
Mandar Joshi, Kenton Lee, Yi Luan, Kristina Toutanova

We present a method to represent input texts by contextualizing them jointly with dynamically retrieved textual encyclopedic background knowledge from multiple documents. We apply our method to reading comprehension tasks by encoding questions and passages together with background sentences about the entities they mention. We show that integrating background knowledge from text is effective for tasks focusing on factual reasoning and allows direct reuse of powerful pretrained BERT-style encoders. Moreover, knowledge integration can be further improved with suitable pretraining via a self-supervised masked language model objective over words in background-augmented input text. On TriviaQA, our approach obtains improvements of 1.6 to 3.1 F1 over comparable RoBERTa models which do not integrate background knowledge dynamically. On MRQA, a large collection of diverse QA datasets, we see consistent gains in-domain along with large improvements out-of-domain on BioASQ (2.1 to 4.2 F1), TextbookQA (1.6 to 2.0 F1), and DuoRC (1.1 to 2.0 F1).

  Access Paper or Ask Questions

Integrating Dictionary Feature into A Deep Learning Model for Disease Named Entity Recognition

Nov 05, 2019
Hamada A. Nayel, Shashrekha H. L

In recent years, Deep Learning (DL) models are becoming important due to their demonstrated success at overcoming complex learning problems. DL models have been applied effectively for different Natural Language Processing (NLP) tasks such as part-of-Speech (PoS) tagging and Machine Translation (MT). Disease Named Entity Recognition (Disease-NER) is a crucial task which aims at extracting disease Named Entities (NEs) from text. In this paper, a DL model for Disease-NER using dictionary information is proposed and evaluated on National Center for Biotechnology Information (NCBI) disease corpus and BC5CDR dataset. Word embeddings trained over general domain texts as well as biomedical texts have been used to represent input to the proposed model. This study also compares two different Segment Representation (SR) schemes, namely IOB2 and IOBES for Disease-NER. The results illustrate that using dictionary information, pre-trained word embeddings, character embeddings and CRF with global score improves the performance of Disease-NER system.

* 16 pages, 13 figures 

  Access Paper or Ask Questions

Fine-Grained Action Retrieval Through Multiple Parts-of-Speech Embeddings

Aug 09, 2019
Michael Wray, Diane Larlus, Gabriela Csurka, Dima Damen

We address the problem of cross-modal fine-grained action retrieval between text and video. Cross-modal retrieval is commonly achieved through learning a shared embedding space, that can indifferently embed modalities. In this paper, we propose to enrich the embedding by disentangling parts-of-speech (PoS) in the accompanying captions. We build a separate multi-modal embedding space for each PoS tag. The outputs of multiple PoS embeddings are then used as input to an integrated multi-modal space, where we perform action retrieval. All embeddings are trained jointly through a combination of PoS-aware and PoS-agnostic losses. Our proposal enables learning specialised embedding spaces that offer multiple views of the same embedded entities. We report the first retrieval results on fine-grained actions for the large-scale EPIC dataset, in a generalised zero-shot setting. Results show the advantage of our approach for both video-to-text and text-to-video action retrieval. We also demonstrate the benefit of disentangling the PoS for the generic task of cross-modal video retrieval on the MSR-VTT dataset.

* Accepted for presentation at ICCV. Project Page: 

  Access Paper or Ask Questions

Comparison of Classical Machine Learning Approaches on Bangla Textual Emotion Analysis

Jul 18, 2019
Md. Ataur Rahman, Md. Hanif Seddiqui

Detecting emotions from text is an extension of simple sentiment polarity detection. Instead of considering only positive or negative sentiments, emotions are conveyed using more tangible manner; thus, they can be expressed as many shades of gray. This paper manifests the results of our experimentation for fine-grained emotion analysis on Bangla text. We gathered and annotated a text corpus consisting of user comments from several Facebook groups regarding socio-economic and political issues, and we made efforts to extract the basic emotions (sadness, happiness, disgust, surprise, fear, anger) conveyed through these comments. Finally, we compared the results of the five most popular classical machine learning techniques namely Naive Bayes, Decision Tree, k-Nearest Neighbor (k-NN), Support Vector Machine (SVM) and K-Means Clustering with several combinations of features. Our best model (SVM with a non-linear radial-basis function (RBF) kernel) achieved an overall average accuracy score of 52.98% and an F1 score (macro) of 0.3324

  Access Paper or Ask Questions

The Natural Auditor: How To Tell If Someone Used Your Words To Train Their Model

Nov 01, 2018
Congzheng Song, Vitaly Shmatikov

To help enforce data-protection regulations such as GDPR and detect unauthorized uses of personal data, we propose a new \emph{model auditing} technique that enables users to check if their data was used to train a machine learning model. We focus on auditing deep-learning models that generate natural-language text, including word prediction and dialog generation. These models are at the core of many popular online services. Furthermore, they are often trained on very sensitive personal data, such as users' messages, searches, chats, and comments. We design and evaluate an effective black-box auditing method that can detect, with very few queries to a model, if a particular user's texts were used to train it (among thousands of other users). In contrast to prior work on membership inference against ML models, we do not assume that the model produces numeric confidence values. We empirically demonstrate that we can successfully audit models that are well-generalized and not overfitted to the training data. We also analyze how text-generation models memorize word sequences and explain why this memorization makes them amenable to auditing.

  Access Paper or Ask Questions