Get our free extension to see links to code for papers anywhere online!

Chrome logo Add to Chrome

Firefox logo Add to Firefox

"Text": models, code, and papers

Is Multilingual BERT Fluent in Language Generation?

Oct 09, 2019
Samuel Rönnqvist, Jenna Kanerva, Tapio Salakoski, Filip Ginter

The multilingual BERT model is trained on 104 languages and meant to serve as a universal language model and tool for encoding sentences. We explore how well the model performs on several languages across several tasks: a diagnostic classification probing the embeddings for a particular syntactic property, a cloze task testing the language modelling ability to fill in gaps in a sentence, and a natural language generation task testing for the ability to produce coherent text fitting a given context. We find that the currently available multilingual BERT model is clearly inferior to the monolingual counterparts, and cannot in many cases serve as a substitute for a well-trained monolingual model. We find that the English and German models perform well at generation, whereas the multilingual model is lacking, in particular, for Nordic languages.

* In proceedings of the First NLPL Workshop on Deep Learning for Natural Language Processing (2019) 

  Access Paper or Ask Questions

Fine-Grained Analysis of Propaganda in News Articles

Oct 06, 2019
Giovanni Da San Martino, Seunghak Yu, Alberto Barrón-Cedeño, Rostislav Petrov, Preslav Nakov

Propaganda aims at influencing people's mindset with the purpose of advancing a specific agenda. Previous work has addressed propaganda detection at the document level, typically labelling all articles from a propagandistic news outlet as propaganda. Such noisy gold labels inevitably affect the quality of any learning system trained on them. A further issue with most existing systems is the lack of explainability. To overcome these limitations, we propose a novel task: performing fine-grained analysis of texts by detecting all fragments that contain propaganda techniques as well as their type. In particular, we create a corpus of news articles manually annotated at the fragment level with eighteen propaganda techniques and we propose a suitable evaluation measure. We further design a novel multi-granularity neural network, and we show that it outperforms several strong BERT-based baselines.

* EMNLP-2019 

  Access Paper or Ask Questions

VisualBERT: A Simple and Performant Baseline for Vision and Language

Aug 09, 2019
Liunian Harold Li, Mark Yatskar, Da Yin, Cho-Jui Hsieh, Kai-Wei Chang

We propose VisualBERT, a simple and flexible framework for modeling a broad range of vision-and-language tasks. VisualBERT consists of a stack of Transformer layers that implicitly align elements of an input text and regions in an associated input image with self-attention. We further propose two visually-grounded language model objectives for pre-training VisualBERT on image caption data. Experiments on four vision-and-language tasks including VQA, VCR, NLVR2, and Flickr30K show that VisualBERT outperforms or rivals with state-of-the-art models while being significantly simpler. Further analysis demonstrates that VisualBERT can ground elements of language to image regions without any explicit supervision and is even sensitive to syntactic relationships, tracking, for example, associations between verbs and image regions corresponding to their arguments.

* Work in Progress 

  Access Paper or Ask Questions

Noise Contrastive Variational Autoencoders

Jul 31, 2019
Octavian-Eugen Ganea, Yashas Annadani, Gary Bécigneul

We take steps towards understanding the "posterior collapse (PC)" difficulty in variational autoencoders (VAEs),~i.e. a degenerate optimum in which the latent codes become independent of their corresponding inputs. We rely on calculus of variations and theoretically explore a few popular VAE models, showing that PC always occurs for non-parametric encoders and decoders. Inspired by the popular noise contrastive estimation algorithm, we propose NC-VAE where the encoder discriminates between the latent codes of real data and of some artificially generated noise, in addition to encouraging good data reconstruction abilities. Theoretically, we prove that our model cannot reach PC and provide novel lower bounds. Our method is straightforward to implement and has the same run-time as vanilla VAE. Empirically, we showcase its benefits on popular image and text datasets.

* There is a mistake common to all the main proofs. In summary, what we find are saddle points or global maxima of the respective loss functions and not the global minima. We apologize for this 

  Access Paper or Ask Questions

Generating Long and Informative Reviews with Aspect-Aware Coarse-to-Fine Decoding

Jun 11, 2019
Junyi Li, Wayne Xin Zhao, Ji-Rong Wen, Yang Song

Generating long and informative review text is a challenging natural language generation task. Previous work focuses on word-level generation, neglecting the importance of topical and syntactic characteristics from natural languages. In this paper, we propose a novel review generation model by characterizing an elaborately designed aspect-aware coarse-to-fine generation process. First, we model the aspect transitions to capture the overall content flow. Then, to generate a sentence, an aspect-aware sketch will be predicted using an aspect-aware decoder. Finally, another decoder fills in the semantic slots by generating corresponding words. Our approach is able to jointly utilize aspect semantics, syntactic sketch, and context information. Extensive experiments results have demonstrated the effectiveness of the proposed model.

* 11 pages, 2 figures, 7 tables 

  Access Paper or Ask Questions

Attention Is (not) All You Need for Commonsense Reasoning

May 31, 2019
Tassilo Klein, Moin Nabi

The recently introduced BERT model exhibits strong performance on several language understanding benchmarks. In this paper, we describe a simple re-implementation of BERT for commonsense reasoning. We show that the attentions produced by BERT can be directly utilized for tasks such as the Pronoun Disambiguation Problem and Winograd Schema Challenge. Our proposed attention-guided commonsense reasoning method is conceptually simple yet empirically powerful. Experimental analysis on multiple datasets demonstrates that our proposed system performs remarkably well on all cases while outperforming the previously reported state of the art by a margin. While results suggest that BERT seems to implicitly learn to establish complex relationships between entities, solving commonsense reasoning tasks might require more than unsupervised models learned from huge text corpora.

* to appear at ACL 2019 

  Access Paper or Ask Questions

Neural Review Rating Prediction with Hierarchical Attentions and Latent Factors

May 29, 2019
Xianchen Wang, Hongtao Liu, Peiyi Wang, Fangzhao Wu, Hongyan Xu, Wenjun Wang, Xing Xie

Text reviews can provide rich useful semantic information for modeling users and items, which can benefit rating prediction in recommendation. Different words and reviews may have different informativeness for users or items. Besides, different users and items should be personalized. Most existing works regard all reviews equally or utilize a general attention mechanism. In this paper, we propose a hierarchical attention model fusing latent factor model for rating prediction with reviews, which can focus on important words and informative reviews. Specially, we use the factor vectors of Latent Factor Model to guide the attention network and combine the factor vectors with feature representation learned from reviews to predict the final ratings. Experiments on real-world datasets validate the effectiveness of our approach.

* 4pages, 1 figures 

  Access Paper or Ask Questions

Show, Translate and Tell

Mar 14, 2019
Dheeraj Peri, Shagan Sah, Raymond Ptucha

Humans have an incredible ability to process and understand information from multiple sources such as images, video, text, and speech. Recent success of deep neural networks has enabled us to develop algorithms which give machines the ability to understand and interpret this information. There is a need to both broaden their applicability and develop methods which correlate visual information along with semantic content. We propose a unified model which jointly trains on images and captions, and learns to generate new captions given either an image or a caption query. We evaluate our model on three different tasks namely cross-modal retrieval, image captioning, and sentence paraphrasing. Our model gains insight into cross-modal vector embeddings, generalizes well on multiple tasks and is competitive to state of the art methods on retrieval.

  Access Paper or Ask Questions

Neural Language Models as Psycholinguistic Subjects: Representations of Syntactic State

Mar 08, 2019
Richard Futrell, Ethan Wilcox, Takashi Morita, Peng Qian, Miguel Ballesteros, Roger Levy

We deploy the methods of controlled psycholinguistic experimentation to shed light on the extent to which the behavior of neural network language models reflects incremental representations of syntactic state. To do so, we examine model behavior on artificial sentences containing a variety of syntactically complex structures. We test four models: two publicly available LSTM sequence models of English (Jozefowicz et al., 2016; Gulordava et al., 2018) trained on large datasets; an RNNG (Dyer et al., 2016) trained on a small, parsed dataset; and an LSTM trained on the same small corpus as the RNNG. We find evidence that the LSTMs trained on large datasets represent syntactic state over large spans of text in a way that is comparable to the RNNG, while the LSTM trained on the small dataset does not or does so only weakly.

* Accepted to NAACL 2019. Not yet edited into the camera-ready version 

  Access Paper or Ask Questions