
"Text": models, code, and papers

Q-learning with Language Model for Edit-based Unsupervised Summarization

Oct 09, 2020
Ryosuke Kohita, Akifumi Wachi, Yang Zhao, Ryuki Tachibana

Unsupervised methods are promising for abstractive text summarization in that no parallel corpus is required. However, their performance is still far from satisfactory, and research on promising solutions is ongoing. In this paper, we propose a new approach based on Q-learning with an edit-based summarization. The method combines two key modules to form an Editorial Agent and Language Model converter (EALM). The agent predicts edit actions (e.g., delete, keep, and replace), and the language model converter then deterministically generates a summary on the basis of those action signals. Q-learning is leveraged to train the agent to produce proper edit actions. Experimental results show that EALM delivers performance competitive with previous encoder-decoder-based methods, even with truly zero paired data (i.e., no validation set). Defining the task as Q-learning enables us not only to develop a competitive method but also to make the latest techniques in reinforcement learning available for unsupervised summarization. We also conduct a qualitative analysis, providing insights for future study of unsupervised summarizers.

* 14 pages, 4 figures 
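The edit-action-to-summary step described in the abstract can be sketched as follows. This is a minimal illustration of applying per-token edit actions (delete, keep, replace), not the authors' EALM implementation; `apply_edits` and its arguments are hypothetical names.

```python
def apply_edits(tokens, actions, replacements=None):
    """Convert per-token edit actions (keep / delete / replace) into a
    summary token list. `replacements` maps a token index to its
    substitute word for "replace" actions."""
    replacements = replacements or {}
    out = []
    for i, (tok, act) in enumerate(zip(tokens, actions)):
        if act == "keep":
            out.append(tok)
        elif act == "replace":
            out.append(replacements.get(i, tok))
        # "delete" contributes nothing to the summary
    return out

summary = apply_edits(
    ["the", "quick", "brown", "fox", "jumps"],
    ["delete", "keep", "delete", "keep", "keep"],
)
```

In the paper, the action sequence would come from the Q-learning agent and the surface realization from the language model converter; here both are collapsed into a deterministic token-level rewrite for clarity.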


Neural Speech Synthesis for Estonian

Oct 06, 2020
Liisa Rätsep, Liisi Piits, Hille Pajupuu, Indrek Hein, Mark Fišel

This technical report describes the results of a collaboration between the NLP research group at the University of Tartu and the Institute of the Estonian Language on improving neural speech synthesis for Estonian. The report (written in Estonian) describes the project results, which are, in summary: (1) speech synthesis data from 6 speakers, 92.4 hours in total, is collected and openly released (CC-BY-4.0); (2) software and models for neural speech synthesis are released open-source (MIT license); (3) we evaluated the new models and compared them to other existing solutions (HMM-based HTS models from EKI, and Google's speech synthesis for Estonian). The evaluation includes voice-acceptability MOS scores for sentence-level and longer excerpts, a detailed error analysis, and an evaluation of the pre-processing module.

* 9 pages in Estonian 


Grounded Compositional Outputs for Adaptive Language Modeling

Sep 24, 2020
Nikolaos Pappas, Phoebe Mulcaire, Noah A. Smith

Language models have emerged as a central component across NLP, and a great deal of progress depends on the ability to cheaply adapt them (e.g., through finetuning) to new domains and tasks. A language model's vocabulary, typically selected before training and permanently fixed later, affects its size and is part of what makes it resistant to such adaptation. Prior work has used compositional input embeddings based on surface forms to ameliorate this issue. In this work, we go one step beyond and propose a fully compositional output embedding layer for language models, which is further grounded in information from a structured lexicon (WordNet), namely semantically related words and free-text definitions. To our knowledge, the result is the first word-level language model with a size that does not depend on the training vocabulary. We evaluate the model on conventional language modeling as well as challenging cross-domain settings with an open vocabulary, finding that it matches or outperforms previous state-of-the-art output embedding methods and adaptation approaches. Our analysis attributes the improvements to sample efficiency: our model is more accurate for low-frequency words.

* EMNLP 2020 
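The key idea of a vocabulary-size-independent output layer can be sketched as below. This is my own toy illustration, not the authors' model (which additionally grounds embeddings in WordNet relations and definitions): every word, seen or unseen, is embedded as the mean of hashed character n-gram vectors, so output scores need no fixed output matrix.

```python
import hashlib

DIM = 8  # embedding dimension (arbitrary for this sketch)

def char_ngrams(word, n=3):
    """Character n-grams of a word with boundary markers."""
    padded = f"<{word}>"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

def ngram_vector(ngram):
    """Deterministic pseudo-random vector derived from an n-gram hash,
    standing in for a learned n-gram embedding table."""
    digest = hashlib.sha256(ngram.encode()).digest()
    return [(b - 127.5) / 127.5 for b in digest[:DIM]]

def word_embedding(word):
    """Compositional embedding: mean of the word's n-gram vectors."""
    vecs = [ngram_vector(g) for g in char_ngrams(word)]
    return [sum(v[d] for v in vecs) / len(vecs) for d in range(DIM)]

def output_scores(hidden, candidate_words):
    """Logits for any candidate word set; no vocabulary-sized matrix."""
    return {w: sum(h * e for h, e in zip(hidden, word_embedding(w)))
            for w in candidate_words}
```

Because the embedding is computed from surface form alone, the number of parameters does not grow with the training vocabulary, which is the property the abstract highlights.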


A matrix concentration inequality for products

Aug 12, 2020
Sina Baghal

We present a non-asymptotic concentration inequality for the random matrix product \begin{equation}\label{eq:Zn} Z_n = \left(I_d-\alpha X_n\right)\left(I_d-\alpha X_{n-1}\right)\cdots \left(I_d-\alpha X_1\right), \end{equation} where $\left\{X_k \right\}_{k=1}^{+\infty}$ is a sequence of bounded independent random positive semidefinite matrices with common expectation $\mathbb{E}\left[X_k\right]=\Sigma$. Under these assumptions, we show that, for small enough positive $\alpha$, $Z_n$ satisfies the concentration inequality \begin{equation}\label{eq:CTbound} \mathbb{P}\left(\left\Vert Z_n-\mathbb{E}\left[Z_n\right]\right\Vert \geq t\right) \leq 2d^2\cdot\exp\left(\frac{-t^2}{\alpha v\left(\Sigma\right)} \right) \quad \text{for all } t\geq 0, \end{equation} where $v\left(\Sigma\right)$ denotes a variance parameter.
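A quick numerical sketch (illustration only, not from the paper) of the random product $Z_n$: for bounded PSD factors $X_k$ with eigenvalues in $[0,1]$ and small $\alpha$, each factor $I - \alpha X_k$ has spectral norm at most 1, so $\Vert Z_n \Vert$ stays bounded, which is the regime in which the concentration bound applies.

```python
import random

random.seed(0)
D = 2  # matrix dimension for the sketch

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(D)) for j in range(D)]
            for i in range(D)]

def fro_norm(A):
    return sum(x * x for row in A for x in row) ** 0.5

def sample_X():
    """Bounded PSD matrix v v^T with ||v|| <= 1, so eigenvalues lie in [0, 1]."""
    v = [random.uniform(-1, 1) for _ in range(D)]
    scale = max(1.0, sum(x * x for x in v) ** 0.5)
    v = [x / scale for x in v]
    return [[v[i] * v[j] for j in range(D)] for i in range(D)]

def sample_Z(n, alpha):
    """Z_n = (I - alpha X_n) ... (I - alpha X_1), new factors on the left."""
    Z = [[1.0, 0.0], [0.0, 1.0]]  # identity
    for _ in range(n):
        X = sample_X()
        factor = [[(1.0 if i == j else 0.0) - alpha * X[i][j]
                   for j in range(D)] for i in range(D)]
        Z = mat_mul(factor, Z)
    return Z

Z = sample_Z(n=200, alpha=0.1)
```

Since each factor has spectral norm at most 1, the Frobenius norm of the 2x2 product never exceeds sqrt(2), regardless of the realization; the paper's result sharpens this to a sub-Gaussian tail around the mean.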


Multimodal Word Sense Disambiguation in Creative Practice

Jul 15, 2020
Manuel Ladron de Guevara, Christopher George, Akshat Gupta, Daragh Byrne, Ramesh Krishnamurti

Language is ambiguous; many terms and expressions can convey the same idea. This is especially true in creative practice, where ideas and design intents are highly subjective. We present Ambiguous Descriptions of Art Images (ADARI), a dataset of contemporary workpieces, which aims to provide a foundational resource for subjective image description and multimodal word disambiguation in the context of creative practice. The dataset contains a total of 240k images labeled with 260k descriptive sentences. It is additionally organized into the sub-domains of architecture, art, design, fashion, furniture, product design and technology. In subjective image description, labels are not deterministic: for example, the ambiguous label dynamic might correspond to hundreds of different images. To understand this complexity, we analyze the ambiguity and relevance of text with respect to images using the state-of-the-art pre-trained BERT model for sentence classification. We provide a baseline for multi-label classification tasks and demonstrate the potential of multimodal approaches for understanding ambiguity in design intentions. We hope that the ADARI dataset and baselines constitute a first step towards subjective label classification.

* 9 pages, 5 figures, 2 tables 


Towards User Friendly Medication Mapping Using Entity-Boosted Two-Tower Neural Network

Jun 17, 2020
Shaoqing Yuan, Parminder Bhatia, Busra Celikkaya, Haiyang Liu, Kyunghwan Choi

Recent advancements in medical entity linking have been applied to scientific literature and social media data. However, with the adoption of telemedicine and conversational agents such as Alexa in healthcare settings, medication name inference has become an important task. Medication name inference is the task of mapping user-friendly medication names from free-form text to a concept in a normalized medication list. This is challenging due to the differences between the medical terminology used by health care professionals and the conversational language of the lay public. We begin by mapping descriptive medication phrases (DMPs) to standard medication names (SMNs). Given each patient's prescriptions, we want to provide them with the flexibility of referring to their medications in their preferred ways. We approach this as a ranking problem that maps an SMN to a DMP by ordering the list of medications in the patient's prescription list obtained from pharmacies. Furthermore, we leverage the output of intermediate layers to perform medication clustering. We present the Medication Inference Model (MIM), which achieves state-of-the-art results. By incorporating medical-entity-based attention, we obtain further improvements in the ranking models.
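The ranking setup described above can be sketched as follows. This is a toy illustration of the two-tower idea, not the authors' MIM model: each descriptive phrase and standard medication name would be encoded by its own "tower" into a vector, and the patient's prescription list is ordered by cosine similarity to the query phrase. The encodings below are hand-made stand-ins.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def rank_medications(query_vec, smn_vectors):
    """Return standard medication names ordered by decreasing similarity
    to the encoded descriptive phrase."""
    return sorted(smn_vectors,
                  key=lambda name: -cosine(query_vec, smn_vectors[name]))

# Toy encodings standing in for the two tower outputs.
smns = {
    "metformin 500mg tablet": [0.9, 0.1, 0.0],
    "lisinopril 10mg tablet": [0.1, 0.9, 0.1],
}
ranking = rank_medications([0.8, 0.2, 0.0], smns)  # query: a DMP encoding
```

In the paper, both towers are neural encoders trained jointly, and entity-based attention refines the representations; the cosine-ranked list is the common inference-time interface.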


Video Understanding as Machine Translation

Jun 12, 2020
Bruno Korbar, Fabio Petroni, Rohit Girdhar, Lorenzo Torresani

With the advent of large-scale multimodal video datasets, especially sequences with audio or transcribed speech, there has been a growing interest in self-supervised learning of video representations. Most prior work formulates the objective as a contrastive metric learning problem between the modalities. To enable effective learning, however, these strategies require a careful selection of positive and negative samples often combined with hand-designed curriculum policies. In this work we remove the need for negative sampling by taking a generative modeling approach that poses the objective as a translation problem between modalities. Such a formulation allows us to tackle a wide variety of downstream video understanding tasks by means of a single unified framework, without the need for large batches of negative samples common in contrastive metric learning. We experiment with the large-scale HowTo100M dataset for training, and report performance gains over the state-of-the-art on several downstream tasks including video classification (EPIC-Kitchens), question answering (TVQA), captioning (TVC, YouCook2, and MSR-VTT), and text-based clip retrieval (YouCook2 and MSR-VTT).


XCOPA: A Multilingual Dataset for Causal Commonsense Reasoning

May 01, 2020
Edoardo Maria Ponti, Goran Glavaš, Olga Majewska, Qianchu Liu, Ivan Vulić, Anna Korhonen

In order to simulate human language capacity, natural language processing systems must complement the explicit information derived from raw text with the ability to reason about the possible causes and outcomes of everyday situations. Moreover, the acquired world knowledge should generalise to new languages, modulo cultural differences. Advances in machine commonsense reasoning and cross-lingual transfer depend on the availability of challenging evaluation benchmarks. Motivated by both demands, we introduce Cross-lingual Choice of Plausible Alternatives (XCOPA), a typologically diverse multilingual dataset for causal commonsense reasoning in 11 languages. We benchmark a range of state-of-the-art models on this novel dataset, revealing that current methods based on multilingual pretraining and zero-shot fine-tuning transfer suffer from the curse of multilinguality and fall short of the performance attained in monolingual settings by a large margin. Finally, we propose ways to adapt these models to out-of-sample resource-lean languages where only a small corpus or a bilingual dictionary is available, and report substantial improvements over the random baseline. XCOPA is available at


Word Rotator's Distance: Decomposing Vectors Gives Better Representations

Apr 30, 2020
Sho Yokoi, Ryo Takahashi, Reina Akama, Jun Suzuki, Kentaro Inui

One key principle for assessing semantic similarity between texts is to measure the degree of semantic overlap between them by considering word-by-word alignment. However, alignment-based approaches are inferior to generic sentence vectors in terms of performance. We hypothesize that alignment-based methods fall short because they do not distinguish word importance from word meaning. To address this, we propose to separate word importance and word meaning by decomposing word vectors into their norm and direction, and then to compute alignment-based similarity with the help of earth mover's distance. We call the method word rotator's distance (WRD) because direction vectors are aligned by rotation on the unit hypersphere. In addition, to incorporate advances in cutting-edge additive sentence encoders, we propose to re-decompose such sentence vectors into word vectors and use them as inputs to WRD. Empirically, the proposed method outperforms current word-by-word alignment methods, including word mover's distance, by a large margin; moreover, our method outperforms state-of-the-art additive sentence encoders on the most competitive dataset, STS-benchmark.
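The norm/direction decomposition can be sketched as below. This is a simplified illustration, not the paper's implementation: word vectors are split into norm (importance) and direction (meaning), directions are compared by cosine and weighted by normalized norms. For brevity it brute-forces the best one-to-one assignment over equal-length sentences instead of solving the full earth mover's problem.

```python
import itertools
import math

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def cosine(u, v):
    return sum(a * b for a, b in zip(u, v)) / (norm(u) * norm(v))

def wrd(sent_a, sent_b):
    """Simplified word rotator's distance between two equal-length lists
    of word vectors: minimum norm-weighted (1 - cosine) matching cost."""
    wa = [norm(v) for v in sent_a]
    wb = [norm(v) for v in sent_b]
    za, zb = sum(wa), sum(wb)
    best = float("inf")
    for perm in itertools.permutations(range(len(sent_b))):
        cost = sum((wa[i] / za + wb[j] / zb) / 2
                   * (1 - cosine(sent_a[i], sent_b[j]))
                   for i, j in enumerate(perm))
        best = min(best, cost)
    return best
```

The full method solves an earth mover's problem over the norm-derived mass distributions, which handles sentences of different lengths and fractional matches; the permutation search above is only a small-scale stand-in.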


Assessing the Bilingual Knowledge Learned by Neural Machine Translation Models

Apr 28, 2020
Shilin He, Xing Wang, Shuming Shi, Michael R. Lyu, Zhaopeng Tu

Machine translation (MT) systems translate text between different languages by automatically learning in-depth knowledge of bilingual lexicons, grammar and semantics from the training examples. Although neural machine translation (NMT) has led the field of MT, we have a poor understanding of how and why it works. In this paper, we bridge the gap by assessing the bilingual knowledge learned by NMT models with a phrase table -- an interpretable table of bilingual lexicons. We extract the phrase table from the training examples that an NMT model correctly predicts. Extensive experiments on widely used datasets show that the phrase table is reasonable and consistent across language pairs and random seeds. Equipped with the interpretable phrase table, we find that NMT models learn patterns from simple to complex and distill essential bilingual knowledge from the training examples. We also revisit some advances that potentially affect the learning of bilingual knowledge (e.g., back-translation), and report some interesting findings. We believe this work opens a new angle on interpreting NMT with statistical models, and provides empirical support for recent advances in improving NMT models.

* 10 pages 
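Phrase-table construction from word alignments can be sketched with the classic consistency criterion from statistical MT. This is an illustration of that standard extraction step only; the paper's distinctive ingredient, restricting extraction to training examples the NMT model predicts correctly, is not modeled here.

```python
def extract_phrases(src, tgt, alignment, max_len=3):
    """Collect all alignment-consistent phrase pairs up to max_len words.
    `alignment` is a set of (src_index, tgt_index) links."""
    pairs = set()
    for i1 in range(len(src)):
        for i2 in range(i1, min(len(src), i1 + max_len)):
            # Target positions linked to the source span [i1, i2].
            linked = [t for (s, t) in alignment if i1 <= s <= i2]
            if not linked:
                continue
            j1, j2 = min(linked), max(linked)
            if j2 - j1 >= max_len:
                continue
            # Consistent: every link touching the target span stays inside
            # the source span as well.
            if all(i1 <= s <= i2 for (s, t) in alignment if j1 <= t <= j2):
                pairs.add((" ".join(src[i1:i2 + 1]),
                           " ".join(tgt[j1:j2 + 1])))
    return pairs

table = extract_phrases(["das", "Haus"], ["the", "house"], {(0, 0), (1, 1)})
```

Running this over a corpus of aligned, correctly predicted sentence pairs yields the kind of interpretable bilingual lexicon the abstract analyzes.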
