Get our free extension to see links to code for papers anywhere online!

Chrome logo Add to Chrome

Firefox logo Add to Firefox

"Text": models, code, and papers

Learning to Select, Track, and Generate for Data-to-Text

Jul 23, 2019
Hayate Iso, Yui Uehara, Tatsuya Ishigaki, Hiroshi Noji, Eiji Aramaki, Ichiro Kobayashi, Yusuke Miyao, Naoaki Okazaki, Hiroya Takamura

We propose a data-to-text generation model with two modules, one for tracking and the other for text generation. Our tracking module selects and keeps track of salient information and memorizes which record has been mentioned. Our generation module generates a summary conditioned on the state of tracking module. Our model is considered to simulate the human-like writing process that gradually selects the information by determining the intermediate variables while writing the summary. In addition, we also explore the effectiveness of the writer information for generation. Experimental results show that our model outperforms existing models in all evaluation metrics even without writer information. Incorporating writer information further improves the performance, contributing to content planning and surface realization.

* ACL 2019 

  Access Paper or Ask Questions

SPECTRA: Sparse Structured Text Rationalization

Sep 09, 2021
Nuno Miguel Guerreiro, André F. T. Martins

Selective rationalization aims to produce decisions along with rationales (e.g., text highlights or word alignments between two sentences). Commonly, rationales are modeled as stochastic binary masks, requiring sampling-based gradient estimators, which complicates training and requires careful hyperparameter tuning. Sparse attention mechanisms are a deterministic alternative, but they lack a way to regularize the rationale extraction (e.g., to control the sparsity of a text highlight or the number of alignments). In this paper, we present a unified framework for deterministic extraction of structured explanations via constrained inference on a factor graph, forming a differentiable layer. Our approach greatly eases training and rationale regularization, generally outperforming previous work on what comes to performance and plausibility of the extracted rationales. We further provide a comparative study of stochastic and deterministic methods for rationale extraction for classification and natural language inference tasks, jointly assessing their predictive power, quality of the explanations, and model variability.

* Accepted to EMNLP 2021 (main conference) 

  Access Paper or Ask Questions

CLIPScore: A Reference-free Evaluation Metric for Image Captioning

Apr 18, 2021
Jack Hessel, Ari Holtzman, Maxwell Forbes, Ronan Le Bras, Yejin Choi

Image captioning has conventionally relied on reference-based automatic evaluations, where machine captions are compared against captions written by humans. This is in stark contrast to the reference-free manner in which humans assess caption quality. In this paper, we report the surprising empirical finding that CLIP (Radford et al., 2021), a cross-modal model pretrained on 400M image+caption pairs from the web, can be used for robust automatic evaluation of image captioning without the need for references. Experiments spanning several corpora demonstrate that our new reference-free metric, CLIPScore, achieves the highest correlation with human judgements, outperforming existing reference-based metrics like CIDEr and SPICE. Information gain experiments demonstrate that CLIPScore, with its tight focus on image-text compatibility, is complementary to existing reference-based metrics that emphasize text-text similarities. Thus, we also present a reference-augmented version, RefCLIPScore, which achieves even higher correlation. Beyond literal description tasks, several case studies reveal domains where CLIPScore performs well (clip-art images, alt-text rating), but also where it is relatively weaker vs reference-based metrics, e.g., news captions that require richer contextual knowledge.

  Access Paper or Ask Questions

Part2Whole: Iteratively Enrich Detail for Cross-Modal Retrieval with Partial Query

Mar 02, 2021
Guanyu Cai, Xinyang Jiang, Jun Zhang, Yifei Gong, Lianghua He, Pai Peng, Xiaowei Guo, Xing Sun

Text-based image retrieval has seen considerable progress in recent years. However, the performance of existing methods suffers in real life since the user is likely to provide an incomplete description of a complex scene, which often leads to results filled with false positives that fit the incomplete description. In this work, we introduce the partial-query problem and extensively analyze its influence on text-based image retrieval. We then propose an interactive retrieval framework called Part2Whole to tackle this problem by iteratively enriching the missing details. Specifically, an Interactive Retrieval Agent is trained to build an optimal policy to refine the initial query based on a user-friendly interaction and statistical characteristics of the gallery. Compared to other dialog-based methods that rely heavily on the user to feed back differentiating information, we let AI take over the optimal feedback searching process and hint the user with confirmation-based questions about details. Furthermore, since fully-supervised training is often infeasible due to the difficulty of obtaining human-machine dialog data, we present a weakly-supervised reinforcement learning method that needs no human-annotated data other than the text-image dataset. Experiments show that our framework significantly improves the performance of text-based image retrieval under complex scenes.

  Access Paper or Ask Questions

How to Train Your Agent to Read and Write

Jan 04, 2021
Li Liu, Mengge He, Guanghui Xu, Mingkui Tan, Qi Wu

Reading and writing research papers is one of the most privileged abilities that a qualified researcher should master. However, it is difficult for new researchers (\eg{students}) to fully {grasp} this ability. It would be fascinating if we could train an intelligent agent to help people read and summarize papers, and perhaps even discover and exploit the potential knowledge clues to write novel papers. Although there have been existing works focusing on summarizing (\emph{i.e.}, reading) the knowledge in a given text or generating (\emph{i.e.}, writing) a text based on the given knowledge, the ability of simultaneously reading and writing is still under development. Typically, this requires an agent to fully understand the knowledge from the given text materials and generate correct and fluent novel paragraphs, which is very challenging in practice. In this paper, we propose a Deep ReAder-Writer (DRAW) network, which consists of a \textit{Reader} that can extract knowledge graphs (KGs) from input paragraphs and discover potential knowledge, a graph-to-text \textit{Writer} that generates a novel paragraph, and a \textit{Reviewer} that reviews the generated paragraph from three different aspects. Extensive experiments show that our DRAW network outperforms considered baselines and several state-of-the-art methods on AGENDA and M-AGENDA datasets. Our code and supplementary are released at

  Access Paper or Ask Questions

Heterogeneous Graph Attention Networks for Early Detection of Rumors on Twitter

Jun 10, 2020
Qi Huang, Junshuai Yu, Jia Wu, Bin Wang

With the rapid development of mobile Internet technology and the widespread use of mobile devices, it becomes much easier for people to express their opinions on social media. The openness and convenience of social media platforms provide a free expression for people but also cause new social problems. The widespread of false rumors on social media can bring about the panic of the public and damage personal reputation, which makes rumor automatic detection technology become particularly necessary. The majority of existing methods for rumor detection focus on mining effective features from text contents, user profiles, and patterns of propagation. Nevertheless, these methods do not take full advantage of global semantic relations of the text contents, which characterize the semantic commonality of rumors as a key factor for detecting rumors. In this paper, we construct a tweet-word-user heterogeneous graph based on the text contents and the source tweet propagations of rumors. A meta-path based heterogeneous graph attention network framework is proposed to capture the global semantic relations of text contents, together with the global structure information of source tweet propagations for rumor detection. Experiments on real-world Twitter data demonstrate the superiority of the proposed approach, which also has a comparable ability to detect rumors at a very early stage.

* 8 pages 

  Access Paper or Ask Questions

Hidden-Markov-Model Based Speech Enhancement

Jul 04, 2017
Daniel Dzibela, Armin Sehr

The goal of this contribution is to use a parametric speech synthesis system for reducing background noise and other interferences from recorded speech signals. In a first step, Hidden Markov Models of the synthesis system are trained. Two adequate training corpora consisting of text and corresponding speech files have been set up and cleared of various faults, including inaudible utterances or incorrect assignments between audio and text data. Those are tested and compared against each other regarding e.g. flaws in the synthesized speech, it's naturalness and intelligibility. Thus different voices have been synthesized, whose quality depends less on the number of training samples used, but much more on the cleanliness and signal-to-noise ratio of those. Generalized voice models have been used for synthesis and the results greatly differ between the two speech corpora. Tests regarding the adaptation to different speakers show that a resemblance to the original speaker is audible throughout all recordings, yet the synthesized voices sound robotic and unnatural in smaller parts. The spoken text, however, is usually intelligible, which shows that the models are working well. In a novel approach, speech is synthesized using side information of the original audio signal, particularly the pitch frequency. Results show an increase of speech quality and intelligibility in comparison to speech synthesized solely from text, up to the point of being nearly indistinguishable from the original.

  Access Paper or Ask Questions

The Code2Text Challenge: Text Generation in Source Code Libraries

Jul 31, 2017
Kyle Richardson, Sina Zarrieß, Jonas Kuhn

We propose a new shared task for tactical data-to-text generation in the domain of source code libraries. Specifically, we focus on text generation of function descriptions from example software projects. Data is drawn from existing resources used for studying the related problem of semantic parser induction (Richardson and Kuhn, 2017b; Richardson and Kuhn, 2017a), and spans a wide variety of both natural languages and programming languages. In this paper, we describe these existing resources, which will serve as training and development data for the task, and discuss plans for building new independent test sets.

* Proceedings of INLG 2017, shared task track 

  Access Paper or Ask Questions

What did you Mention? A Large Scale Mention Detection Benchmark for Spoken and Written Text

Jan 25, 2018
Yosi Mass, Lili Kotlerman, Shachar Mirkin, Elad Venezian, Gera Witzling, Noam Slonim

We describe a large, high-quality benchmark for the evaluation of Mention Detection tools. The benchmark contains annotations of both named entities as well as other types of entities, annotated on different types of text, ranging from clean text taken from Wikipedia, to noisy spoken data. The benchmark was built through a highly controlled crowd sourcing process to ensure its quality. We describe the benchmark, the process and the guidelines that were used to build it. We then demonstrate the results of a state-of-the-art system running on that benchmark.

  Access Paper or Ask Questions

Improving LSTM-based Video Description with Linguistic Knowledge Mined from Text

Nov 29, 2016
Subhashini Venugopalan, Lisa Anne Hendricks, Raymond Mooney, Kate Saenko

This paper investigates how linguistic knowledge mined from large text corpora can aid the generation of natural language descriptions of videos. Specifically, we integrate both a neural language model and distributional semantics trained on large text corpora into a recent LSTM-based architecture for video description. We evaluate our approach on a collection of Youtube videos as well as two large movie description datasets showing significant improvements in grammaticality while modestly improving descriptive quality.

* Proc.EMNLP (2016) pg.1961-1966 
* Accepted at EMNLP 2016. Project page: 

  Access Paper or Ask Questions