Get our free extension to see links to code for papers anywhere online!

Chrome logo Add to Chrome

Firefox logo Add to Firefox

"Text": models, code, and papers

Neural Speed Reading with Structural-Jump-LSTM

Apr 02, 2019
Christian Hansen, Casper Hansen, Stephen Alstrup, Jakob Grue Simonsen, Christina Lioma

Recurrent neural networks (RNNs) can model natural language by sequentially 'reading' input tokens and outputting a distributed representation of each token. Due to the sequential nature of RNNs, inference time is linearly dependent on the input length, and all inputs are read regardless of their importance. Efforts to speed up this inference, known as 'neural speed reading', either ignore or skim over part of the input. We present Structural-Jump-LSTM: the first neural speed reading model to both skip and jump text during inference. The model consists of a standard LSTM and two agents: one capable of skipping single words when reading, and one capable of exploiting punctuation structure (sub-sentence separators (,:), sentence end symbols (.!?), or end of text markers) to jump ahead after reading a word. A comprehensive experimental evaluation of our model against all five state-of-the-art neural reading models shows that Structural-Jump-LSTM achieves the best overall floating point operations (FLOP) reduction (hence is faster), while keeping the same accuracy or even improving it compared to a vanilla LSTM that reads the whole text.

* 7th International Conference on Learning Representations (ICLR) 2019 
* 10 pages 

  Access Paper or Ask Questions

Automatic Generation of Natural Language Explanations

Jul 04, 2017
Felipe Costa, Sixun Ouyang, Peter Dolog, Aonghus Lawlor

An important task for recommender system is to generate explanations according to a user's preferences. Most of the current methods for explainable recommendations use structured sentences to provide descriptions along with the recommendations they produce. However, those methods have neglected the review-oriented way of writing a text, even though it is known that these reviews have a strong influence over user's decision. In this paper, we propose a method for the automatic generation of natural language explanations, for predicting how a user would write about an item, based on user ratings from different items' features. We design a character-level recurrent neural network (RNN) model, which generates an item's review explanations using long-short term memories (LSTM). The model generates text reviews given a combination of the review and ratings score that express opinions about different factors or aspects of an item. Our network is trained on a sub-sample from the large real-world dataset BeerAdvocate. Our empirical evaluation using natural language processing metrics shows the generated text's quality is close to a real user written review, identifying negation, misspellings, and domain specific vocabulary.

* 7 pages, 5 figures, 2nd workshop on Deep Learning for Recommender Systems 

  Access Paper or Ask Questions

WYSIWYE: An Algebra for Expressing Spatial and Textual Rules for Visual Information Extraction

Sep 27, 2016
Vijil Chenthamarakshan, Prasad M Desphande, Raghu Krishnapuram, Ramakrishna Varadarajan, Knut Stolze

The visual layout of a webpage can provide valuable clues for certain types of Information Extraction (IE) tasks. In traditional rule based IE frameworks, these layout cues are mapped to rules that operate on the HTML source of the webpages. In contrast, we have developed a framework in which the rules can be specified directly at the layout level. This has many advantages, since the higher level of abstraction leads to simpler extraction rules that are largely independent of the source code of the page, and, therefore, more robust. It can also enable specification of new types of rules that are not otherwise possible. To the best of our knowledge, there is no general framework that allows declarative specification of information extraction rules based on spatial layout. Our framework is complementary to traditional text based rules framework and allows a seamless combination of spatial layout based rules with traditional text based rules. We describe the algebra that enables such a system and its efficient implementation using standard relational and text indexing features of a relational database. We demonstrate the simplicity and efficiency of this system for a task involving the extraction of software system requirements from software product pages.

  Access Paper or Ask Questions

Transformer-based Cross-Modal Recipe Embeddings with Large Batch Training

May 10, 2022
Jing Yang, Junwen Chen, Keiji Yanai

In this paper, we present a cross-modal recipe retrieval framework, Transformer-based Network for Large Batch Training (TNLBT), which is inspired by ACME~(Adversarial Cross-Modal Embedding) and H-T~(Hierarchical Transformer). TNLBT aims to accomplish retrieval tasks while generating images from recipe embeddings. We apply the Hierarchical Transformer-based recipe text encoder, the Vision Transformer~(ViT)-based recipe image encoder, and an adversarial network architecture to enable better cross-modal embedding learning for recipe texts and images. In addition, we use self-supervised learning to exploit the rich information in the recipe texts having no corresponding images. Since contrastive learning could benefit from a larger batch size according to the recent literature on self-supervised learning, we adopt a large batch size during training and have validated its effectiveness. In the experiments, the proposed framework significantly outperformed the current state-of-the-art frameworks in both cross-modal recipe retrieval and image generation tasks on the benchmark Recipe1M. This is the first work which confirmed the effectiveness of large batch training on cross-modal recipe embeddings.

* 13 pages, 8 figures 

  Access Paper or Ask Questions

Faithfulness in Natural Language Generation: A Systematic Survey of Analysis, Evaluation and Optimization Methods

Mar 10, 2022
Wei Li, Wenhao Wu, Moye Chen, Jiachen Liu, Xinyan Xiao, Hua Wu

Natural Language Generation (NLG) has made great progress in recent years due to the development of deep learning techniques such as pre-trained language models. This advancement has resulted in more fluent, coherent and even properties controllable (e.g. stylistic, sentiment, length etc.) generation, naturally leading to development in downstream tasks such as abstractive summarization, dialogue generation, machine translation, and data-to-text generation. However, the faithfulness problem that the generated text usually contains unfaithful or non-factual information has become the biggest challenge, which makes the performance of text generation unsatisfactory for practical applications in many real-world scenarios. Many studies on analysis, evaluation, and optimization methods for faithfulness problems have been proposed for various tasks, but have not been organized, compared and discussed in a combined manner. In this survey, we provide a systematic overview of the research progress on the faithfulness problem of NLG, including problem analysis, evaluation metrics and optimization methods. We organize the evaluation and optimization methods for different tasks into a unified taxonomy to facilitate comparison and learning across tasks. Several research trends are discussed further.

* The first version 

  Access Paper or Ask Questions

Lie-Sensor: A Live Emotion Verifier or a Licensor for Chat Applications using Emotional Intelligence

Feb 11, 2021
Falguni Patel, NirmalKumar Patel, Santosh Kumar Bharti

Veracity is an essential key in research and development of innovative products. Live Emotion analysis and verification nullify deceit made to complainers on live chat, corroborate messages of both ends in messaging apps and promote an honest conversation between users. The main concept behind this emotion artificial intelligent verifier is to license or decline message accountability by comparing variegated emotions of chat app users recognized through facial expressions and text prediction. In this paper, a proposed emotion intelligent live detector acts as an honest arbiter who distributes facial emotions into labels namely, Happiness, Sadness, Surprise, and Hate. Further, it separately predicts a label of messages through text classification. Finally, it compares both labels and declares the message as a fraud or a bonafide. For emotion detection, we deployed Convolutional Neural Network (CNN) using a miniXception model and for text prediction, we selected Support Vector Machine (SVM) natural language processing probability classifier due to receiving the best accuracy on training dataset after applying Support Vector Machine (SVM), Random Forest Classifier, Naive Bayes Classifier, and Logistic regression.

* 13 pages 

  Access Paper or Ask Questions

PatentMatch: A Dataset for Matching Patent Claims & Prior Art

Dec 27, 2020
Julian Risch, Nicolas Alder, Christoph Hewel, Ralf Krestel

Patent examiners need to solve a complex information retrieval task when they assess the novelty and inventive step of claims made in a patent application. Given a claim, they search for prior art, which comprises all relevant publicly available information. This time-consuming task requires a deep understanding of the respective technical domain and the patent-domain-specific language. For these reasons, we address the computer-assisted search for prior art by creating a training dataset for supervised machine learning called PatentMatch. It contains pairs of claims from patent applications and semantically corresponding text passages of different degrees from cited patent documents. Each pair has been labeled by technically-skilled patent examiners from the European Patent Office. Accordingly, the label indicates the degree of semantic correspondence (matching), i.e., whether the text passage is prejudicial to the novelty of the claimed invention or not. Preliminary experiments using a baseline system show that PatentMatch can indeed be used for training a binary text pair classifier on this challenging information retrieval task. The dataset is available online:


  Access Paper or Ask Questions

The Go Transformer: Natural Language Modeling for Game Play

Jul 07, 2020
David Noever, Matthew Ciolino, Josh Kalin

This work applies natural language modeling to generate plausible strategic moves in the ancient game of Go. We train the Generative Pretrained Transformer (GPT-2) to mimic the style of Go champions as archived in Smart Game Format (SGF), which offers a text description of move sequences. The trained model further generates valid but previously unseen strategies for Go. Because GPT-2 preserves punctuation and spacing, the raw output of the text generator provides inputs to game visualization and creative patterns, such as the Sabaki project's (2020) game engine using auto-replays. Results demonstrate that language modeling can capture both the sequencing format of championship Go games and their strategic formations. Compared to random game boards, the GPT-2 fine-tuning shows efficient opening move sequences favoring corner play over less advantageous center and side play. Game generation as a language modeling task offers novel approaches to more than 40 other board games where historical text annotation provides training data (e.g., Amazons & Connect 4/6).

* 8 Pages, 5 Figures, 1 Table 

  Access Paper or Ask Questions

M3D-GAN: Multi-Modal Multi-Domain Translation with Universal Attention

Jul 09, 2019
Shuang Ma, Daniel McDuff, Yale Song

Generative adversarial networks have led to significant advances in cross-modal/domain translation. However, typically these networks are designed for a specific task (e.g., dialogue generation or image synthesis, but not both). We present a unified model, M3D-GAN, that can translate across a wide range of modalities (e.g., text, image, and speech) and domains (e.g., attributes in images or emotions in speech). Our model consists of modality subnets that convert data from different modalities into unified representations, and a unified computing body where data from different modalities share the same network architecture. We introduce a universal attention module that is jointly trained with the whole network and learns to encode a large range of domain information into a highly structured latent space. We use this to control synthesis in novel ways, such as producing diverse realistic pictures from a sketch or varying the emotion of synthesized speech. We evaluate our approach on extensive benchmark tasks, including image-to-image, text-to-image, image captioning, text-to-speech, speech recognition, and machine translation. Our results show state-of-the-art performance on some of the tasks.

  Access Paper or Ask Questions

Effective writing style imitation via combinatorial paraphrasing

May 31, 2019
Tommi Gröndahl, N. Asokan

Stylometry can be used to profile authors based on their written text. Transforming text to imitate someone else's writing style while retaining meaning constitutes a defence. A variety of deep learning methods for style imitation have been proposed in recent research literature. Via empirical evaluation of three state-of-the-art models on four datasets, we illustrate that none succeed in semantic retainment, often drastically changing the original meaning or removing important parts of the text. To mitigate this problem we present ParChoice: an alternative approach based on the combinatorial application of multiple paraphrasing techniques. ParChoice first produces a large number of possible candidate paraphrases, from which it then chooses the candidate that maximizes proximity to a target corpus. Through systematic automated and manual evaluation as well as a user study, we demonstrate that ParChoice significantly outperforms prior methods in its ability to retain semantic content. Using state-of-the art deep learning author profiling tools, we additionally show that ParChoice accomplishes better imitation success than A$^4$NT, the state-of-the-art style imitation technique with the best semantic retainment.

  Access Paper or Ask Questions