"Text": models, code, and papers

Learning to Describe Differences Between Pairs of Similar Images

Aug 31, 2018
Harsh Jhamtani, Taylor Berg-Kirkpatrick

In this paper, we introduce the task of automatically generating text to describe the differences between two similar images. We collect a new dataset by crowd-sourcing difference descriptions for pairs of image frames extracted from video-surveillance footage. Annotators were asked to succinctly describe all the differences in a short paragraph. As a result, our novel dataset provides an opportunity to explore models that align language and vision, and capture visual salience. The dataset may also be a useful benchmark for coherent multi-sentence generation. We perform a first-pass visual analysis that exposes clusters of differing pixels as a proxy for object-level differences. We propose a model that captures visual salience by using a latent variable to align clusters of differing pixels with output sentences. We find that, for both single-sentence and multi-sentence generation, the proposed model outperforms the models that use attention alone.

* EMNLP 2018 
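
The "clusters of differing pixels" idea in the abstract above can be illustrated with a small sketch. The abstract does not say how the clusters are computed, so the approach here (thresholding the absolute intensity difference, then grouping marked pixels into 4-connected components) is an assumption, and the names `diff_mask`, `cluster_pixels`, and the threshold value are hypothetical.

```python
from collections import deque


def diff_mask(img_a, img_b, threshold=30):
    """Mark pixels whose absolute intensity difference exceeds the threshold."""
    return [
        [abs(a - b) > threshold for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(img_a, img_b)
    ]


def cluster_pixels(mask):
    """Group marked pixels into 4-connected components (lists of (row, col))."""
    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    clusters = []
    for r in range(height):
        for c in range(width):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first search from an unvisited differing pixel.
            seen[r][c] = True
            queue, component = deque([(r, c)]), []
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < height and 0 <= nx < width
                    if inside and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            clusters.append(component)
    return clusters


# Two toy grayscale frames (hypothetical data): two regions change.
before = [
    [10, 10, 10, 10, 10],
    [10, 10, 10, 10, 10],
    [10, 10, 10, 10, 10],
    [10, 10, 10, 10, 10],
]
after = [
    [200, 200, 10, 10, 10],
    [200, 10, 10, 10, 10],
    [10, 10, 10, 10, 10],
    [10, 10, 10, 90, 90],
]

clusters = cluster_pixels(diff_mask(before, after))
print(len(clusters), [len(c) for c in clusters])
```

Each resulting cluster stands in for one candidate object-level change, which a model could then align with one output sentence.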

Attaining the Unattainable? Reassessing Claims of Human Parity in Neural Machine Translation

Aug 30, 2018
Antonio Toral, Sheila Castilho, Ke Hu, Andy Way

We reassess a recent study (Hassan et al., 2018) that claimed that machine translation (MT) has reached human parity for the translation of news from Chinese into English, using pairwise ranking and considering three variables that were not taken into account in that previous study: the language in which the source side of the test set was originally written, the translation proficiency of the evaluators, and the provision of inter-sentential context. If we consider only original source text (i.e. not translated from another language, or translationese), then we find evidence showing that human parity has not been achieved. We compare the judgments of professional translators against those of non-experts and discover that those of the experts result in higher inter-annotator agreement and better discrimination between human and machine translations. In addition, we analyse the human translations of the test set and identify important translation issues. Finally, based on these findings, we provide a set of recommendations for future human evaluations of MT.

* WMT 2018 

Document Informed Neural Autoregressive Topic Models

Aug 11, 2018
Pankaj Gupta, Florian Buettner, Hinrich Schütze

Context information around words helps in determining their actual meaning, for example "networks" used in contexts of artificial neural networks or biological neuron networks. Generative topic models infer topic-word distributions, taking no or only little context into account. Here, we extend a neural autoregressive topic model to exploit the full context information around words in a document in a language modeling fashion. This results in an improved performance in terms of generalization, interpretability and applicability. We apply our modeling approach to seven data sets from various domains and demonstrate that our approach consistently outperforms state-of-the-art generative topic models. With the learned representations, we show on average a gain of 9.6% (0.57 vs. 0.52) in precision at retrieval fraction 0.02 and 7.2% (0.582 vs. 0.543) in F1 for text categorization.
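
The percentages quoted in the abstract above are consistent with relative (not absolute) gains over the baseline scores. A short check, with `relative_gain` as a hypothetical helper name:

```python
def relative_gain(new, baseline):
    """Relative improvement of `new` over `baseline`, in percent."""
    return (new - baseline) / baseline * 100.0


# Scores as quoted in the abstract.
retrieval_gain = relative_gain(0.57, 0.52)
f1_gain = relative_gain(0.582, 0.543)

print(f"retrieval precision gain: {retrieval_gain:.1f}%")
print(f"F1 gain: {f1_gain:.1f}%")
```

The absolute differences are 0.05 and 0.039; dividing by the baselines 0.52 and 0.543 gives roughly 9.6% and 7.2%.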

New/s/leak 2.0 - Multilingual Information Extraction and Visualization for Investigative Journalism

Jul 13, 2018
Gregor Wiedemann, Seid Muhie Yimam, Chris Biemann

Investigative journalism in recent years is confronted with two major challenges: 1) vast amounts of unstructured data originating from large text collections such as leaks or answers to Freedom of Information requests, and 2) multi-lingual data due to intensified global cooperation and communication in politics, business and civil society. Faced with these challenges, journalists are increasingly cooperating in international networks. To support such collaborations, we present the new version of new/s/leak 2.0, our open-source software for content-based searching of leaks. It includes three novel main features: 1) automatic language detection and language-dependent information extraction for 40 languages, 2) entity and keyword visualization for efficient exploration, and 3) decentral deployment for analysis of confidential data from various formats. We illustrate the new analysis capabilities with an exemplary case study.

* Social Informatics 2018 

Recognizing Challenging Handwritten Annotations with Fully Convolutional Networks

Jun 22, 2018
Andreas Kölsch, Ashutosh Mishra, Saurabh Varshneya, Muhammad Zeshan Afzal, Marcus Liwicki

This paper introduces a very challenging dataset of historic German documents and evaluates Fully Convolutional Neural Network (FCNN) based methods to locate handwritten annotations of any kind in these documents. The handwritten annotations can appear in the form of underlines and text written with various instruments; the use of pencils, for example, makes the data more challenging. We train and evaluate various end-to-end semantic segmentation approaches and report the results. The task is to classify the pixels of documents into two classes: background and handwritten annotation. The best model achieves a mean Intersection over Union (IoU) score of 95.6% on the test documents of the presented dataset. We also present a comparison of different strategies used for data augmentation and training on our presented dataset. For evaluation, we use the Layout Analysis Evaluator for the ICDAR 2017 Competition on Layout Analysis for Challenging Medieval Manuscripts.
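
For readers unfamiliar with the metric, here is a minimal sketch of mean IoU for the two-class pixel labelling task described above. It assumes the mean is an unweighted average of per-class IoU over flattened label arrays; the evaluator named in the abstract may average differently, and the function names and toy labels are hypothetical.

```python
def iou(pred, target, cls):
    """Intersection over Union for one class, or None if the class is absent."""
    intersection = sum(p == cls and t == cls for p, t in zip(pred, target))
    union = sum(p == cls or t == cls for p, t in zip(pred, target))
    return intersection / union if union else None


def mean_iou(pred, target, classes=(0, 1)):
    """Unweighted mean of per-class IoU, skipping classes absent from both inputs."""
    scores = [s for s in (iou(pred, target, c) for c in classes) if s is not None]
    return sum(scores) / len(scores)


# Flattened pixel labels: 0 = background, 1 = handwritten annotation.
target = [0, 0, 0, 0, 1, 1, 1, 1]
pred = [0, 0, 0, 1, 1, 1, 1, 0]

score = mean_iou(pred, target)
print(f"mean IoU: {score:.2f}")
```

In this toy case each class has 3 correctly labelled pixels out of a union of 5, so both per-class scores and their mean are 0.6.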

Word Tagging with Foundational Ontology Classes: Extending the WordNet-DOLCE Mapping to Verbs

Jun 20, 2018
Vivian S. Silva, André Freitas, Siegfried Handschuh

Semantic annotation is fundamental to dealing with large-scale lexical information, mapping the information to an enumerable set of categories over which rules and algorithms can be applied, and foundational ontology classes can be used as a formal set of categories for such tasks. A previous alignment between WordNet noun synsets and DOLCE provided a starting point for ontology-based annotation, but in NLP tasks verbs are also of substantial importance. This work presents an extension to the WordNet-DOLCE noun mapping, aligning verbs according to their links to nouns denoting perdurants, transferring to the verb the DOLCE class assigned to the noun that best represents that verb's occurrence. To evaluate the usefulness of this resource, we implemented a foundational ontology-based semantic annotation framework that assigns a high-level foundational category to each word or phrase in a text, and compared it to a similar annotation tool, obtaining an increase of 9.05% in accuracy.

* Proceedings of the 20th International Conference on Knowledge Engineering and Knowledge Management, Bologna, Italy, 2016, pp 593-605 
* 13 pages, 1 figure, presented at EKAW 2016 

Neural Trajectory Analysis of Recurrent Neural Network In Handwriting Synthesis

Apr 13, 2018
Kristof B. Charbonneau, Osamu Shouno

Recurrent neural networks (RNNs) are capable of learning to generate highly realistic, online handwritings in a wide variety of styles from a given text sequence. Furthermore, the networks can generate handwritings in the style of a particular writer when the network states are primed with a real sequence of pen movements from the writer. However, how populations of neurons in the RNN collectively achieve such performance still remains poorly understood. To tackle this problem, we investigated learned representations in RNNs by extracting low-dimensional, neural trajectories that summarize the activity of a population of neurons in the network during individual syntheses of handwritings. The neural trajectories show that different writing styles are encoded in different subspaces inside an internal space of the network. Within each subspace, different characters of the same style are represented as different state dynamics. These results demonstrate the effectiveness of analyzing the neural trajectory for intuitive understanding of how the RNNs work.

* 4 pages, 3 figures 

A Neural Multi-sequence Alignment TeCHnique (NeuMATCH)

Apr 09, 2018
Pelin Dogan, Boyang Li, Leonid Sigal, Markus Gross

The alignment of heterogeneous sequential data (video to text) is an important and challenging problem. Standard techniques for this task, including Dynamic Time Warping (DTW) and Conditional Random Fields (CRFs), suffer from inherent drawbacks. Mainly, the Markov assumption implies that, given the immediate past, future alignment decisions are independent of further history. The separation between similarity computation and alignment decision also prevents end-to-end training. In this paper, we propose an end-to-end neural architecture where alignment actions are implemented as moving data between stacks of Long Short-term Memory (LSTM) blocks. This flexible architecture supports a large variety of alignment tasks, including one-to-one, one-to-many, skipping unmatched elements, and (with extensions) non-monotonic alignment. Extensive experiments on semi-synthetic and real datasets show that our algorithm outperforms state-of-the-art baselines.

* Accepted at CVPR 2018 (Spotlight). arXiv file includes the paper and the supplemental material 
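
As background for the abstract above, this is a sketch of classic Dynamic Time Warping, one of the baseline techniques it mentions. It is not the NeuMATCH architecture. The 1-D sequences, the absolute-difference cost, and the function name `dtw` are assumptions made for illustration.

```python
def dtw(seq_a, seq_b):
    """Return the DTW distance and one optimal monotonic alignment path."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j] = best cumulative cost of aligning seq_a[:i] with seq_b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(seq_a[i - 1] - seq_b[j - 1])
            # Each cell looks only at its three immediate predecessors.
            cost[i][j] = local + min(
                cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1]
            )

    # Backtrack from the end to recover the aligned index pairs.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return cost[n][m], path


distance, path = dtw([1, 2, 3], [1, 2, 2, 3])
print(distance, path)
```

The recurrence makes the limitations named in the abstract concrete: every decision depends only on the immediately preceding cells, and the similarity function (`abs` here) is fixed separately from the alignment step, so it cannot be learned end to end.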

Visual Explanations from Hadamard Product in Multimodal Deep Networks

Dec 18, 2017
Jin-Hwa Kim, Byoung-Tak Zhang

The visual explanation of the learned representations of models helps to understand the fundamentals of learning. Previous works on attentional models visualized the attended regions over an image or text using the learned weights to confirm the intended mechanism. Kim et al. (2016) show that the Hadamard product in multimodal deep networks, which is well-known for the joint function of visual question answering tasks, implicitly performs an attentional mechanism for visual inputs. In this work, we extend their work to show, using the proposed gradient-based visualization technique, that the Hadamard product in multimodal deep networks performs an attentional mechanism not only for visual inputs but also, simultaneously, for textual inputs. The attentional effect of the Hadamard product is visualized for both visual and textual inputs by analyzing the two inputs and an output of the Hadamard product with the proposed method, and is compared with the learned attentional weights of a visual question answering model.

* 8 pages, 5 figures, including appendix, NIPS 2017 Workshop on Visually-Grounded Interaction and Language (ViGIL) 

Generative Interest Estimation for Document Recommendations

Nov 28, 2017
Danijar Hafner, Alexander Immer, Willi Raschkowski, Fabian Windheuser

Learning distributed representations of documents has pushed the state of the art in several natural language processing tasks and was recently applied successfully to the field of recommender systems. In this paper, we propose a novel content-based recommender system based on learned representations and a generative model of user interest. Our method works as follows: First, we learn representations on a corpus of text documents. Then, we capture a user's interest as a generative model in the space of the document representations. In particular, we model the distribution of interest for each user as a Gaussian mixture model (GMM). Recommendations can be obtained directly by sampling from a user's generative model. Using latent semantic analysis (LSA) as a comparison, we compute and explore document representations on the Delicious bookmarks dataset, a standard benchmark for recommender systems. We then perform density estimation in both spaces and show that learned representations outperform LSA in terms of predictive performance.
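
The sampling step described above can be sketched as follows. Several parts are assumptions not stated in the abstract: the mixture parameters are set by hand rather than fitted, covariances are diagonal, document vectors are 2-D toy values, and a sampled point is mapped to a recommendation by nearest neighbour. All names are hypothetical.

```python
import math
import random


def sample_gmm(weights, means, stds, rng):
    """Draw one point from a Gaussian mixture with diagonal covariance."""
    k = rng.choices(range(len(weights)), weights=weights)[0]
    return [rng.gauss(mu, sigma) for mu, sigma in zip(means[k], stds[k])]


def nearest_document(point, doc_vectors):
    """Return the name of the document whose vector is closest to `point`."""
    return min(doc_vectors, key=lambda name: math.dist(point, doc_vectors[name]))


# Hypothetical document representations.
doc_vectors = {
    "cooking_a": [0.0, 0.0],
    "cooking_b": [0.5, 0.2],
    "python_a": [10.0, 10.0],
    "travel_a": [-10.0, 8.0],
}

# Hypothetical user model: interest concentrated on cooking and Python.
weights = [0.7, 0.3]
means = [[0.2, 0.1], [10.0, 10.0]]
stds = [[0.3, 0.3], [0.3, 0.3]]

rng = random.Random(0)
recommendations = [
    nearest_document(sample_gmm(weights, means, stds, rng), doc_vectors)
    for _ in range(5)
]
print(recommendations)
```

Because the user's mixture places no mass near the travel document, samples land near the cooking and Python documents, which is how sampling from the model doubles as recommendation.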