"Text": models, code, and papers

MorphNet: A sequence-to-sequence model that combines morphological analysis and disambiguation

May 21, 2018
Erenay Dayanık, Ekin Akyürek, Deniz Yuret

We introduce MorphNet, a single model that combines morphological analysis and disambiguation. Traditionally, analysis of morphologically complex languages has been performed in two stages: (i) a morphological analyzer based on finite-state transducers produces all possible morphological analyses of a word; (ii) a statistical disambiguation model picks the correct analysis for each word based on its context. MorphNet uses a sequence-to-sequence recurrent neural network to combine analysis and disambiguation. We show that when trained with text labeled with correct morphological analyses, MorphNet obtains state-of-the-art or comparable results on nine different datasets in seven different languages.

Generating syntactically varied realisations from AMR graphs

Apr 20, 2018
Kris Cao, Stephen Clark

Generating from Abstract Meaning Representation (AMR) is an underspecified problem, as many syntactic decisions are not specified by the semantic graph. We learn a sequence-to-sequence model that generates possible constituency trees for an AMR graph, and then train another model to generate text realisations conditioned on both an AMR graph and a constituency tree. We show that factorising the model this way lets us effectively use parse information, obtaining competitive BLEU scores on self-generated parses and impressive BLEU scores with oracle parses. We also demonstrate that we can generate meaning-preserving syntactic paraphrases of the same AMR graph.
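
The factorisation described above can be written as a chain through an intermediate constituency tree. The notation ($G$ for the AMR graph, $T$ for a constituency tree, $y$ for the text realisation) is ours rather than the paper's, and the decoding rule is only a sketch of how a predicted parse would be plugged in; in practice both maximisations would be approximated, e.g. with beam search.

```latex
p(y \mid G) = \sum_{T} p(T \mid G)\, p(y \mid G, T)

\hat{T} = \arg\max_{T} p(T \mid G), \qquad
\hat{y} = \arg\max_{y} p(y \mid G, \hat{T})
```

In the oracle setting, $\hat{T}$ is replaced by the gold parse; conditioning on different trees $T$ for the same graph $G$ is what yields syntactically varied paraphrases.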

ClaimRank: Detecting Check-Worthy Claims in Arabic and English

Apr 20, 2018
Israa Jaradat, Pepa Gencheva, Alberto Barron-Cedeno, Lluis Marquez, Preslav Nakov

We present ClaimRank, an online system for detecting check-worthy claims. While originally trained on political debates, the system can work for any kind of text, e.g., interviews or regular news articles. Its aim is to facilitate manual fact-checking efforts by prioritizing the claims that fact-checkers should consider first. ClaimRank supports both Arabic and English. It is trained on actual annotations from nine reputable fact-checking organizations (PolitiFact, FactCheck, ABC, CNN, NPR, NYT, Chicago Tribune, The Guardian, and Washington Post), and thus it can mimic the claim selection strategy of each of them individually, as well as of their union.

* NAACL-2018 
* Check-worthiness; Fact-Checking; Veracity; Community-Question Answering; Neural Networks; Arabic; English 

Spoken SQuAD: A Study of Mitigating the Impact of Speech Recognition Errors on Listening Comprehension

Apr 01, 2018
Chia-Hsuan Li, Szu-Lin Wu, Chi-Liang Liu, Hung-yi Lee

Reading comprehension has been widely studied. One of the most representative reading comprehension tasks is the Stanford Question Answering Dataset (SQuAD), on which machines already perform comparably to humans. On the other hand, accessing large collections of multimedia or spoken content is much more difficult and time-consuming for humans than accessing plain text. It is therefore highly attractive to develop machines that can automatically understand spoken content. In this paper, we propose a new listening comprehension task, Spoken SQuAD. On this task, we found that speech recognition errors have a catastrophic impact on machine comprehension, and we propose several approaches to mitigate this impact.

Klout Topics for Modeling Interests and Expertise of Users Across Social Networks

Oct 26, 2017
Sarah Ellinger, Prantik Bhattacharyya, Preeti Bhargava, Nemanja Spasojevic

This paper presents Klout Topics, a lightweight ontology to describe social media users' topics of interest and expertise. Klout Topics is designed to be human-readable and consumer-friendly, to cover multiple domains of knowledge in depth, and to promote data extensibility via knowledge base entities. We discuss why this ontology is well-suited for text labeling and interest modeling applications, and how it compares to available alternatives. We show its coverage against common social media interest sets and give examples of how it is used to model the interests of over 780M social media users. Finally, we open the ontology for external use.

* 4 pages, 2 figures, 5 tables 

The E2E Dataset: New Challenges For End-to-End Generation

Jul 06, 2017
Jekaterina Novikova, Ondřej Dušek, Verena Rieser

This paper describes the E2E data, a new dataset for training end-to-end, data-driven natural language generation systems in the restaurant domain, which is ten times bigger than existing, frequently used datasets in this area. The E2E dataset poses new challenges: (1) its human reference texts show more lexical richness and syntactic variation, including discourse phenomena; (2) generating from this set requires content selection. As such, learning from this dataset promises more natural, varied and less template-like system utterances. We also establish a baseline on this dataset, which illustrates some of the difficulties associated with this data.

* Proceedings of the SIGDIAL 2017 Conference, pages 201-206, Saarbrücken, Germany, 15-17 August 2017 
* Accepted as a short paper for SIGDIAL 2017 (final submission including supplementary material) 

Semi-supervised Multitask Learning for Sequence Labeling

Apr 24, 2017
Marek Rei

We propose a sequence labeling framework with a secondary training objective, learning to predict surrounding words for every word in the dataset. This language modeling objective incentivises the system to learn general-purpose patterns of semantic and syntactic composition, which are also useful for improving accuracy on different sequence labeling tasks. The architecture was evaluated on a range of datasets, covering the tasks of error detection in learner texts, named entity recognition, chunking and POS-tagging. The novel language modeling objective provided consistent performance improvements on every benchmark, without requiring any additional annotated or unannotated data.

* ACL 2017 
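
A minimal sketch of the combined objective, assuming the labelling loss and the two auxiliary language-modelling losses (a forward model predicting the next word, a backward model predicting the previous word) are summed negative log-likelihoods, with the auxiliary part scaled by a weight `gamma`. The function names and the default `gamma=0.1` are hypothetical; model outputs are represented only by the probabilities assigned to the gold targets.

```python
import math
from typing import Sequence


def neg_log_likelihood(probs: Sequence[float]) -> float:
    """Sum of -log p over the probabilities a model assigned to the gold targets."""
    return -sum(math.log(p) for p in probs)


def multitask_loss(
    label_probs: Sequence[float],
    next_word_probs: Sequence[float],
    prev_word_probs: Sequence[float],
    gamma: float = 0.1,  # hypothetical weight of the auxiliary objective
) -> float:
    """Sequence-labelling loss plus a weighted language-modelling loss.

    label_probs[t]     : probability of the gold label of word t
    next_word_probs[t] : forward-LM probability of word t+1 (no entry for the last word)
    prev_word_probs[t] : backward-LM probability of word t-1 (no entry for the first word)
    """
    e_labels = neg_log_likelihood(label_probs)
    e_forward = neg_log_likelihood(next_word_probs)
    e_backward = neg_log_likelihood(prev_word_probs)
    return e_labels + gamma * (e_forward + e_backward)
```

Because the language-modelling targets are simply the neighbouring words of each training sentence, this auxiliary term needs no additional annotated or unannotated data.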

Alignment-based compositional semantics for instruction following

Apr 12, 2017
Jacob Andreas, Dan Klein

This paper describes an alignment-based model for interpreting natural language instructions in context. We approach instruction following as a search over plans, scoring sequences of actions conditioned on structured observations of text and the environment. By explicitly modeling both the low-level compositional structure of individual actions and the high-level structure of full plans, we are able to learn both grounded representations of sentence meaning and pragmatic constraints on interpretation. To demonstrate the model's flexibility, we apply it to a diverse set of benchmark tasks. On every task, we outperform strong task-specific baselines, and achieve several new state-of-the-art results.

* In Proceedings of EMNLP 2015 

Author Identification using Multi-headed Recurrent Neural Networks

Aug 16, 2016
Douglas Bagnall

Recurrent neural networks (RNNs) are very good at modelling the flow of text, but typically need to be trained on a far larger corpus than is available for the PAN 2015 Author Identification task. This paper describes a novel approach where the output layer of a character-level RNN language model is split into several independent predictive sub-models, each representing an author, while the recurrent layer is shared by all. This allows the recurrent layer to model the language as a whole without over-fitting, while the outputs select aspects of the underlying model that reflect their author's style. The method proves competitive, ranking first in two of the four languages.

* 8 pages, 3 figures. Version 1 was a notebook for the PAN 2015 Author Identification challenge. Version 2 is expanded into a full paper for CLEF 2016 
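
A minimal, untrained sketch of the architecture described above: one character-level recurrent layer shared by all authors, with a separate softmax output head per author. The plain Elman-style cell, the random weights, and the rule of attributing a text to the head with the lowest mean cross-entropy are assumptions made for illustration. Training, in which each head would learn only from its own author's text while the shared layer learns from all of it, is omitted.

```python
import math
import random


class MultiHeadedCharRNN:
    """Shared character-level recurrent layer with one output head per author.

    Weights are random and untrained; the class only illustrates the structure.
    """

    def __init__(self, alphabet, n_authors, hidden=16, seed=0):
        rng = random.Random(seed)

        def matrix(rows, cols):
            return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.index = {ch: i for i, ch in enumerate(alphabet)}
        self.hidden = hidden
        self.w_in = matrix(hidden, len(alphabet))  # one-hot char -> hidden (shared)
        self.w_rec = matrix(hidden, hidden)        # hidden -> hidden (shared)
        # One hidden -> next-char projection per author (the independent sub-models).
        self.heads = [matrix(len(alphabet), hidden) for _ in range(n_authors)]

    def _step(self, state, ch):
        """Shared recurrent update: h_t = tanh(W_in x_t + W_rec h_{t-1})."""
        x = self.index[ch]
        return [
            math.tanh(self.w_in[i][x] + sum(w * s for w, s in zip(self.w_rec[i], state)))
            for i in range(self.hidden)
        ]

    @staticmethod
    def _log_softmax(logits):
        m = max(logits)
        log_z = m + math.log(sum(math.exp(v - m) for v in logits))
        return [v - log_z for v in logits]

    def cross_entropies(self, text):
        """Mean next-character cross-entropy (nats) of `text` under each author head."""
        state = [0.0] * self.hidden
        totals = [0.0] * len(self.heads)
        for prev, nxt in zip(text, text[1:]):
            state = self._step(state, prev)  # computed once, reused by every head
            target = self.index[nxt]
            for a, head in enumerate(self.heads):
                logits = [sum(w * s for w, s in zip(row, state)) for row in head]
                totals[a] -= self._log_softmax(logits)[target]
        steps = max(len(text) - 1, 1)
        return [t / steps for t in totals]

    def attribute(self, text):
        """Return the index of the author whose head finds `text` least surprising."""
        scores = self.cross_entropies(text)
        return min(range(len(scores)), key=scores.__getitem__)
```

For example, `MultiHeadedCharRNN("abcdefghijklmnopqrstuvwxyz ", n_authors=3).attribute("the quick brown fox")` returns an author index in `0..2`; with untrained weights every head is close to uniform, so the choice is arbitrary. Sharing the recurrent layer means most parameters are estimated from all authors' text pooled together, which is how the abstract motivates avoiding over-fitting on a small corpus.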

A Parallel Way to Select the Parameters of SVM Based on the Ant Optimization Algorithm

May 20, 2014
Chao Zhang, Hong-cen Mei, Hao Yang

A large body of experimental evidence shows that the Support Vector Machine (SVM) algorithm has clear advantages in text classification, handwriting recognition, image classification, bioinformatics, and other fields. The performance of an SVM depends to a large degree on its kernel function and slack variables, which are governed by the parameters $\delta$ and $c$ of the classification function. Optimizing the SVM therefore largely comes down to choosing these two parameters well. Ant Colony Optimization (ACO) is an optimization algorithm that simulates how ants find the optimal path. In this paper, we combine the ACO algorithm with a parallel algorithm to find good parameter values.

* 3 pages, 2 figures, 2 tables 
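
A minimal sketch of ACO-based parameter selection, assuming discrete candidate grids for $c$ and $\delta$, one pheromone vector per parameter, roulette-wheel sampling, and evaporation followed by deposits proportional to $1/(1+\text{error})$. The abstract does not describe the exact pheromone rule or how the parallelism is organised; here each ant's candidate is simply evaluated concurrently. `validation_error` is a hypothetical stand-in for training and cross-validating an SVM, and both grids are assumed values.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor

C_GRID = [2.0 ** k for k in range(-5, 16, 2)]       # assumed candidate values of c
DELTA_GRID = [2.0 ** k for k in range(-15, 4, 2)]   # assumed candidate kernel parameters


def validation_error(c, delta):
    """Hypothetical stand-in for cross-validated SVM error at (c, delta).

    A real run would train an SVM here; this smooth bowl is minimal at c=2**3, delta=2**-3.
    """
    return (math.log2(c) - 3) ** 2 + (math.log2(delta) + 3) ** 2


def evaluate(pick):
    """Error of one ant's choice, given as (index into C_GRID, index into DELTA_GRID)."""
    i, j = pick
    return validation_error(C_GRID[i], DELTA_GRID[j])


def roulette(weights, rng):
    """Sample an index with probability proportional to its weight."""
    r = rng.uniform(0.0, sum(weights))
    acc = 0.0
    for i, w in enumerate(weights):
        acc += w
        if r <= acc:
            return i
    return len(weights) - 1


def aco_select(n_ants=10, n_iters=30, rho=0.3, seed=0):
    """Return (best error, best c, best delta) found by the ant colony."""
    rng = random.Random(seed)
    tau_c = [1.0] * len(C_GRID)          # pheromone on each c value
    tau_delta = [1.0] * len(DELTA_GRID)  # pheromone on each delta value
    best = (float("inf"), None, None)
    with ThreadPoolExecutor() as pool:
        for _ in range(n_iters):
            picks = [(roulette(tau_c, rng), roulette(tau_delta, rng)) for _ in range(n_ants)]
            errors = list(pool.map(evaluate, picks))  # ants evaluated concurrently
            # Evaporate, then let each ant deposit pheromone inversely to its error.
            tau_c = [(1 - rho) * t for t in tau_c]
            tau_delta = [(1 - rho) * t for t in tau_delta]
            for (i, j), err in zip(picks, errors):
                tau_c[i] += 1.0 / (1.0 + err)
                tau_delta[j] += 1.0 / (1.0 + err)
                if err < best[0]:
                    best = (err, C_GRID[i], DELTA_GRID[j])
    return best
```

Threads keep the sketch simple; for CPU-bound SVM training, a `ProcessPoolExecutor` would be the more realistic choice, which is why `evaluate` is a module-level function that can be pickled.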
