"Text": models, code, and papers

Klout Topics for Modeling Interests and Expertise of Users Across Social Networks

Oct 26, 2017
Sarah Ellinger, Prantik Bhattacharyya, Preeti Bhargava, Nemanja Spasojevic

This paper presents Klout Topics, a lightweight ontology to describe social media users' topics of interest and expertise. Klout Topics is designed to: be human-readable and consumer-friendly; cover multiple domains of knowledge in depth; and promote data extensibility via knowledge base entities. We discuss why this ontology is well-suited for text labeling and interest modeling applications, and how it compares to available alternatives. We show its coverage against common social media interest sets, and give examples of how it is used to model the interests of over 780M social media users. Finally, we open the ontology for external use.

* 4 pages, 2 figures, 5 tables 


The E2E Dataset: New Challenges For End-to-End Generation

Jul 06, 2017
Jekaterina Novikova, Ondřej Dušek, Verena Rieser

This paper describes the E2E data, a new dataset for training end-to-end, data-driven natural language generation systems in the restaurant domain, which is ten times bigger than existing, frequently used datasets in this area. The E2E dataset poses new challenges: (1) its human reference texts show more lexical richness and syntactic variation, including discourse phenomena; (2) generating from this set requires content selection. As such, learning from this dataset promises more natural, varied and less template-like system utterances. We also establish a baseline on this dataset, which illustrates some of the difficulties associated with this data.

* Proceedings of the SIGDIAL 2017 Conference, pages 201-206, Saarbr\"ucken, Germany, 15-17 August 2017 
* Accepted as a short paper for SIGDIAL 2017 (final submission including supplementary material) 
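Each E2E training instance pairs a flat meaning representation of attribute[value] slots with one or more human reference texts. A minimal parser for that surface form is sketched below; the attribute names in the example are illustrative of the restaurant domain, not the dataset's full slot inventory:

```python
import re

def parse_mr(mr: str) -> dict:
    """Parse a flat E2E-style meaning representation such as
    'name[The Eagle], food[French]' into an attribute dict."""
    return dict(re.findall(r"(\w[\w ]*)\[([^\]]*)\]", mr))

# Illustrative example (attribute names are assumptions, not the full set).
example = "name[The Eagle], eatType[coffee shop], food[French], area[riverside]"
attrs = parse_mr(example)
```

A generator then performs content selection over these attributes before realizing a sentence, which is one of the new challenges the dataset poses.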


Semi-supervised Multitask Learning for Sequence Labeling

Apr 24, 2017
Marek Rei

We propose a sequence labeling framework with a secondary training objective, learning to predict surrounding words for every word in the dataset. This language modeling objective incentivises the system to learn general-purpose patterns of semantic and syntactic composition, which are also useful for improving accuracy on different sequence labeling tasks. The architecture was evaluated on a range of datasets, covering the tasks of error detection in learner texts, named entity recognition, chunking and POS-tagging. The novel language modeling objective provided consistent performance improvements on every benchmark, without requiring any additional annotated or unannotated data.

* ACL 2017 
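The secondary objective can be sketched as a weighted sum of the tagging loss and the losses for predicting each token's neighbouring words from the shared representation. The dimensions, the random stand-ins for the learned representations, and the weight `gamma` below are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions: hidden size, tag set size, vocabulary size (assumptions).
H, T, V = 8, 4, 20
hidden = rng.normal(size=(5, H))   # shared per-token representations
W_tag = rng.normal(size=(H, T))    # sequence-labeling head
W_next = rng.normal(size=(H, V))   # predicts each token's *next* word
W_prev = rng.normal(size=(H, V))   # predicts each token's *previous* word

def xent(logits, targets):
    """Mean softmax cross-entropy."""
    z = logits - logits.max(axis=1, keepdims=True)
    logp = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -logp[np.arange(len(targets)), targets].mean()

tags = rng.integers(0, T, size=5)
next_words = rng.integers(0, V, size=5)
prev_words = rng.integers(0, V, size=5)

gamma = 0.1  # weight of the auxiliary language-modeling objective (assumed)
loss = (xent(hidden @ W_tag, tags)
        + gamma * (xent(hidden @ W_next, next_words)
                   + xent(hidden @ W_prev, prev_words)))
```

Because the auxiliary targets are just the surrounding words, the extra supervision comes for free from the existing training text.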


Alignment-based compositional semantics for instruction following

Apr 12, 2017
Jacob Andreas, Dan Klein

This paper describes an alignment-based model for interpreting natural language instructions in context. We approach instruction following as a search over plans, scoring sequences of actions conditioned on structured observations of text and the environment. By explicitly modeling both the low-level compositional structure of individual actions and the high-level structure of full plans, we are able to learn both grounded representations of sentence meaning and pragmatic constraints on interpretation. To demonstrate the model's flexibility, we apply it to a diverse set of benchmark tasks. On every task, we outperform strong task-specific baselines, and achieve several new state-of-the-art results.

* in proceedings of EMNLP 2015 
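The search-over-plans formulation can be caricatured in a few lines: enumerate candidate action sequences and keep the one the scoring model prefers. The action inventory and the string-match scorer below are toy stand-ins for the paper's learned alignment model:

```python
from itertools import product

# Toy action inventory (an assumption; the real action space is task-specific).
ACTIONS = ["left", "right", "forward"]

def score(plan, instruction):
    # Hypothetical scorer: reward actions whose name appears in the
    # instruction. The paper instead scores plans with a learned model
    # conditioned on text and environment observations.
    return sum(1.0 for a in plan if a in instruction)

def best_plan(instruction, length=2):
    # Exhaustive search over fixed-length plans; real systems would use
    # a smarter search over much larger plan spaces.
    return max(product(ACTIONS, repeat=length),
               key=lambda p: score(p, instruction))

plan = best_plan("go forward then turn left")
```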


Author Identification using Multi-headed Recurrent Neural Networks

Aug 16, 2016
Douglas Bagnall

Recurrent neural networks (RNNs) are very good at modelling the flow of text, but typically need to be trained on a far larger corpus than is available for the PAN 2015 Author Identification task. This paper describes a novel approach where the output layer of a character-level RNN language model is split into several independent predictive sub-models, each representing an author, while the recurrent layer is shared by all. This allows the recurrent layer to model the language as a whole without over-fitting, while the outputs select aspects of the underlying model that reflect their author's style. The method proves competitive, ranking first in two of the four languages.

* 8 pages, 3 figures. Version 1 was a notebook for the PAN@CLEF2015 Author Identification challenge; Version 2 is expanded to be a full paper for CLEF2016 
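The attribution scheme can be sketched as: run the shared recurrent layer once over the text, ask each author-specific output head how well it predicts the characters, and attribute the text to the head with the lowest cross-entropy. The sizes and the random stand-ins for the shared states and heads below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
H, V, AUTHORS = 16, 30, 3  # hidden size, charset size, candidates (assumed)

# One shared state per timestep stands in for the shared recurrent layer.
shared_states = rng.normal(size=(10, H))

# Each author gets an independent softmax head over characters.
heads = rng.normal(size=(AUTHORS, H, V))
chars = rng.integers(0, V, size=10)  # the document, as character ids

def neg_log_lik(states, W, targets):
    logits = states @ W
    z = logits - logits.max(axis=1, keepdims=True)
    logp = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -logp[np.arange(len(targets)), targets].mean()

# Attribute the document to the head that predicts its characters best.
scores = [neg_log_lik(shared_states, heads[a], chars) for a in range(AUTHORS)]
predicted_author = int(np.argmin(scores))
```

Sharing the recurrent layer is what lets the model fit a general language model from limited per-author data while the heads capture stylistic differences.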


A Parallel Way to Select the Parameters of SVM Based on the Ant Optimization Algorithm

May 20, 2014
Chao Zhang, Hong-cen Mei, Hao Yang

A large body of experimental data shows that the Support Vector Machine (SVM) algorithm has clear advantages in text classification, handwriting recognition, image classification, bioinformatics, and other fields. To some degree, the optimization of an SVM depends on its kernel function and slack variable, which are determined by the parameters $\delta$ and $c$ in the classification function. In other words, optimizing these two parameters plays a major role in optimizing the SVM algorithm. Ant Colony Optimization (ACO) is an optimization algorithm that simulates ants searching for an optimal path. Building on the available literature, we combine the ACO algorithm with a parallel algorithm to find good parameter values.

* 3 pages, 2 figures, 2 tables 
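A minimal sketch of the idea, with a toy stand-in for cross-validated SVM accuracy and illustrative candidate grids for $\delta$ and $c$ (the real system would train SVMs at each point and evaluate ants in parallel):

```python
import random

random.seed(0)

# Illustrative candidate grids for the two SVM parameters (assumptions).
DELTAS = [0.01, 0.1, 1.0]
CS = [0.1, 1.0, 10.0]

def accuracy(delta, c):
    # Stand-in for cross-validated SVM accuracy; peaks at delta=0.1, c=1.0.
    return 1.0 - abs(delta - 0.1) - 0.01 * abs(c - 1.0)

# Pheromone per candidate value, reinforced by each generation of ants.
pher_d = {d: 1.0 for d in DELTAS}
pher_c = {c: 1.0 for c in CS}

def pick(pher):
    vals, weights = zip(*pher.items())
    return random.choices(vals, weights=weights)[0]

best = (None, None, float("-inf"))
for _ in range(20):                      # generations
    for _ in range(5):                   # ants per generation (parallelizable)
        d, c = pick(pher_d), pick(pher_c)
        acc = accuracy(d, c)
        pher_d[d] += acc                 # reinforce good choices
        pher_c[c] += acc
        if acc > best[2]:
            best = (d, c, acc)
    for k in pher_d:
        pher_d[k] *= 0.9                 # pheromone evaporation
    for k in pher_c:
        pher_c[k] *= 0.9
```

The inner loop over ants is independent per ant, which is what makes the parallel variant natural.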


An Account of Opinion Implicatures

Apr 23, 2014
Janyce Wiebe, Lingjia Deng

While previous sentiment analysis research has concentrated on the interpretation of explicitly stated opinions and attitudes, this work initiates the computational study of a type of opinion implicature (i.e., opinion-oriented inference) in text. This paper describes a rule-based framework for representing and analyzing opinion implicatures which we hope will contribute to deeper automatic interpretation of subjective language. In the course of understanding implicatures, the system recognizes implicit sentiments (and beliefs) toward various events and entities in the sentence, often attributed to different sources (holders) and of mixed polarities; thus, it produces a richer interpretation than is typical in opinion analysis.

* 50 Pages. Submitted to the journal, Language Resources and Evaluation 
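One rule in the spirit of this framework (an illustrative assumption, not the authors' exact rule set): a holder who is positive toward an event that is bad for some target can be inferred to be negative toward that target, and vice versa:

```python
def infer(holder_attitude_to_event, event_effect_on_target):
    """Toy implicature rule (illustrative, not the paper's rule set).

    holder_attitude_to_event: "pos" or "neg"
    event_effect_on_target:   "goodFor" or "badFor"
    Returns the inferred attitude of the holder toward the target.
    """
    flip = {"pos": "neg", "neg": "pos"}
    if event_effect_on_target == "goodFor":
        return holder_attitude_to_event
    return flip[holder_attitude_to_event]

# "The bill would curb emissions, and she supports it."
# pos toward a badFor(emissions) event -> neg toward emissions.
attitude = infer("pos", "badFor")
```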


Repairing and Inpainting Damaged Images using Diffusion Tensor

May 09, 2013
Faouzi Benzarti, Hamid Amiri

Removing or repairing the imperfections of digital images or videos is a very active and attractive field of research belonging to the image inpainting technique. The latter has a wide range of applications, such as removing scratches in old photographic images, removing text and logos, or creating cartoon and artistic effects. In this paper, we propose an efficient method to repair a damaged image based on a nonlinear diffusion tensor. The idea is to accurately track the local geometry of the damaged image and allow diffusion only along the direction of the isophote curves. To illustrate the effective performance of our method, we present experimental results on test and real photographic color images.

* IJCSI International Journal of Computer Science Issues, Vol. 9, Issue 4, No. 3, July 2012, ISSN 1694-0814 
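A heavily simplified sketch of diffusion-based inpainting: repeatedly replace the damaged pixels with the average of their neighbours. The paper's method instead builds a diffusion tensor from the local image structure so that diffusion follows the isophote directions; the isotropic version below only illustrates the fill-in mechanism:

```python
import numpy as np

# Toy single-channel image with a damaged region (mask == True is "missing").
img = np.linspace(0.0, 1.0, 64).reshape(8, 8).copy()
mask = np.zeros_like(img, dtype=bool)
mask[3:5, 3:5] = True
img[mask] = 0.0

# Isotropic diffusion inpainting: each masked pixel converges toward the
# average of its 4-neighbours, interpolating the surrounding intensities.
for _ in range(200):
    up = np.roll(img, 1, axis=0)
    down = np.roll(img, -1, axis=0)
    left = np.roll(img, 1, axis=1)
    right = np.roll(img, -1, axis=1)
    img[mask] = 0.25 * (up + down + left + right)[mask]
```

An anisotropic scheme would weight these neighbour contributions by the structure tensor so that edges are continued rather than blurred across.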


Comment on "Language Trees and Zipping" arXiv:cond-mat/0108530

Mar 21, 2009
Xiuli Wang

Every encoding carries a priori information if it represents any semantic information about the universe or object. Encoding means mapping from the universe to a string or strings of digits. "Semantics" here is used in the model-theoretic sense, i.e., the denotation of the object. If the encoding (the strings of symbols) is an adequate and true mapping of the model or object, and the mapping is recursive (computable), then the distance between two strings (texts) maps the distance between the models. We are then able to measure the latter by computing the distance between the two strings. Otherwise, we may be led down a misleading course. A "language tree" may therefore not be a family tree in the sense of historical linguistics; rather, it merely reflects similarity.

* 2 pages; comment on arXiv:cond-mat/0108530, arXiv:cond-mat/0203275 
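The zipping-based string distance the commented papers rely on can be sketched with a general-purpose compressor; the version below is the normalized compression distance, a close relative of their relative-entropy estimate:

```python
import zlib

def C(s: bytes) -> int:
    """Compressed length as a stand-in for Kolmogorov complexity."""
    return len(zlib.compress(s, 9))

def ncd(a: bytes, b: bytes) -> float:
    """Normalized compression distance between two strings."""
    return (C(a + b) - min(C(a), C(b))) / max(C(a), C(b))

x = b"the cat sat on the mat " * 20
y = b"the cat sat on the mat " * 20
z = bytes(range(256)) * 2  # incompressible-looking contrast text

similar = ncd(x, y)      # small: y adds nothing new given x
different = ncd(x, z)    # large: z shares no structure with x
```

The comment's point is precisely that this distance tracks model distance only when the encoding faithfully (and computably) maps the underlying objects.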


Decision Lists for English and Basque

Apr 12, 2002
Eneko Agirre, David Martinez

In this paper we describe the systems we developed for the English (lexical-sample and all-words) and Basque tasks. They were all supervised systems based on Yarowsky's decision lists. We used SemCor for training in the English all-words task, and we defined different feature sets for each language. For Basque, in order to extract all the information from the text, we defined features that had not been used before in the literature, using a morphological analyzer. We also implemented systems that automatically selected good features and were able to obtain a pre-set precision (85%) at the cost of coverage. The systems that used all the features were identified as BCU-ehu-dlist-all and the systems that selected some features as BCU-ehu-dlist-best.

* Proceedings of the SENSEVAL-2 Workshop. In conjunction with ACL'2001/EACL'2001. Toulouse 
* 4 pages 
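Yarowsky-style decision lists can be sketched in a few lines: score each (feature, sense) pair by a smoothed log-likelihood ratio, sort, and let the first matching rule decide. The senses and context features below are toy illustrations, not the tasks' actual data:

```python
import math
from collections import defaultdict

# Toy word-sense data for an ambiguous target (illustrative only):
# each example is (context_features, sense).
train = [
    ({"river", "caught"}, "fish"),
    ({"river", "water"}, "fish"),
    ({"caught", "water"}, "fish"),
    ({"guitar", "play"}, "music"),
    ({"guitar", "amp"}, "music"),
    ({"play", "amp"}, "music"),
]

counts = defaultdict(lambda: defaultdict(int))
for feats, sense in train:
    for f in feats:
        counts[f][sense] += 1

# One rule per feature: its majority sense, ranked by the smoothed
# log-likelihood ratio, in the spirit of Yarowsky's decision lists.
rules = []
for f, c in counts.items():
    a, b = c["fish"] + 0.1, c["music"] + 0.1  # add-0.1 smoothing
    llr = abs(math.log(a / b))
    rules.append((llr, f, "fish" if a > b else "music"))
rules.sort(reverse=True)

def classify(feats, default="fish"):
    for _, f, sense in rules:  # the first matching rule wins
        if f in feats:
            return sense
    return default

pred = classify({"guitar", "amp"})
```

Feature selection as described in the paper would amount to keeping only rules whose ratio clears a precision threshold, trading coverage for precision.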
