"Text": models, code, and papers

Basic tasks of sentiment analysis

Oct 18, 2017
Iti Chaturvedi, Soujanya Poria, Erik Cambria

Subjectivity detection is the task of identifying objective and subjective sentences. Objective sentences are those which do not exhibit any sentiment. So, it is desired for a sentiment analysis engine to find and separate the objective sentences for further analysis, e.g., polarity detection. In subjective sentences, opinions can often be expressed on one or multiple topics. Aspect extraction is a subtask of sentiment analysis that consists in identifying opinion targets in opinionated text, i.e., in detecting the specific aspects of a product or service the opinion holder is either praising or complaining about.

* Encyclopedia of Social Network Analysis and Mining, 2017 

SemEval-2017 Task 8: RumourEval: Determining rumour veracity and support for rumours

Apr 20, 2017
Leon Derczynski, Kalina Bontcheva, Maria Liakata, Rob Procter, Geraldine Wong Sak Hoi, Arkaitz Zubiaga

Media is full of false claims. Even Oxford Dictionaries named "post-truth" as the word of 2016. This makes it more important than ever to build systems that can identify the veracity of a story, and the kind of discourse there is around it. RumourEval is a SemEval shared task that aims to identify and handle rumours and reactions to them, in text. We present an annotation scheme, a large dataset covering multiple topics - each having their own families of claims and replies - and use these to pose two concrete challenges as well as the results achieved by participants on these challenges.

Syntax-Semantics Interaction Parsing Strategies. Inside SYNTAGMA

Jan 21, 2016
Daniel Christen

This paper discusses SYNTAGMA, a rule-based NLP system that addresses the tricky issues of syntactic ambiguity reduction and word sense disambiguation, and provides innovative and original solutions for constituent generation and constraints management. To provide an insight into how it operates, the system's general architecture and components, as well as its lexical, syntactic and semantic resources, are described. After that, the paper addresses the mechanism that performs selective parsing through an interaction between syntactic and semantic information, leading the parser to a coherent and accurate interpretation of the input text.

SimpleDS: A Simple Deep Reinforcement Learning Dialogue System

Jan 18, 2016
Heriberto Cuayáhuitl

This paper presents 'SimpleDS', a simple and publicly available dialogue system trained with deep reinforcement learning. In contrast to previous reinforcement learning dialogue systems, this system avoids manual feature engineering by performing action selection directly from raw text of the last system and (noisy) user responses. Our initial results, in the restaurant domain, show that it is indeed possible to induce reasonable dialogue behaviour with an approach that aims for high levels of automation in dialogue control for intelligent interactive agents.

* International Workshop on Spoken Dialogue Systems (IWSDS), 2016 

Towards The Development of a Bishnupriya Manipuri Corpus

Dec 11, 2013
Nayan Jyoti Kalita, Navanath Saharia, Smriti Kumar Sinha

For any deep computational processing of language we need evidence, and one such source of evidence is a corpus. This paper describes the development of a text-based corpus for the Bishnupriya Manipuri language. A corpus is considered a building block for any language processing task. Due to a lack of awareness, Bishnupriya Manipuri, like other Indian languages, is studied less frequently. As a result, the language still lacks a good corpus and basic language processing tools. To our knowledge, this is the first effort to develop a corpus for the Bishnupriya Manipuri language.

* 5 pages, conference at National Conference on Recent Trends in Computer Sciences at Bodoland University, 25th-26th March, 2013 

Factorized Topic Models

Apr 23, 2013
Cheng Zhang, Carl Henrik Ek, Andreas Damianou, Hedvig Kjellstrom

In this paper we present a modification to a latent topic model, which makes the model exploit supervision to produce a factorized representation of the observed data. The structured parameterization separately encodes variance that is shared between classes from variance that is private to each class by the introduction of a new prior over the topic space. The approach allows for a more efficient inference and provides an intuitive interpretation of the data in terms of an informative signal together with structured noise. The factorized representation is shown to enhance inference performance for image, text, and video classification.

* ICLR 2013 

Perspectives for Strong Artificial Life

Feb 13, 2005
J.-Ph. Rennard

This text introduces the twin deadlocks of strong artificial life. Conceptualization of life is a deadlock both because of the existence of a continuum between the inert and the living, and because we only know one instance of life. Computationalism is a second deadlock since it remains a matter of faith. Nevertheless, artificial life realizations quickly progress and recent constructions embed an always growing set of the intuitive properties of life. This growing gap between theory and realizations should sooner or later crystallize in some kind of paradigm shift and then give clues to break the twin deadlocks.

* Rennard, J.-Ph., (2004), Perspective for Strong Artificial Life in De Castro, L.N. & von Zuben F.J. (Eds), Recent Developments in Biologically Inspired Computing, Hershey:IGP, 301-318 
* 19 pages, 5 figures 

Information Extraction - A User Guide

Feb 11, 1997
Hamish Cunningham

This technical memo describes Information Extraction from the point-of-view of a potential user of the technology. No knowledge of language processing is assumed. Information Extraction is a process which takes unseen texts as input and produces fixed-format, unambiguous data as output. This data may be used directly for display to users, or may be stored in a database or spreadsheet for later analysis, or may be used for indexing purposes in Information Retrieval applications. See also

* LaTeX2e with PostScript figures, 17 pages (figures replaced with smaller versions) 
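
The input/output contract described in this abstract (free text in, fixed-format records out) can be illustrated with a minimal sketch. It is not taken from the memo: the event type, the regular expression, and the names `ACQUISITION` and `extract_acquisitions` are hypothetical, and real Information Extraction systems rely on far richer linguistic analysis than a single pattern.

```python
import re
from typing import Dict, List

# Hypothetical pattern for one event type ("X acquired Y for $N million").
# This is an illustrative assumption, not a method described in the memo.
ACQUISITION = re.compile(
    r"(?P<buyer>[A-Z][\w&]*(?: [A-Z][\w&]*)*) acquired "
    r"(?P<target>[A-Z][\w&]*(?: [A-Z][\w&]*)*) for "
    r"\$(?P<amount>[\d.]+) (?P<unit>million|billion)"
)


def extract_acquisitions(text: str) -> List[Dict[str, str]]:
    """Turn free text into fixed-format records, one dict per matched event."""
    return [match.groupdict() for match in ACQUISITION.finditer(text)]


sample = "Acme Corp acquired Widget Works for $12.5 million in cash."
records = extract_acquisitions(sample)
for record in records:
    print(record)
```

Each record has the same four fields, so the output could be written straight to a database table or spreadsheet row, which is the kind of downstream use the abstract mentions.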

Learning Part-of-Speech Guessing Rules from Lexicon: Extension to Non-Concatenative Operations

Apr 30, 1996
Andrei Mikheev

One of the problems in part-of-speech tagging of real-world texts is that of words unknown to the lexicon. In Mikheev (ACL-96 cmp-lg/9604022), a technique for fully unsupervised statistical acquisition of rules which guess possible parts-of-speech for unknown words was proposed. One of the over-simplifications assumed by this learning technique was the acquisition of morphological rules which obey only simple concatenative regularities of the main word with an affix. In this paper we extend this technique to the non-concatenative cases of suffixation and assess the gain in performance.

* 6 pages, LaTeX (colap.sty for COLING-96); to appear in Proceedings of COLING-96 
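
The difference between concatenative and non-concatenative suffix rules can be sketched as follows. This is a toy illustration under stated assumptions, not the paper's learned rule set: the two rules, the tiny lexicon, the Penn-style tag names, and the identifiers `GuessingRule` and `guess_tags` are all hypothetical and hand-written, whereas the abstract describes rules acquired statistically from a lexicon.

```python
from typing import Dict, List, NamedTuple, Set


class GuessingRule(NamedTuple):
    suffix: str           # ending stripped from the unknown word
    restore: str          # letters re-attached to recover the stem ("" = purely concatenative)
    stem_tags: Set[str]   # tags the recovered stem must carry in the lexicon
    guess_tags: Set[str]  # tags proposed for the unknown word


# Toy lexicon and hand-written rules (assumptions for illustration only).
LEXICON: Dict[str, Set[str]] = {"walk": {"VB", "NN"}, "try": {"VB", "NN"}}
RULES: List[GuessingRule] = [
    GuessingRule("ed", "", {"VB"}, {"VBD", "VBN"}),    # walk + ed -> walked
    GuessingRule("ied", "y", {"VB"}, {"VBD", "VBN"}),  # try -> tried (stem-final y mutates)
]


def guess_tags(word: str,
               lexicon: Dict[str, Set[str]] = LEXICON,
               rules: List[GuessingRule] = RULES) -> Set[str]:
    """Collect the tags proposed by every rule whose recovered stem is in the lexicon."""
    guesses: Set[str] = set()
    for rule in rules:
        if not word.endswith(rule.suffix):
            continue
        stem = word[: len(word) - len(rule.suffix)] + rule.restore
        # The rule fires only if the stem has all the required tags.
        if rule.stem_tags <= lexicon.get(stem, set()):
            guesses |= rule.guess_tags
    return guesses


print(sorted(guess_tags("walked")))
print(sorted(guess_tags("tried")))
```

With only the first (concatenative) rule, "tried" would yield the stem "tri", which is not in the lexicon, so no tag could be guessed; the second rule restores the mutated stem ending before the lookup.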

Field Extraction from Forms with Unlabeled Data

Oct 08, 2021
Mingfei Gao, Zeyuan Chen, Nikhil Naik, Kazuma Hashimoto, Caiming Xiong, Ran Xu

We propose a novel framework to conduct field extraction from forms with unlabeled data. To bootstrap the training process, we develop a rule-based method for mining noisy pseudo-labels from unlabeled forms. Using the supervisory signal from the pseudo-labels, we extract a discriminative token representation from a transformer-based model by modeling the interaction between text in the form. To prevent the model from overfitting to label noise, we introduce a refinement module based on a progressive pseudo-label ensemble. Experimental results demonstrate the effectiveness of our framework.
