Get our free extension to see links to code for papers anywhere online!

Chrome logo Add to Chrome

Firefox logo Add to Firefox

"Text": models, code, and papers

Towards Universal Semantic Tagging

Sep 29, 2017
Lasha Abzianidze, Johan Bos

The paper proposes the task of universal semantic tagging---tagging word tokens with language-neutral, semantically informative tags. We argue that the task, with its independent nature, contributes to better semantic analysis for wide-coverage multilingual text. We present the initial version of the semantic tagset and show that (a) the tags provide semantically fine-grained information, and (b) they are suitable for cross-lingual semantic parsing. An application of the semantic tagging in the Parallel Meaning Bank supports both of these points as the tags contribute to formal lexical semantics and their cross-lingual projection. As a part of the application, we annotate a small corpus with the semantic tags and present new baseline result for universal semantic tagging.

* 9 pages, International Conference on Computational Semantics (IWCS) 

  Access Paper or Ask Questions

Modelling Protagonist Goals and Desires in First-Person Narrative

Aug 29, 2017
Elahe Rahimtoroghi, Jiaqi Wu, Ruimin Wang, Pranav Anand, Marilyn A Walker

Many genres of natural language text are narratively structured, a testament to our predilection for organizing our experiences as narratives. There is broad consensus that understanding a narrative requires identifying and tracking the goals and desires of the characters and their narrative outcomes. However, to date, there has been limited work on computational models for this problem. We introduce a new dataset, DesireDB, which includes gold-standard labels for identifying statements of desire, textual evidence for desire fulfillment, and annotations for whether the stated desire is fulfilled given the evidence in the narrative context. We report experiments on tracking desire fulfillment using different methods, and show that LSTM Skip-Thought model achieves F-measure of 0.7 on our corpus.

* 10 pages, 18th Annual SIGdial Meeting on Discourse and Dialogue (SIGDIAL 2017) 

  Access Paper or Ask Questions

Dynamic Bernoulli Embeddings for Language Evolution

Mar 23, 2017
Maja Rudolph, David Blei

Word embeddings are a powerful approach for unsupervised analysis of language. Recently, Rudolph et al. (2016) developed exponential family embeddings, which cast word embeddings in a probabilistic framework. Here, we develop dynamic embeddings, building on exponential family embeddings to capture how the meanings of words change over time. We use dynamic embeddings to analyze three large collections of historical texts: the U.S. Senate speeches from 1858 to 2009, the history of computer science ACM abstracts from 1951 to 2014, and machine learning papers on the Arxiv from 2007 to 2015. We find dynamic embeddings provide better fits than classical embeddings and capture interesting patterns about how language changes.

  Access Paper or Ask Questions

Using Graphs of Classifiers to Impose Declarative Constraints on Semi-supervised Learning

Mar 23, 2017
Lidong Bing, William W. Cohen, Bhuwan Dhingra

We propose a general approach to modeling semi-supervised learning (SSL) algorithms. Specifically, we present a declarative language for modeling both traditional supervised classification tasks and many SSL heuristics, including both well-known heuristics such as co-training and novel domain-specific heuristics. In addition to representing individual SSL heuristics, we show that multiple heuristics can be automatically combined using Bayesian optimization methods. We experiment with two classes of tasks, link-based text classification and relation extraction. We show modest improvements on well-studied link-based classification benchmarks, and state-of-the-art results on relation-extraction tasks for two realistic domains.

* 8 pages, 3 figures 

  Access Paper or Ask Questions

Query-Focused Opinion Summarization for User-Generated Content

Jun 17, 2016
Lu Wang, Hema Raghavan, Claire Cardie, Vittorio Castelli

We present a submodular function-based framework for query-focused opinion summarization. Within our framework, relevance ordering produced by a statistical ranker, and information coverage with respect to topic distribution and diverse viewpoints are both encoded as submodular functions. Dispersion functions are utilized to minimize the redundancy. We are the first to evaluate different metrics of text similarity for submodularity-based summarization methods. By experimenting on community QA and blog summarization, we show that our system outperforms state-of-the-art approaches in both automatic evaluation and human evaluation. A human evaluation task is conducted on Amazon Mechanical Turk with scale, and shows that our systems are able to generate summaries of high overall quality and information diversity.

* COLING 2014 

  Access Paper or Ask Questions

Learning-Based Single-Document Summarization with Compression and Anaphoricity Constraints

Jun 08, 2016
Greg Durrett, Taylor Berg-Kirkpatrick, Dan Klein

We present a discriminative model for single-document summarization that integrally combines compression and anaphoricity constraints. Our model selects textual units to include in the summary based on a rich set of sparse features whose weights are learned on a large corpus. We allow for the deletion of content within a sentence when that deletion is licensed by compression rules; in our framework, these are implemented as dependencies between subsentential units of text. Anaphoricity constraints then improve cross-sentence coherence by guaranteeing that, for each pronoun included in the summary, the pronoun's antecedent is included as well or the pronoun is rewritten as a full mention. When trained end-to-end, our final system outperforms prior work on both ROUGE as well as on human judgments of linguistic quality.

* ACL 2016 

  Access Paper or Ask Questions

EEF: Exponentially Embedded Families with Class-Specific Features for Classification

May 27, 2016
Bo Tang, Steven Kay, Haibo He, Paul M. Baggenstoss

In this letter, we present a novel exponentially embedded families (EEF) based classification method, in which the probability density function (PDF) on raw data is estimated from the PDF on features. With the PDF construction, we show that class-specific features can be used in the proposed classification method, instead of a common feature subset for all classes as used in conventional approaches. We apply the proposed EEF classifier for text categorization as a case study and derive an optimal Bayesian classification rule with class-specific feature selection based on the Information Gain (IG) score. The promising performance on real-life data sets demonstrates the effectiveness of the proposed approach and indicates its wide potential applications.

* 9 pages, 3 figures, to be published in IEEE Signal Processing Letter. IEEE Signal Processing Letter, 2016 

  Access Paper or Ask Questions

The Role of Pragmatics in Legal Norm Representation

Jul 08, 2015
Shashishekar Ramakrishna, Lukasz Gorski, Adrian Paschke

Despite the 'apparent clarity' of a given legal provision, its application may result in an outcome that does not exactly conform to the semantic level of a statute. The vagueness within a legal text is induced intentionally to accommodate all possible scenarios under which such norms should be applied, thus making the role of pragmatics an important aspect also in the representation of a legal norm and reasoning on top of it. The notion of pragmatics considered in this paper does not focus on the aspects associated with judicial decision making. The paper aims to shed light on the aspects of pragmatics in legal linguistics, mainly focusing on the domain of patent law, only from a knowledge representation perspective. The philosophical discussions presented in this paper are grounded based on the legal theories from Grice and Marmor.

* International Workshop On Legal Domain And Semantic Web Applications (LeDA-SWAn 2015), held during the 12th Extended Semantic Web Conference (ESWC 2015), June 1, 2015, Portoroz, Slovenia. in CEUR Workshop Proceedings 2015 

  Access Paper or Ask Questions

Modeling the average shortest path length in growth of word-adjacency networks

Mar 06, 2015
Andrzej Kulig, Stanislaw Drozdz, Jaroslaw Kwapien, Pawel Oswiecimka

We investigate properties of evolving linguistic networks defined by the word-adjacency relation. Such networks belong to the category of networks with accelerated growth but their shortest path length appears to reveal the network size dependence of different functional form than the ones known so far. We thus compare the networks created from literary texts with their artificial substitutes based on different variants of the Dorogovtsev-Mendes model and observe that none of them is able to properly simulate the novel asymptotics of the shortest path length. Then, we identify the local chain-like linear growth induced by grammar and style as a missing element in this model and extend it by incorporating such effects. It is in this way that a satisfactory agreement with the empirical result is obtained.

* Phys. Rev. E. 91, 032810 (2015) 
* Accepted for publication in Physical Review E 

  Access Paper or Ask Questions

A Machine Learning Approach for the Identification of Bengali Noun-Noun Compound Multiword Expressions

Jan 25, 2014
Vivekananda Gayen, Kamal Sarkar

This paper presents a machine learning approach for identification of Bengali multiword expressions (MWE) which are bigram nominal compounds. Our proposed approach has two steps: (1) candidate extraction using chunk information and various heuristic rules and (2) training the machine learning algorithm called Random Forest to classify the candidates into two groups: bigram nominal compound MWE or not bigram nominal compound MWE. A variety of association measures, syntactic and linguistic clues and a set of WordNet-based similarity features have been used for our MWE identification task. The approach presented in this paper can be used to identify bigram nominal compound MWE in Bengali running text.

* In Proceedings of ICON-2013: 10th International Conference on Natural Language Processing, pp 290-296 

  Access Paper or Ask Questions