"Topic": models, code, and papers

Generating Thematic Chinese Poetry using Conditional Variational Autoencoders with Hybrid Decoders

May 25, 2018
Xiaopeng Yang, Xiaowen Lin, Shunda Suo, Ming Li

Computer poetry generation is our first step towards computer writing. Writing must have a theme. Current approaches that use sequence-to-sequence models with attention often produce non-thematic poems. We present a novel conditional variational autoencoder with a hybrid decoder, which adds deconvolutional neural networks to general recurrent neural networks in order to fully learn topic information via latent variables. This approach significantly improves the relevance of the generated poems by representing each line of the poem not only in a context-sensitive manner but also holistically, in a way that is highly related to the given keyword and the learned topic. A proposed augmented word2vec model further improves rhythm and symmetry. Tests show that the poems generated by our approach mostly satisfy the regulated rules and have consistent themes, and 73.42% of them receive an Overall score of at least 3 (out of a maximum of 5).
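As general background, a conditional variational autoencoder is usually trained by maximizing the conditional evidence lower bound (ELBO), with the latent variable drawn through the reparameterization trick. In the sketch below, x stands for a poem line, c for the conditioning information (assumed, for illustration, to be the keyword plus the preceding context) and z for the latent topic variable. This is the textbook CVAE objective, not necessarily the exact loss the paper uses with its hybrid recurrent/deconvolutional decoder.

```latex
\begin{aligned}
\log p_\theta(x \mid c) &\ge \mathbb{E}_{q_\phi(z \mid x, c)}\left[\log p_\theta(x \mid z, c)\right]
  - D_{\mathrm{KL}}\left(q_\phi(z \mid x, c) \,\|\, p_\theta(z \mid c)\right) \\
z &= \mu_\phi(x, c) + \sigma_\phi(x, c) \odot \epsilon, \qquad \epsilon \sim \mathcal{N}(0, I)
\end{aligned}
```

The first term rewards reconstructing the line from z and c; the KL term keeps the approximate posterior close to the conditional prior, so that at generation time z can be sampled from p(z | c) alone.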

Dynamic Hierarchical Dirichlet Process for Abnormal Behaviour Detection in Video

Jun 27, 2016
Olga Isupova, Danil Kuzin, Lyudmila Mihaylova

This paper proposes a novel dynamic Hierarchical Dirichlet Process topic model that considers the dependence between successive observations. Conventional posterior inference algorithms for this kind of model require several passes over the whole dataset, which is computationally intractable for massive or sequential data. We design batch and online inference algorithms for the proposed model, based on Gibbs sampling. These algorithms allow sequential data to be processed, incrementally updating the model with each new observation. The model is applied to abnormal behaviour detection in video sequences, and a new abnormality measure is proposed for decision making. The proposed method is compared with a method based on the non-dynamic Hierarchical Dirichlet Process, for which we also derive an online Gibbs sampler and the abnormality measure. Results on synthetic and real data show that accounting for dynamics in a topic model improves classification performance for abnormal behaviour detection.

* 8 pages, International Conference on Information Fusion 2016 

Plan-And-Write: Towards Better Automatic Storytelling

Nov 20, 2018
Lili Yao, Nanyun Peng, Ralph Weischedel, Kevin Knight, Dongyan Zhao, Rui Yan

Automatic storytelling is challenging because it requires generating long, coherent natural language that describes a sensible sequence of events. Despite considerable past effort on automatic story generation, prior work is either restricted in plot planning or can only generate stories in a narrow domain. In this paper, we explore open-domain story generation that writes stories given a title (topic) as input. We propose a plan-and-write hierarchical generation framework that first plans a storyline and then generates a story based on the storyline. We compare two planning strategies: the dynamic schema interweaves story planning and its surface realization in text, while the static schema plans out the entire storyline before generating stories. Experiments show that with explicit storyline planning, the generated stories are more diverse, coherent, and on topic than those generated without creating a full plan, according to both automatic and human evaluations.

* Accepted by AAAI 2019 

Effective extractive summarization using frequency-filtered entity relationship graphs

Oct 24, 2018
Archit Sakhadeo, Nisheeth Srivastava

Word-frequency-based methods for extractive summarization are easy to implement and yield reasonable results across languages. However, they have significant limitations: they ignore the role of context, they offer uneven coverage of the topics in a document, and their summaries are sometimes disjointed and hard to read. We use a simple premise from linguistic typology, namely that English sentences are complete descriptors of potential interactions between entities, usually in subject-verb-object order, to address a subset of these difficulties. We have developed a hybrid model of extractive summarization that combines word-frequency-based keyword identification with information from automatically generated entity relationship graphs to select sentences for summaries. Comparative evaluation against word-frequency and topic-word-based methods shows that the proposed method is competitive by conventional ROUGE standards and yields moderately more informative summaries on average, as assessed by a large panel (N=94) of human raters.
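The word-frequency baseline that this abstract builds on can be sketched in a few lines. This is a minimal, hypothetical Luhn-style sketch, assuming a naive regex sentence splitter, a toy stopword list, and average normalized word frequency as the sentence score. It does not implement the paper's entity relationship graphs, and the function names are illustrative, not taken from the paper.

```python
import re
from collections import Counter

# Toy stopword list (assumption); real systems use a much fuller list.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "on", "is",
             "are", "was", "it", "that", "for", "with", "as", "by"}


def split_sentences(text: str) -> list[str]:
    # Naive splitter: break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]


def tokenize(sentence: str) -> list[str]:
    # Lowercase word tokens with stopwords removed.
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOPWORDS]


def frequency_summary(text: str, k: int = 2) -> list[str]:
    """Return the k highest-scoring sentences, kept in their original order."""
    sentences = split_sentences(text)
    freqs = Counter(w for s in sentences for w in tokenize(s))
    if not freqs:
        return sentences[:k]
    # Normalize counts by the most frequent word so weights lie in (0, 1].
    top = max(freqs.values())
    weights = {w: c / top for w, c in freqs.items()}

    def score(sentence: str) -> float:
        # Average weight, so long sentences are not favoured just for length.
        tokens = tokenize(sentence)
        return sum(weights[t] for t in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]
```

For example, `frequency_summary("Cats chase mice. Dogs chase cats. The weather is nice today.", k=2)` keeps the two sentences about cats, because "cats" and "chase" are the most frequent content words. The weaknesses the abstract lists (no context, uneven topic coverage) follow directly from scoring each sentence in isolation.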

Assessing the impact of machine intelligence on human behaviour: an interdisciplinary endeavour

Jun 07, 2018
Emilia Gómez, Carlos Castillo, Vicky Charisi, Verónica Dahl, Gustavo Deco, Blagoj Delipetrev, Nicole Dewandre, Miguel Ángel González-Ballester, Fabien Gouyon, José Hernández-Orallo, Perfecto Herrera, Anders Jonsson, Ansgar Koene, Martha Larson, Ramón López de Mántaras, Bertin Martens, Marius Miron, Rubén Moreno-Bote, Nuria Oliver, Antonio Puertas Gallardo, Heike Schweitzer, Nuria Sebastian, Xavier Serra, Joan Serrà, Songül Tolan, Karina Vold

This document contains the outcome of the first Human Behaviour and Machine Intelligence (HUMAINT) workshop, which took place on 5-6 March 2018 in Barcelona, Spain. The workshop was organized in the context of a new research programme at the Centre for Advanced Studies, Joint Research Centre of the European Commission, which focuses on studying the potential impact of artificial intelligence on human behaviour. The workshop gathered an interdisciplinary group of experts to establish the state of the art in the field and a list of future research challenges on human and machine intelligence, the potential impact of algorithms on human cognitive capabilities and decision making, and evaluation and regulation needs. The document consists of short position statements and challenges identified by each expert, and incorporates the results of the discussions held during the workshop. In the conclusion section, we provide a list of emerging research topics and strategies to be addressed in the near future.

* Proceedings of 1st HUMAINT (Human Behaviour and Machine Intelligence) workshop, Barcelona, Spain, March 5-6, 2018, edited by European Commission, Seville, 2018, JRC111773. arXiv admin note: text overlap with arXiv:1409.3097 by other authors

On collapsed representation of hierarchical Completely Random Measures

Jun 02, 2016
Gaurav Pandey, Ambedkar Dukkipati

The aim of the paper is to provide an exact approach for generating a Poisson process sampled from a hierarchical completely random measure (CRM), without having to instantiate the infinitely many atoms of the random measures. We use CRMs and hierarchical CRMs to define a prior for Poisson processes. We derive the marginal distribution of the resultant point process when the underlying CRM is marginalized out. Using well-known properties unique to Poisson processes, we derive an exact approach for instantiating a Poisson process with a hierarchical CRM prior. Furthermore, we derive Gibbs sampling strategies for hierarchical CRM models based on the Chinese restaurant franchise sampling scheme. As an example, we present the sum of generalized gamma process (SGGP) and show its application in topic modelling. We show that one can determine the power-law behaviour of the topics and words in a Bayesian fashion by defining a prior on the parameters of SGGP.

* 11 pages, 1 figure 
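The Chinese restaurant franchise mentioned in this abstract can be illustrated by forward-simulating its seating scheme in the Dirichlet-process case, that is, for the hierarchical Dirichlet process. This is a hedged sketch: the function `crf_sample` and its parameters are hypothetical, and the paper generalizes the scheme to hierarchical CRMs such as SGGP, whose seating probabilities differ from the ones used here. Within each restaurant (group), a customer joins an existing table in proportion to its occupancy or opens a new table in proportion to alpha; each new table orders an existing dish in proportion to the number of tables already serving it, or a new dish in proportion to gamma.

```python
import random


def crf_sample(group_sizes, alpha, gamma, seed=0):
    """Forward-simulate the Chinese restaurant franchise prior (HDP case).

    group_sizes: number of customers in each restaurant (group).
    alpha: restaurant-level concentration; gamma: franchise-level concentration.
    Returns (dish ids per restaurant, number of tables serving each dish).
    """
    rng = random.Random(seed)
    dish_table_counts = []  # m_k: tables, across all restaurants, serving dish k
    assignments = []
    for n_customers in group_sizes:
        table_sizes = []  # n_jt: customers seated at each table in this restaurant
        table_dish = []   # dish served at each table
        dishes = []
        for _ in range(n_customers):
            # Existing table t with weight n_jt, new table with weight alpha.
            weights = table_sizes + [alpha]
            t = rng.choices(range(len(weights)), weights=weights)[0]
            if t == len(table_sizes):
                # New table: existing dish k with weight m_k, new dish with weight gamma.
                dish_weights = dish_table_counts + [gamma]
                k = rng.choices(range(len(dish_weights)), weights=dish_weights)[0]
                if k == len(dish_table_counts):
                    dish_table_counts.append(0)
                dish_table_counts[k] += 1
                table_sizes.append(0)
                table_dish.append(k)
            table_sizes[t] += 1
            dishes.append(table_dish[t])
        assignments.append(dishes)
    return assignments, dish_table_counts
```

Because the dish weights m_k pool table counts from every restaurant, dishes (topics, in topic modelling) are shared across groups. Roughly speaking, a collapsed Gibbs sampler reuses these same conditional weights, multiplied by the likelihood of the observed data, when it reseats one customer or table at a time.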

An Empirical Study on Measuring the Similarity of Sentential Arguments with Language Model Domain Adaptation

Feb 19, 2021
ChaeHun Park, Sangwoo Seo

Measuring the similarity between two different sentential arguments is an important task in argument mining. However, one of the challenges in this field is that the dataset must be annotated using expertise in a variety of topics, which makes supervised learning with labeled data expensive. In this paper, we investigated whether this problem could be alleviated through transfer learning. We first adapted a pretrained language model to a domain of interest using self-supervised learning. Then, we fine-tuned the model on the task of measuring the similarity between sentences taken from different domains. Our approach improves the correlation with human-annotated similarity scores compared to competitive baseline models on the Argument Facet Similarity dataset in an unsupervised setting. Moreover, we achieve performance comparable to a fully supervised baseline model by using only about 60% of the labeled data samples. We believe that our work suggests the possibility of a generalized argument clustering model for various argumentative topics.

* 4+2 pages 

Gunrock 2.0: A User Adaptive Social Conversational System

Nov 30, 2020
Kaihui Liang, Austin Chau, Yu Li, Xueyuan Lu, Dian Yu, Mingyang Zhou, Ishan Jain, Sam Davidson, Josh Arnold, Minh Nguyen, Zhou Yu

Gunrock 2.0 is built on top of Gunrock with an emphasis on user adaptation. It combines various neural natural language understanding modules, including named entity detection, entity linking, and dialog act prediction, to improve user understanding. Its dialog management is a hierarchical model that handles various topics, such as movies, music, and sports. The system-level dialog manager can handle question detection, acknowledgment, error handling, and additional functions, which makes downstream modules much easier to design and implement. The dialog manager also adapts its topic selection to users' profile information, such as inferred gender and personality. The generation model is a mix of templates and neural generation models. The latest build of Gunrock 2.0 achieved an average rating of 3.73 from May 29th to June 4th.

* Published in 3rd Proceedings of Alexa Prize (Alexa Prize 2020) 

Quality of Word Embeddings on Sentiment Analysis Tasks

Mar 06, 2020
Erion Çano, Maurizio Morisio

Word embeddings, or distributed representations of words, are used in various applications such as machine translation, sentiment analysis, and topic identification. The quality of word embeddings and the performance of their applications depend on several factors, such as the training method and the size and relevance of the corpus. In this study we compare the performance of a dozen pretrained word embedding models on lyrics sentiment analysis and movie review polarity tasks. According to our results, Twitter Tweets performs best on lyrics sentiment analysis, whereas Google News and Common Crawl are the top performers on movie polarity analysis. GloVe-trained models slightly outperform those trained with Skip-gram. Factors such as topic relevance and corpus size also significantly affect the quality of the models. When medium or large-sized text sets are available, obtaining word embeddings from the same training dataset is usually the best choice.

* 6 pages, 4 figures, 2 tables. Published in proceedings of NLDB 2017, the 22nd Conference of Natural Language Processing and Information Systems, Liege, Belgium 
