"Topic": models, code, and papers

Technical Progress Analysis Using a Dynamic Topic Model for Technical Terms to Revise Patent Classification Codes

Dec 18, 2020
Mana Iwata, Yoshiro Matsuda, Yoshimasa Utsumi, Yoshitoshi Tanaka, Kazuhide Nakata

Japanese patents are assigned a patent classification code, FI (File Index), that is unique to Japan. FI is a subdivision of the IPC, an international patent classification code, related to Japanese technology. FIs are revised to keep up with technological developments; more than 30,000 new FIs have been established since 2006. However, these revisions require considerable time and effort, and because they are not automated, they are inefficient. Using machine learning to assist in the revision of patent classification codes (FI) should therefore improve both accuracy and efficiency. This study analyzes patent documents from this new perspective of assisting in the revision of patent classification codes with machine learning. To analyze time-series changes in patents, we used the dynamic topic model (DTM), an extension of latent Dirichlet allocation (LDA). Unlike English, Japanese requires morphological analysis, and patents contain many technical terms that are not used in everyday life, so morphological analysis with a common dictionary is insufficient. We therefore used a technique for extracting technical terms from text and applied the extracted terms to the DTM. In this study, we determined the technological progress of the lighting class F21 over 14 years and compared it with the actual revisions of patent classification codes. In other words, we extracted technical terms from Japanese patents and applied the DTM to determine the progress of Japanese technology, then analyzed the results from the new perspective of revising patent classification codes with machine learning. We found that topics on the rise corresponded to technologies judged to be new.
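The "rising topic" criterion in the last sentence can be pictured with a stdlib-only sketch; the mini-corpus, the technical term, and the simple first-vs-last-year rise test below are all hypothetical stand-ins for the paper's DTM output:

```python
from collections import Counter

def term_trend(docs_by_year, term):
    """Relative frequency of a technical term in each year's patent texts."""
    trend = []
    for year in sorted(docs_by_year):
        tokens = [tok for doc in docs_by_year[year] for tok in doc]
        counts = Counter(tokens)
        trend.append(counts[term] / max(len(tokens), 1))
    return trend

def is_rising(trend):
    """Crude 'new technology' signal: frequency in the last year exceeds the first."""
    return len(trend) >= 2 and trend[-1] > trend[0]

# Hypothetical mini-corpus: tokenised patent texts grouped by filing year.
docs = {
    2006: [["lamp", "filament"], ["lamp", "socket"]],
    2013: [["led", "lamp"], ["led", "driver"]],
    2020: [["led", "oled", "driver"], ["oled", "panel"]],
}
print(term_trend(docs, "led"))    # rises from 0.0
print(is_rising(term_trend(docs, "led")))
```

A real pipeline would track topic proportions from a fitted DTM rather than raw term frequencies, but the flagging logic is of this shape.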

Bayesian Allocation Model: Inference by Sequential Monte Carlo for Nonnegative Tensor Factorizations and Topic Models using Polya Urns

Mar 11, 2019
Ali Taylan Cemgil, Mehmet Burak Kurutmaz, Sinan Yildirim, Melih Barsbey, Umut Simsekli

We introduce a dynamic generative model, the Bayesian allocation model (BAM), which establishes explicit connections between nonnegative tensor factorization (NTF), graphical models of discrete probability distributions and their Bayesian extensions, and topic models such as latent Dirichlet allocation. BAM is based on a Poisson process whose events are marked by using a Bayesian network, where the conditional probability tables of this network are then integrated out analytically. We show that the resulting marginal process turns out to be a Polya urn, an integer-valued self-reinforcing process. This urn process, which we name a Polya-Bayes process, obeys certain conditional independence properties that provide further insight into the nature of NTF. These insights also let us develop space-efficient simulation algorithms that respect the potential sparsity of data: we propose a class of sequential importance sampling algorithms for computing NTF and approximating their marginal likelihood, which would be useful for model selection. The resulting methods can also be viewed as a model scoring method for topic models and discrete Bayesian networks with hidden variables. The new algorithms have favourable properties in the sparse-data regime when contrasted with variational algorithms, which become more accurate as the total sum of the elements of the observed tensor goes to infinity. We illustrate the performance on several examples and numerically study the behaviour of the algorithms for various data regimes.
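The self-reinforcing urn dynamics at the heart of BAM are easy to simulate. The following is a minimal sketch of a basic Polya urn (the colour count and seed are illustrative, and this is the textbook urn, not the paper's marked-Poisson-process construction):

```python
import random

def polya_urn(n_draws, n_colours=3, seed=0):
    """Simulate a basic Polya urn: drawing a colour reinforces that colour."""
    rng = random.Random(seed)
    counts = [1] * n_colours            # one ball of each colour to start
    for _ in range(n_draws):
        total = sum(counts)
        r = rng.randrange(total)        # pick a ball uniformly at random
        colour = 0
        while r >= counts[colour]:      # locate which colour it belongs to
            r -= counts[colour]
            colour += 1
        counts[colour] += 1             # put it back along with one extra copy
    return counts

print(polya_urn(100))  # skewed counts: popular colours get more popular
```

The rich-get-richer skew visible in the output is the "self-reinforcing" property the abstract refers to.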

* 70 pages, 16 figures 

A Novel Feature-based Bayesian Model for Query Focused Multi-document Summarization

Dec 27, 2013
Jiwei Li, Sujian Li

Both supervised learning methods and LDA-based topic models have been successfully applied in the field of query-focused multi-document summarization. In this paper, we propose a novel supervised approach that can incorporate rich sentence features into Bayesian topic models in a principled way, thus taking advantage of both topic models and feature-based supervised learning methods. Experiments on TAC2008 and TAC2009 demonstrate the effectiveness of our approach.
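A toy sketch of what feature-based sentence scoring for query-focused summarization looks like; the features (query overlap, position, brevity) and their weights are invented for illustration, whereas the paper combines such features with a Bayesian topic model in a principled way:

```python
def score_sentence(sentence, query, position, weights=(0.6, 0.3, 0.1)):
    """Toy feature-based score: query overlap, position, brevity (all illustrative)."""
    s_tokens, q_tokens = set(sentence.lower().split()), set(query.lower().split())
    overlap = len(s_tokens & q_tokens) / max(len(q_tokens), 1)
    position_score = 1.0 / (1 + position)       # earlier sentences score higher
    brevity = 1.0 / (1 + len(s_tokens))
    w1, w2, w3 = weights
    return w1 * overlap + w2 * position_score + w3 * brevity

def summarise(sentences, query, k=1):
    """Rank sentences by score and return the top k in document order."""
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: score_sentence(sentences[i], query, i),
        reverse=True,
    )
    return [sentences[i] for i in sorted(ranked[:k])]

docs = [
    "Storms caused flooding across the region.",
    "Flooding damage from the storms was severe.",
    "Officials held a press conference.",
]
print(summarise(docs, "storms flooding damage", k=1))
```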

* This paper has been withdrawn by the author due to a crucial sign error in equation 

The Evolution of Popularity and Images of Characters in Marvel Cinematic Universe Fanfictions

May 10, 2018
Fan Bu

This analysis proposes a new topic model to study the yearly trends in Marvel Cinematic Universe fanfictions on three levels: character popularity, character images/topics, and the vocabulary patterns of topics. It is found that character appearances in fanfictions have become more diverse over the years thanks to the constant introduction of new characters in feature films, and, in the case of Captain America, that multi-dimensional character development is well received by the fanfiction world.

Gibbs Sampling Strategies for Semantic Perception of Streaming Video Data

Sep 10, 2015
Yogesh Girdhar, Gregory Dudek

Topic modeling of streaming sensor data can be used for high-level perception of the environment by a mobile robot. In this paper, we compare various Gibbs sampling strategies for topic modeling of streaming spatiotemporal data, such as video captured by a mobile robot. Compared to previous work on online topic modeling, such as o-LDA and incremental LDA, we show that the proposed technique results in lower online and final perplexity under real-time constraints.
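For reference, a generic batch collapsed Gibbs sampler for LDA fits in a few dozen lines; the streaming strategies compared in the paper (o-LDA, incremental LDA, and the proposed variants) differ mainly in which assignments get resampled as new observations arrive. A minimal stdlib-only sketch of the batch sampler (toy corpus and hyperparameters are illustrative):

```python
import random

def gibbs_lda(docs, n_topics, vocab_size, iters=50, alpha=0.1, beta=0.01, seed=0):
    """Minimal collapsed Gibbs sampler for LDA (batch version)."""
    rng = random.Random(seed)
    # z[d][i] = topic of word i in doc d; count tables track current assignments.
    z = [[rng.randrange(n_topics) for _ in doc] for doc in docs]
    ndk = [[0] * n_topics for _ in docs]               # doc-topic counts
    nkw = [[0] * vocab_size for _ in range(n_topics)]  # topic-word counts
    nk = [0] * n_topics                                # per-topic totals
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]
            ndk[d][k] += 1; nkw[k][w] += 1; nk[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]                            # remove current assignment
                ndk[d][k] -= 1; nkw[k][w] -= 1; nk[k] -= 1
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + vocab_size * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k                            # record the new assignment
                ndk[d][k] += 1; nkw[k][w] += 1; nk[k] += 1
    return z, ndk, nkw

docs = [[0, 0, 1], [1, 2, 2], [0, 1, 2]]  # toy docs over a 3-word vocabulary
z, ndk, nkw = gibbs_lda(docs, n_topics=2, vocab_size=3)
print(ndk)
```

A streaming variant would append incoming documents and resample only a window of recent assignments, trading accuracy for bounded per-frame cost.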

Alquist 2.0: Alexa Prize Socialbot Based on Sub-Dialogue Models

Nov 06, 2020
Jan Pichl, Petr Marek, Jakub Konrád, Martin Matulík, Jan Šedivý

This paper presents the second version of the dialogue system named Alquist competing in the Amazon Alexa Prize 2018. We introduce a system leveraging an ontology-based topic structure called topic nodes. Each of the nodes consists of several sub-dialogues, and each sub-dialogue has its own LSTM-based model for dialogue management. The sub-dialogues can be triggered according to the topic hierarchy or a user intent, which allows the bot to create a unique experience during each session.
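The topic-node mechanism can be pictured as a small lookup structure; the node names and intents below are invented for illustration, not taken from Alquist:

```python
# Hypothetical topic-node structure: each node owns sub-dialogues that can be
# triggered either by walking the topic hierarchy or by a matched user intent.
TOPIC_NODES = {
    "movies": {
        "children": ["movies.recommend", "movies.trivia"],
        "intents": {"ask_recommendation": "movies.recommend"},
    },
    "movies.recommend": {"children": [], "intents": {}},
    "movies.trivia": {"children": [], "intents": {}},
}

def next_sub_dialogue(current, user_intent=None):
    """Intent match wins; otherwise fall through to the first child in the hierarchy."""
    node = TOPIC_NODES[current]
    if user_intent in node["intents"]:
        return node["intents"][user_intent]
    return node["children"][0] if node["children"] else None

print(next_sub_dialogue("movies", "ask_recommendation"))  # movies.recommend
print(next_sub_dialogue("movies.trivia"))                 # None (leaf node)
```

In the real system each resolved sub-dialogue would then hand the turn to its own LSTM-based dialogue manager.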

Adversarial Learning for Zero-Shot Stance Detection on Social Media

May 14, 2021
Emily Allaway, Malavika Srikanth, Kathleen McKeown

Stance detection on social media can help to identify and understand slanted news or commentary in everyday life. In this work, we propose a new model for zero-shot stance detection on Twitter that uses adversarial learning to generalize across topics. Our model achieves state-of-the-art performance on a number of unseen test topics with minimal computational costs. In addition, we extend zero-shot stance detection to new topics, highlighting future directions for zero-shot transfer.

* To appear in NAACL 2021 

Modeling Word Relatedness in Latent Dirichlet Allocation

Nov 10, 2014
Xun Wang

The standard LDA model suffers from the problem that the topic assignments of words are independent, so word correlation is neglected. To address this problem, we propose a model called Word Related Latent Dirichlet Allocation (WR-LDA), which incorporates word correlation into LDA topic models. This leads to new capabilities that the standard LDA model does not have, such as estimating infrequently occurring words or multi-language topic modeling. Experimental results demonstrate the effectiveness of our model compared with standard LDA.
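One illustrative way to inject word correlation into topic sampling, loosely in the spirit of WR-LDA but not the paper's exact formulation: boost a word's topic weights toward the topics its correlated words (synonyms, translations) are currently assigned to.

```python
def topic_weights(base_weights, correlated_topics, strength=0.5):
    """Illustrative word-correlation adjustment (not the paper's exact model):
    multiply up the weights of topics held by the word's correlated words,
    then renormalise to a probability distribution."""
    adjusted = list(base_weights)
    for t in correlated_topics:
        adjusted[t] *= (1 + strength)
    total = sum(adjusted)
    return [w / total for w in adjusted]

# The word's own counts slightly favour topic 0, but two correlated words
# sit in topic 1, pulling the sampling distribution over.
print(topic_weights([0.55, 0.45], correlated_topics=[1, 1]))
```

This is also how such a model can help with infrequent words: a rare word with near-uniform counts borrows statistical strength from its correlated neighbours.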

Plug-and-Blend: A Framework for Controllable Story Generation with Blended Control Codes

Mar 23, 2021
Zhiyu Lin, Mark Riedl

We describe a Plug-and-Play controllable language generation framework, Plug-and-Blend, that allows a human user to input multiple control codes (topics). In the context of automated story generation, this allows a human user loose or fine-grained control of the topics that will appear in the generated story, and can even allow for overlapping, blended topics. We show that our framework, working with different generation models, steers generation towards given continuous-weighted control codes while keeping the generated sentences fluent, demonstrating strong blending capability.
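Blending continuous-weighted control codes can be pictured as a weighted shift of the generator's next-token logits; the vocabulary, control vectors, and weights below are invented for illustration and are much cruder than the framework's actual guided-generation mechanism:

```python
import math

def blend_logits(base_logits, topic_logits, topic_weights):
    """Shift next-token logits toward each control code's preferences,
    scaled by that code's continuous weight (a simplified sketch)."""
    blended = list(base_logits)
    for logits, w in zip(topic_logits, topic_weights):
        blended = [b + w * t for b, t in zip(blended, logits)]
    return blended

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

vocab = ["dragon", "spaceship", "forest"]
base = [0.0, 0.0, 0.0]
fantasy = [2.0, -1.0, 1.0]      # hypothetical "fantasy" control code
scifi = [-1.0, 2.0, -1.0]       # hypothetical "sci-fi" control code
# Blend 70% fantasy with 30% sci-fi:
probs = softmax(blend_logits(base, [fantasy, scifi], [0.7, 0.3]))
print(max(zip(probs, vocab)))   # "dragon" dominates under this blend
```

Sliding the weight pair from (0.7, 0.3) toward (0.3, 0.7) smoothly shifts the generated vocabulary from one topic to the other, which is the "continuous-weighted" control the abstract describes.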

Extracting and categorising the reactions to COVID-19 by the South African public -- A social media study

Jun 11, 2020
Vukosi Marivate, Avashlin Moodley, Athandiwe Saba

Social media can be used to extract discussion topics during a disaster. With the impact of the COVID-19 pandemic on South Africa, we need to understand how the laws and regulations promulgated by the government in response to the pandemic contrast with the discussion topics social media users have been engaging in. In this work, we expand on traditional media analysis by using social media discussions driven by or directed to South African government officials. We find topics that are similar in some cases and different in others. The findings can inform further study of social media during disaster settings in South Africa and beyond.

* Under review for EMNLP 2020 
