Get our free extension to see links to code for papers anywhere online!

Chrome logo  Add to Chrome

Firefox logo Add to Firefox

"Topic Modeling": models, code, and papers

Evaluation of Non-Negative Matrix Factorization and n-stage Latent Dirichlet Allocation for Emotion Analysis in Turkish Tweets

Sep 27, 2021
Zekeriya Anil Guven, Banu Diri, Tolgahan Cakaloglu

With the development of technology, the use of social media has become quite common. Analyzing comments on social media in areas such as media and advertising plays an important role today. For this reason, new and traditional natural language processing methods are used to detect the emotion of these shares. In this paper, the Latent Dirichlet Allocation, namely LDA, and Non-Negative Matrix Factorization methods in topic modeling were used to determine which emotion the Turkish tweets posted via Twitter. In addition, the accuracy of a proposed n-level method based on LDA was analyzed. Dataset consists of 5 emotions, namely angry, fear, happy, sad and confused. NMF was the most successful method among all topic modeling methods in this study. Then, the F1-measure of Random Forest, Naive Bayes and Support Vector Machine methods was analyzed by obtaining a file suitable for Weka by using the word weights and class labels of the topics. Among the Weka results, the most successful method was n-stage LDA, and the most successful algorithm was Random Forest.

* Innovations in Intelligent Systems and Applications Conference (ASYU), 2019, pp. 1-5 
* Published in: 2019 Innovations in Intelligent Systems and Applications Conference (ASYU). This paper is extension version of Comparison Method for Emotion Detection of Twitter Users ( Please citation this IEEE paper 
Access Paper or Ask Questions

Belief-based Generation of Argumentative Claims

Jan 26, 2021
Milad Alshomary, Wei-Fan Chen, Timon Gurcke, Henning Wachsmuth

When engaging in argumentative discourse, skilled human debaters tailor claims to the beliefs of the audience, to construct effective arguments. Recently, the field of computational argumentation witnessed extensive effort to address the automatic generation of arguments. However, existing approaches do not perform any audience-specific adaptation. In this work, we aim to bridge this gap by studying the task of belief-based claim generation: Given a controversial topic and a set of beliefs, generate an argumentative claim tailored to the beliefs. To tackle this task, we model the people's prior beliefs through their stances on controversial topics and extend state-of-the-art text generation models to generate claims conditioned on the beliefs. Our automatic evaluation confirms the ability of our approach to adapt claims to a set of given beliefs. In a manual study, we additionally evaluate the generated claims in terms of informativeness and their likelihood to be uttered by someone with a respective belief. Our results reveal the limitations of modeling users' beliefs based on their stances, but demonstrate the potential of encoding beliefs into argumentative texts, laying the ground for future exploration of audience reach.

* Almost 9 pages, 1 figure, EACL-21 paper 
Access Paper or Ask Questions

Central Bank Communication and the Yield Curve: A Semi-Automatic Approach using Non-Negative Matrix Factorization

Sep 24, 2018
Ancil Crayton

Communication is now a standard tool in the central bank's monetary policy toolkit. Theoretically, communication provides the central bank an opportunity to guide public expectations, and it has been shown empirically that central bank communication can lead to financial market fluctuations. However, there has been little research into which dimensions or topics of information are most important in causing these fluctuations. We develop a semi-automatic methodology that summarizes the FOMC statements into its main themes, automatically selects the best model based on coherency, and assesses whether there is a significant impact of these themes on the shape of the U.S Treasury yield curve using topic modeling methods from the machine learning literature. Our findings suggest that the FOMC statements can be decomposed into three topics: (i) information related to the economic conditions and the mandates, (ii) information related to monetary policy tools and intermediate targets, and (iii) information related to financial markets and the financial crisis. We find that statements are most influential during the financial crisis and the effects are mostly present in the curvature of the yield curve through information related to the financial theme.

Access Paper or Ask Questions

Geometric Processing for Image-based 3D Object Modeling

Jun 27, 2021
Rongjun Qin, Xu Huang

Image-based 3D object modeling refers to the process of converting raw optical images to 3D digital representations of the objects. Very often, such models are desired to be dimensionally true, semantically labeled with photorealistic appearance (reality-based modeling). Laser scanning was deemed as the standard (and direct) way to obtaining highly accurate 3D measurements of objects, while one would have to abide the high acquisition cost and its unavailability on some of the platforms. Nowadays the image-based methods backboned by the recently developed advanced dense image matching algorithms and geo-referencing paradigms, are becoming the dominant approaches, due to its high flexibility, availability and low cost. The largely automated geometric processing of images in a 3D object reconstruction workflow, from ordered/unordered raw imagery to textured meshes, is becoming a critical part of the reality-based 3D modeling. This article summarizes the overall geometric processing workflow, with focuses on introducing the state-of-the-art methods of three major components of geometric processing: 1) geo-referencing; 2) Image dense matching 3) texture mapping. Finally, we will draw conclusions and share our outlooks of the topics discussed in this article.

Access Paper or Ask Questions

Why Didn't You Listen to Me? Comparing User Control of Human-in-the-Loop Topic Models

May 23, 2019
Varun Kumar, Alison Smith-Renner, Leah Findlater, Kevin Seppi, Jordan Boyd-Graber

To address the lack of comparative evaluation of Human-in-the-Loop Topic Modeling (HLTM) systems, we implement and evaluate three contrasting HLTM approaches using simulation experiments. These approaches are based on previously proposed frameworks, including constraints and informed prior-based methods. User control is desired, so we propose a control metric to measure whether refinement operations are applied as users expect. Informed prior-based methods provide better control than constraints, but constraints yield higher quality topics.

* Accepted at ACL 2019 
Access Paper or Ask Questions

Taste or Addiction?: Using Play Logs to Infer Song Selection Motivation

May 26, 2017
Kosetsu Tsukuda, Masataka Goto

Online music services are increasing in popularity. They enable us to analyze people's music listening behavior based on play logs. Although it is known that people listen to music based on topic (e.g., rock or jazz), we assume that when a user is addicted to an artist, s/he chooses the artist's songs regardless of topic. Based on this assumption, in this paper, we propose a probabilistic model to analyze people's music listening behavior. Our main contributions are three-fold. First, to the best of our knowledge, this is the first study modeling music listening behavior by taking into account the influence of addiction to artists. Second, by using real-world datasets of play logs, we showed the effectiveness of our proposed model. Third, we carried out qualitative experiments and showed that taking addiction into account enables us to analyze music listening behavior from a new viewpoint in terms of how people listen to music according to the time of day, how an artist's songs are listened to by people, etc. We also discuss the possibility of applying the analysis results to applications such as artist similarity computation and song recommendation.

* Accepted by The 21st Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2017) 
Access Paper or Ask Questions

Science Checker: Extractive-Boolean Question Answering For Scientific Fact Checking

Apr 29, 2022
Loïc Rakotoson, Charles Letaillieur, Sylvain Massip, Fréjus Laleye

With the explosive growth of scientific publications, making the synthesis of scientific knowledge and fact checking becomes an increasingly complex task. In this paper, we propose a multi-task approach for verifying the scientific questions based on a joint reasoning from facts and evidence in research articles. We propose an intelligent combination of (1) an automatic information summarization and (2) a Boolean Question Answering which allows to generate an answer to a scientific question from only extracts obtained after summarization. Thus on a given topic, our proposed approach conducts structured content modeling based on paper abstracts to answer a scientific question while highlighting texts from paper that discuss the topic. We based our final system on an end-to-end Extractive Question Answering (EQA) combined with a three outputs classification model to perform in-depth semantic understanding of a question to illustrate the aggregation of multiple responses. With our light and fast proposed architecture, we achieved an average error rate of 4% and a F1-score of 95.6%. Our results are supported via experiments with two QA models (BERT, RoBERTa) over 3 Million Open Access (OA) articles in the medical and health domains on Europe PMC.

* Proceedings of the 1st International Workshop on Multimedia AI against Disinformation (2022) 
* 8 pages, 4 figures 
Access Paper or Ask Questions

Hotel Preference Rank based on Online Customer Review

Oct 10, 2021
Muhammad Apriandito Arya Saputra, Andry Alamsyah, Fajar Ibnu Fatihan

Topline hotels are now shifting into the digital way in how they understand their customers to maintain and ensuring satisfaction. Rather than the conventional way which uses written reviews or interviews, the hotel is now heavily investing in Artificial Intelligence particularly Machine Learning solutions. Analysis of online customer reviews changes the way companies make decisions in a more effective way than using conventional analysis. The purpose of this research is to measure hotel service quality. The proposed approach emphasizes service quality dimensions reviews of the top-5 luxury hotel in Indonesia that appear on the online travel site TripAdvisor based on section Best of 2018. In this research, we use a model based on a simple Bayesian classifier to classify each customer review into one of the service quality dimensions. Our model was able to separate each classification properly by accuracy, kappa, recall, precision, and F-measure measurements. To uncover latent topics in the customer's opinion we use Topic Modeling. We found that the common issue that occurs is about responsiveness as it got the lowest percentage compared to others. Our research provides a faster outlook of hotel rank based on service quality to end customers based on a summary of the previous online review.

* Test Engineering and Management, Vol. 83: March/April 2020 
* 5 pages, 6 figures, 5 tables 
Access Paper or Ask Questions

Topic Modeling Based Multi-modal Depression Detection

Mar 28, 2018
Yuan Gong, Christian Poellabauer

Major depressive disorder is a common mental disorder that affects almost 7% of the adult U.S. population. The 2017 Audio/Visual Emotion Challenge (AVEC) asks participants to build a model to predict depression levels based on the audio, video, and text of an interview ranging between 7-33 minutes. Since averaging features over the entire interview will lose most temporal information, how to discover, capture, and preserve useful temporal details for such a long interview are significant challenges. Therefore, we propose a novel topic modeling based approach to perform context-aware analysis of the recording. Our experiments show that the proposed approach outperforms context-unaware methods and the challenge baselines for all metrics.

* Proceedings of the 7th Audio/Visual Emotion Challenge and Workshop (AVEC). (Official Depression Challenge Winner) 
Access Paper or Ask Questions

Topic Detection and Tracking with Time-Aware Document Embeddings

Dec 12, 2021
Hang Jiang, Doug Beeferman, Weiquan Mao, Deb Roy

The time at which a message is communicated is a vital piece of metadata in many real-world natural language processing tasks such as Topic Detection and Tracking (TDT). TDT systems aim to cluster a corpus of news articles by event, and in that context, stories that describe the same event are likely to have been written at around the same time. Prior work on time modeling for TDT takes this into account, but does not well capture how time interacts with the semantic nature of the event. For example, stories about a tropical storm are likely to be written within a short time interval, while stories about a movie release may appear over weeks or months. In our work, we design a neural method that fuses temporal and textual information into a single representation of news documents for event detection. We fine-tune these time-aware document embeddings with a triplet loss architecture, integrate the model into downstream TDT systems, and evaluate the systems on two benchmark TDT data sets in English. In the retrospective setting, we apply clustering algorithms to the time-aware embeddings and show substantial improvements over baselines on the News2013 data set. In the online streaming setting, we add our document encoder to an existing state-of-the-art TDT pipeline and demonstrate that it can benefit the overall performance. We conduct ablation studies on the time representation and fusion algorithm strategies, showing that our proposed model outperforms alternative strategies. Finally, we probe the model to examine how it handles recurring events more effectively than previous TDT systems.

Access Paper or Ask Questions