Get our free extension to see links to code for papers anywhere online!

Chrome logo Add to Chrome

Firefox logo Add to Firefox

"Topic": models, code, and papers

Towards Full-Fledged Argument Search: A Framework for Extracting and Clustering Arguments from Unstructured Text

Nov 30, 2021
Michael Färber, Anna Steyer

Argument search aims at identifying arguments in natural language texts. In the past, this task has been addressed by a combination of keyword search and argument identification on the sentence- or document-level. However, existing frameworks often address only specific components of argument search and do not address the following aspects: (1) argument-query matching: identifying arguments that frame the topic slightly differently than the actual search query; (2) argument identification: identifying arguments that consist of multiple sentences; (3) argument clustering: selecting retrieved arguments by topical aspects. In this paper, we propose a framework for addressing these shortcomings. We suggest (1) to combine the keyword search with precomputed topic clusters for argument-query matching, (2) to apply a novel approach based on sentence-level sequence-labeling for argument identification, and (3) to present aggregated arguments to users based on topic-aware argument clustering. Our experiments on several real-world debate data sets demonstrate that density-based clustering algorithms, such as HDBSCAN, are particularly suitable for argument-query matching. With our sentence-level, BiLSTM-based sequence-labeling approach we achieve a macro F1 score of 0.71. Finally, evaluating our argument clustering method indicates that a fine-grained clustering of arguments by subtopics remains challenging but is worthwhile to be explored.

  Access Paper or Ask Questions

MuSe 2020 -- The First International Multimodal Sentiment Analysis in Real-life Media Challenge and Workshop

Apr 30, 2020
Lukas Stappen, Alice Baird, Georgios Rizos, Panagiotis Tzirakis, Xinchen Du, Felix Hafner, Lea Schumann, Adria Mallol-Ragolta, Björn W. Schuller, Iulia Lefter, Erik Cambria, Ioannis Kompatsiaris

Multimodal Sentiment Analysis in Real-life Media (MuSe) 2020 is a Challenge-based Workshop focusing on the tasks of sentiment recognition, as well as emotion-target engagement and trustworthiness detection by means of more comprehensively integrating the audio-visual and language modalities. The purpose of MuSe 2020 is to bring together communities from different disciplines; mainly, the audio-visual emotion recognition community (signal-based), and the sentiment analysis community (symbol-based). We present three distinct sub-challenges: MuSe-Wild, which focuses on continuous emotion (arousal and valence) prediction; MuSe-Topic, in which participants recognise domain-specific topics as the target of 3-class (low, medium, high) emotions; and MuSe-Trust, in which the novel aspect of trustworthiness is to be predicted. In this paper, we provide detailed information on MuSe-CaR, the first of its kind in-the-wild database, which is utilised for the challenge, as well as the state-of-the-art features and modelling approaches applied. For each sub-challenge, a competitive baseline for participants is set; namely, on test we report for MuSe-Wild a combined (valence and arousal) CCC of .2568, for MuSe-Topic a score (computed as 0.34$\cdot$ UAR + 0.66$\cdot$F1) of 76.78 % on the 10-class topic and 40.64 % on the 3-class emotion prediction, and for MuSe-Trust a CCC of .4359.

* Baseline Paper MuSe 2020, MuSe Workshop Challenge, ACM Multimedia 

  Access Paper or Ask Questions

A New Geometric Approach to Latent Topic Modeling and Discovery

Jan 05, 2013
Weicong Ding, Mohammad H. Rohban, Prakash Ishwar, Venkatesh Saligrama

A new geometrically-motivated algorithm for nonnegative matrix factorization is developed and applied to the discovery of latent "topics" for text and image "document" corpora. The algorithm is based on robustly finding and clustering extreme points of empirical cross-document word-frequencies that correspond to novel "words" unique to each topic. In contrast to related approaches that are based on solving non-convex optimization problems using suboptimal approximations, locally-optimal methods, or heuristics, the new algorithm is convex, has polynomial complexity, and has competitive qualitative and quantitative performance compared to the current state-of-the-art approaches on synthetic and real-world datasets.

* This paper was submitted to the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2013 on November 30, 2012 

  Access Paper or Ask Questions

Emotion-based Modeling of Mental Disorders on Social Media

Jan 24, 2022
Xiaobo Guo, Yaojia Sun, Soroush Vosoughi

According to the World Health Organization (WHO), one in four people will be affected by mental disorders at some point in their lives. However, in many parts of the world, patients do not actively seek professional diagnosis because of stigma attached to mental illness, ignorance of mental health and its associated symptoms. In this paper, we propose a model for passively detecting mental disorders using conversations on Reddit. Specifically, we focus on a subset of mental disorders that are characterized by distinct emotional patterns (henceforth called emotional disorders): major depressive, anxiety, and bipolar disorders. Through passive (i.e., unprompted) detection, we can encourage patients to seek diagnosis and treatment for mental disorders. Our proposed model is different from other work in this area in that our model is based entirely on the emotional states, and the transition between these states of users on Reddit, whereas prior work is typically based on content-based representations (e.g., n-grams, language model embeddings, etc). We show that content-based representation is affected by domain and topic bias and thus does not generalize, while our model, on the other hand, suppresses topic-specific information and thus generalizes well across different topics and times. We conduct experiments on our model's ability to detect different emotional disorders and on the generalizability of our model. Our experiments show that while our model performs comparably to content-based models, such as BERT, it generalizes much better across time and topic.

* Proceedings of the 20th IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT) 

  Access Paper or Ask Questions

Topology Analysis of International Networks Based on Debates in the United Nations

Jul 29, 2017
Stefano Gurciullo, Slava Mikhaylov

In complex, high dimensional and unstructured data it is often difficult to extract meaningful patterns. This is especially the case when dealing with textual data. Recent studies in machine learning, information theory and network science have developed several novel instruments to extract the semantics of unstructured data, and harness it to build a network of relations. Such approaches serve as an efficient tool for dimensionality reduction and pattern detection. This paper applies semantic network science to extract ideological proximity in the international arena, by focusing on the data from General Debates in the UN General Assembly on the topics of high salience to international community. UN General Debate corpus (UNGDC) covers all high-level debates in the UN General Assembly from 1970 to 2014, covering all UN member states. The research proceeds in three main steps. First, Latent Dirichlet Allocation (LDA) is used to extract the topics of the UN speeches, and therefore semantic information. Each country is then assigned a vector specifying the exposure to each of the topics identified. This intermediate output is then used in to construct a network of countries based on information theoretical metrics where the links capture similar vectorial patterns in the topic distributions. Topology of the networks is then analyzed through network properties like density, path length and clustering. Finally, we identify specific topological features of our networks using the map equation framework to detect communities in our networks of countries.

  Access Paper or Ask Questions

Why Didn't You Listen to Me? Comparing User Control of Human-in-the-Loop Topic Models

Jun 04, 2019
Varun Kumar, Alison Smith-Renner, Leah Findlater, Kevin Seppi, Jordan Boyd-Graber

To address the lack of comparative evaluation of Human-in-the-Loop Topic Modeling (HLTM) systems, we implement and evaluate three contrasting HLTM modeling approaches using simulation experiments. These approaches extend previously proposed frameworks, including constraints and informed prior-based methods. Users should have a sense of control in HLTM systems, so we propose a control metric to measure whether refinement operations' results match users' expectations. Informed prior-based methods provide better control than constraints, but constraints yield higher quality topics.

* In proceedings of ACL 2019 

  Access Paper or Ask Questions

A Weakly Supervised Approach for Classifying Stance in Twitter Replies

Mar 12, 2021
Sumeet Kumar, Ramon Villa Cox, Matthew Babcock, Kathleen M. Carley

Conversations on social media (SM) are increasingly being used to investigate social issues on the web, such as online harassment and rumor spread. For such issues, a common thread of research uses adversarial reactions, e.g., replies pointing out factual inaccuracies in rumors. Though adversarial reactions are prevalent in online conversations, inferring those adverse views (or stance) from the text in replies is difficult and requires complex natural language processing (NLP) models. Moreover, conventional NLP models for stance mining need labeled data for supervised learning. Getting labeled conversations can itself be challenging as conversations can be on any topic, and topics change over time. These challenges make learning the stance a difficult NLP problem. In this research, we first create a new stance dataset comprised of three different topics by labeling both users' opinions on the topics (as in pro/con) and users' stance while replying to others' posts (as in favor/oppose). As we find limitations with supervised approaches, we propose a weakly-supervised approach to predict the stance in Twitter replies. Our novel method allows using a smaller number of hashtags to generate weak labels for Twitter replies. Compared to supervised learning, our method improves the mean F1-macro by 8\% on the hand-labeled dataset without using any hand-labeled examples in the training set. We further show the applicability of our proposed method on COVID 19 related conversations on Twitter.

  Access Paper or Ask Questions

Detecting "Smart" Spammers On Social Network: A Topic Model Approach

Jun 09, 2016
Linqing Liu, Yao Lu, Ye Luo, Renxian Zhang, Laurent Itti, Jianwei Lu

Spammer detection on social network is a challenging problem. The rigid anti-spam rules have resulted in emergence of "smart" spammers. They resemble legitimate users who are difficult to identify. In this paper, we present a novel spammer classification approach based on Latent Dirichlet Allocation(LDA), a topic model. Our approach extracts both the local and the global information of topic distribution patterns, which capture the essence of spamming. Tested on one benchmark dataset and one self-collected dataset, our proposed method outperforms other state-of-the-art methods in terms of averaged F1-score.

* NAACL-HLT 2016, Student Research Workshop 

  Access Paper or Ask Questions

Japanese Discourse and the Process of Centering

Sep 24, 1996
Marilyn Walker, Masayo Iida, Sharon Cote

This paper has three aims: (1) to generalize a computational account of the discourse process called {\sc centering}, (2) to apply this account to discourse processing in Japanese so that it can be used in computational systems for machine translation or language understanding, and (3) to provide some insights on the effect of syntactic factors in Japanese on discourse interpretation. We argue that while discourse interpretation is an inferential process, syntactic cues constrain this process, and demonstrate this argument with respect to the interpretation of {\sc zeros}, unexpressed arguments of the verb, in Japanese. The syntactic cues in Japanese discourse that we investigate are the morphological markers for grammatical {\sc topic}, the postposition {\it wa}, as well as those for grammatical functions such as {\sc subject}, {\em ga}, {\sc object}, {\em o} and {\sc object2}, {\em ni}. In addition, we investigate the role of speaker's {\sc empathy}, which is the viewpoint from which an event is described. This is syntactically indicated through the use of verbal compounding, i.e. the auxiliary use of verbs such as {\it kureta, kita}. Our results are based on a survey of native speakers of their interpretation of short discourses, consisting of minimal pairs, varied by one of the above factors. We demonstrate that these syntactic cues do indeed affect the interpretation of {\sc zeros}, but that having previously been the {\sc topic} and being realized as a {\sc zero} also contributes to the salience of a discourse entity. We propose a discourse rule of {\sc zero topic assignment}, and show that {\sc centering} provides constraints on when a {\sc zero} can be interpreted as the {\sc zero topic}.

* Computational Linguistics 20-2, 1994 
* 38 pages, uses clstyle, lingmacros 

  Access Paper or Ask Questions

Content-driven, unsupervised clustering of news articles through multiscale graph partitioning

Aug 03, 2018
M. Tarik Altuncu, Sophia N. Yaliraki, Mauricio Barahona

The explosion in the amount of news and journalistic content being generated across the globe, coupled with extended and instantaneous access to information through online media, makes it difficult and time-consuming to monitor news developments and opinion formation in real time. There is an increasing need for tools that can pre-process, analyse and classify raw text to extract interpretable content; specifically, identifying topics and content-driven groupings of articles. We present here such a methodology that brings together powerful vector embeddings from Natural Language Processing with tools from Graph Theory that exploit diffusive dynamics on graphs to reveal natural partitions across scales. Our framework uses a recent deep neural network text analysis methodology (Doc2vec) to represent text in vector form and then applies a multi-scale community detection method (Markov Stability) to partition a similarity graph of document vectors. The method allows us to obtain clusters of documents with similar content, at different levels of resolution, in an unsupervised manner. We showcase our approach with the analysis of a corpus of 9,000 news articles published by Vox Media over one year. Our results show consistent groupings of documents according to content without a priori assumptions about the number or type of clusters to be found. The multilevel clustering reveals a quasi-hierarchy of topics and subtopics with increased intelligibility and improved topic coherence as compared to external taxonomy services and standard topic detection methods.

* 8 pages; 5 figures; To present at KDD 2018: Data Science, Journalism & Media workshop 

  Access Paper or Ask Questions