
"Topic Modeling": models, code, and papers

Modeling opinion leader's role in the diffusion of innovation

Jan 27, 2021
Natasa Vodopivec, Carole Adam, Jean-Pierre Chanteau

The diffusion of innovations is an important topic for consumer markets. Early research focused on how innovations spread at the level of society as a whole. To get closer to real-world scenarios, agent-based models (ABMs) began focusing on individual-level agents. In our work we translate an existing ABM that investigates the role of opinion leaders in the diffusion of innovations to GAMA, a new, more expressive platform designed for agent-based modeling. We do so to show that taking advantage of the features of such platforms should be encouraged when building models in the social sciences, because it can strengthen the explanatory power of simulation results.

* Internship report
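
To illustrate the kind of model this abstract describes, here is a minimal threshold-diffusion sketch on a small-world network. It is not the authors' model, which is implemented in GAMA's GAML language and far richer; the leader fraction, influence weight, and adoption threshold below are invented parameters.

```python
# Minimal opinion-leader diffusion sketch (illustrative only; the paper's
# model is implemented in GAMA and is richer than this).
import random
import networkx as nx

random.seed(42)
G = nx.watts_strogatz_graph(n=200, k=6, p=0.1)  # small-world contact network

LEADER_FRACTION = 0.05   # assumed share of opinion leaders
LEADER_WEIGHT = 3.0      # assumed extra influence of a leader's adoption
THRESHOLD = 0.3          # assumed adoption threshold

leaders = set(random.sample(list(G.nodes), int(LEADER_FRACTION * len(G))))
adopted = set(random.sample(list(G.nodes), 5))  # initial adopters

for step in range(50):
    new_adopters = set()
    for node in G.nodes:
        if node in adopted:
            continue
        # Weighted fraction of adopted neighbors; leaders count more.
        total = sum(LEADER_WEIGHT if n in leaders else 1.0
                    for n in G.neighbors(node))
        influence = sum(LEADER_WEIGHT if n in leaders else 1.0
                        for n in G.neighbors(node) if n in adopted)
        if total > 0 and influence / total >= THRESHOLD:
            new_adopters.add(node)
    adopted |= new_adopters
    print(f"step {step}: {len(adopted)} adopters")
    if not new_adopters:
        break
```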

MultiDoc2Dial: Modeling Dialogues Grounded in Multiple Documents

Sep 26, 2021
Song Feng, Siva Sankalp Patel, Hui Wan, Sachindra Joshi

We propose MultiDoc2Dial, a new task and dataset on modeling goal-oriented dialogues grounded in multiple documents. Most previous work treats document-grounded dialogue modeling as a machine reading comprehension task based on a single given document or passage. In this work, we aim to address more realistic scenarios where a goal-oriented information-seeking conversation spans multiple topics and is therefore grounded in different documents. To facilitate such a task, we introduce a new dataset that contains dialogues grounded in multiple documents from four different domains. We also explore modeling the dialogue-based and document-based context in the dataset. We present strong baseline approaches and various experimental results, aiming to support further research efforts on this task.

  
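
The core retrieval step behind such a task is finding, for each dialogue turn, grounding passages spread across several documents. Below is a hedged TF-IDF baseline sketch of that step; the document names, passages, and query are invented examples, not from the MultiDoc2Dial dataset, and the paper's own baselines are neural models.

```python
# Toy TF-IDF retriever for multi-document grounding (a baseline sketch,
# not the authors' models; documents and query are invented examples).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Passages drawn from several hypothetical domain documents.
passages = [
    ("dmv.html", "You can renew a driver license online or at a local office."),
    ("dmv.html", "A vision test is required for every in-person renewal."),
    ("ssa.html", "Replacement social security cards can be requested online."),
    ("va.html", "Education benefits require proof of qualifying service."),
]

vectorizer = TfidfVectorizer()
matrix = vectorizer.fit_transform(p[1] for p in passages)

def ground(dialogue_context: str, k: int = 2):
    """Return the k passages most similar to the dialogue context."""
    scores = cosine_similarity(vectorizer.transform([dialogue_context]), matrix)[0]
    ranked = sorted(zip(scores, passages), key=lambda x: -x[0])[:k]
    return [(doc, text, round(float(s), 3)) for s, (doc, text) in ranked]

print(ground("How do I renew my license? Do I need an eye exam?"))
```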

Modeling User Exposure in Recommendation

Feb 04, 2016
Dawen Liang, Laurent Charlin, James McInerney, David M. Blei

Collaborative filtering analyzes user preferences for items (e.g., books, movies, restaurants, academic papers) by exploiting the similarity patterns across users. In implicit feedback settings, all items, including the ones that a user did not consume, are taken into consideration. But the implicit assumption that users have seen, and passed over, every unconsumed item does not accord with the common-sense understanding that users have a limited scope and awareness of items. For example, a user might not have heard of a certain paper, or might live too far away from a restaurant to experience it. In the language of causal analysis, the assignment mechanism (i.e., the items that a user is exposed to) is a latent variable that may change for various user/item combinations. In this paper, we propose a new probabilistic approach that directly incorporates user exposure to items into collaborative filtering. The exposure is modeled as a latent variable whose value the model infers from data. In doing so, we recover one of the most successful state-of-the-art approaches as a special case of our model, and provide a plug-in method for conditioning exposure on various forms of exposure covariates (e.g., topics in text, venue locations). We show that our scalable inference algorithm outperforms existing benchmarks in four different domains, both with and without exposure covariates.

* 11 pages, 4 figures. WWW'16
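
The heart of the abstract's model is inferring the latent exposure for an unclicked item: under a Bernoulli exposure prior and a Gaussian matrix-factorization likelihood, the posterior that the user actually saw the item follows from Bayes' rule. A minimal sketch of that single inference step, with toy factor values, is below; the full algorithm iterates this with factor updates.

```python
# E-step sketch: posterior exposure probability for an unclicked item,
# following the Bernoulli-exposure / Gaussian matrix-factorization model
# described in the abstract (values below are invented for illustration).
import numpy as np

def gaussian_pdf(x, mean, precision):
    return np.sqrt(precision / (2 * np.pi)) * np.exp(-0.5 * precision * (x - mean) ** 2)

mu_ui = 0.2                             # prior probability of exposure
theta_u = np.array([0.9, -0.2, 0.4])    # user factors (toy values)
beta_i = np.array([0.8, 0.1, 0.3])      # item factors (toy values)
lam_y = 1.0                             # precision of the click model

pred = theta_u @ beta_i                 # predicted affinity
# If exposed, a zero click is explained by N(0 | pred, 1/lam_y);
# if not exposed, a zero click is certain.
lik_exposed = mu_ui * gaussian_pdf(0.0, pred, lam_y)
posterior = lik_exposed / (lik_exposed + (1.0 - mu_ui))
print(f"P(exposed | no click) = {posterior:.3f}")
# High predicted affinity drives this down: "the user probably never saw it."
```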

Cultural Convergence: Insights into the behavior of misinformation networks on Twitter

Jul 07, 2020
Liz McQuillan, Erin McAweeney, Alicia Bargar, Alex Ruch

How can the birth and evolution of ideas and communities in a network be studied over time? We use a multimodal pipeline consisting of network mapping, topic modeling, bridging centrality, and divergence to analyze Twitter data surrounding the COVID-19 pandemic. We use network mapping to detect accounts creating content about COVID-19 and Latent Dirichlet Allocation to extract their topics, then apply bridging centrality to identify topical and non-topical bridges. Finally, we examine the distribution of each topic and bridge over time, and apply the Jensen-Shannon divergence between topic distributions to identify communities that are converging in their topical narratives.

* 15 pages (7 for paper, 3 for references, 5 for appendix), 3 figures
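
The pipeline's final step, comparing communities by the Jensen-Shannon divergence of their topic distributions, is easy to sketch. The distributions below are toy mixtures standing in for per-community LDA topic proportions; note that SciPy's jensenshannon returns the JS distance, i.e., the square root of the divergence.

```python
# Sketch of the pipeline's final step: Jensen-Shannon divergence between
# two communities' topic distributions (toy distributions; the paper's
# topics come from LDA over COVID-19 tweets).
import numpy as np
from scipy.spatial.distance import jensenshannon

# Per-community topic mixtures over, say, 5 LDA topics (each sums to 1).
community_a = np.array([0.40, 0.30, 0.15, 0.10, 0.05])
community_b = np.array([0.35, 0.32, 0.18, 0.10, 0.05])
community_c = np.array([0.05, 0.10, 0.15, 0.30, 0.40])

# jensenshannon returns the JS *distance*; square it for the divergence.
print("JSD(a, b):", jensenshannon(community_a, community_b, base=2) ** 2)
print("JSD(a, c):", jensenshannon(community_a, community_c, base=2) ** 2)
# Low divergence between a and b signals converging topical narratives.
```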


Language Model as an Annotator: Exploring DialoGPT for Dialogue Summarization

May 28, 2021
Xiachong Feng, Xiaocheng Feng, Libo Qin, Bing Qin, Ting Liu

Current dialogue summarization systems usually encode the text with a number of general semantic features (e.g., keywords and topics) to gain more powerful dialogue modeling capabilities. However, these features are obtained via open-domain toolkits that are either dialogue-agnostic or rely heavily on human annotations. In this paper, we show how DialoGPT, a pre-trained model for conversational response generation, can be developed into an unsupervised dialogue annotator that takes advantage of the dialogue background knowledge encoded in DialoGPT. We apply DialoGPT to label three types of features on two dialogue summarization datasets, SAMSum and AMI, and employ pre-trained and non-pre-trained models as our summarizers. Experimental results show that our proposed method obtains remarkable improvements on both datasets and achieves new state-of-the-art performance on the SAMSum dataset.

* ACL 2021
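
One plausible way to use a pretrained dialogue LM as an unsupervised annotator, in the spirit of this abstract, is to score each turn by DialoGPT's language-model loss given the dialogue history. This is a hedged sketch only: the paper derives richer labels (e.g., keywords, redundancy, topic segments), and the example dialogue below is invented.

```python
# Hedged sketch: scoring dialogue turns with DialoGPT's LM loss as an
# unsupervised signal (the paper's actual labeling procedures are richer).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-small")
model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-small")
model.eval()

dialogue = ["Hey, are we still on for lunch?",
            "Yes! Noon at the usual place.",
            "Great, see you there."]

history = ""
for turn in dialogue:
    full = history + turn + tokenizer.eos_token
    ids = tokenizer(full, return_tensors="pt").input_ids
    labels = ids.clone()
    n_ctx = tokenizer(history, return_tensors="pt").input_ids.shape[1] if history else 0
    labels[:, :n_ctx] = -100          # score only the current turn
    with torch.no_grad():
        loss = model(ids, labels=labels).loss  # mean token negative log-likelihood
    print(f"{loss.item():6.2f}  {turn}")
    history = full
# Turns with unusually high loss given the history are candidates for
# "informative / novel" annotations; low-loss turns are more predictable.
```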

FakeFlow: Fake News Detection by Modeling the Flow of Affective Information

Jan 24, 2021
Bilal Ghanem, Simone Paolo Ponzetto, Paolo Rosso, Francisco Rangel

Fake news articles often stir the readers' attention by means of emotional appeals that arouse their feelings. Unlike in short news texts, authors of longer articles can exploit such affective factors to manipulate readers' emotions by adding exaggerations or fabricating events. To capture this, we propose to model the flow of affective information in fake news articles using a neural architecture. The proposed model, FakeFlow, learns this flow by combining topic and affective information extracted from text. We evaluate the model's performance with several experiments on four real-world datasets. The results show that FakeFlow achieves superior results compared with state-of-the-art methods, confirming the importance of capturing the flow of affective information in news articles.

* 9 pages, 6 figures, EACL-2021
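
The "flow" idea can be sketched as: split an article into segments, extract per-segment affect features, and model the resulting sequence with a recurrent network. This is a simplification of FakeFlow, which also fuses topic information and uses learned attention; the mini-lexicon and article below are toy data.

```python
# Sketch of the "affective flow" idea: per-segment emotion features fed to
# a GRU (a simplification of FakeFlow; lexicon and article are toy data).
import torch
import torch.nn as nn

AFFECT_LEXICON = {  # invented mini-lexicon: word -> [fear, joy, anger]
    "shocking": [0.9, 0.0, 0.3], "terrifying": [1.0, 0.0, 0.2],
    "wonderful": [0.0, 0.9, 0.0], "outrage": [0.2, 0.0, 1.0],
}

def segment_features(segment: str) -> torch.Tensor:
    """Average affect vector of the words in one segment."""
    vecs = [AFFECT_LEXICON[w] for w in segment.lower().split() if w in AFFECT_LEXICON]
    if not vecs:
        return torch.zeros(3)
    return torch.tensor(vecs).mean(dim=0)

class AffectFlow(nn.Module):
    def __init__(self, n_affect=3, hidden=16):
        super().__init__()
        self.gru = nn.GRU(n_affect, hidden, batch_first=True)
        self.clf = nn.Linear(hidden, 1)  # fake vs. real

    def forward(self, segments):  # segments: (batch, n_segments, n_affect)
        _, h = self.gru(segments)
        return torch.sigmoid(self.clf(h[-1]))

article = ["shocking news today", "officials express outrage", "a wonderful ending"]
feats = torch.stack([segment_features(s) for s in article]).unsqueeze(0)
print(AffectFlow()(feats))  # untrained probability, for shape-checking only
```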

NewsEmbed: Modeling News through Pre-trained Document Representations

Jun 05, 2021
Jialu Liu, Tianqi Liu, Cong Yu

Effectively modeling text-rich, fresh content such as news articles at the document level is a challenging problem. To ensure a content-based model generalizes well to a broad range of applications, it is critical to have a training dataset that is large beyond the scale of human labels while achieving the desired quality. In this work, we address those two challenges by proposing a novel approach to mine semantically relevant fresh documents, and their topic labels, with little human supervision. Meanwhile, we design a multitask model called NewsEmbed that alternately trains a contrastive learning objective and a multi-label classification objective to derive a universal document encoder. We show that the proposed approach can provide billions of high-quality organic training examples, and can be naturally extended to a multilingual setting where texts in different languages are encoded in the same semantic space. We experimentally demonstrate NewsEmbed's competitive performance across multiple natural language understanding tasks, both supervised and unsupervised.

* Accepted in SIGKDD 2021
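
The two alternating objectives the abstract names, contrastive learning over paired documents and multi-label topic classification, can be sketched around a shared encoder. The encoder below is a stand-in MLP with invented dimensions; NewsEmbed itself uses a pretrained transformer.

```python
# Sketch of the two alternating objectives over a shared encoder: an
# InfoNCE-style contrastive loss and a multi-label topic loss. The MLP
# and all dimensions are stand-ins, not the NewsEmbed architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(nn.Linear(768, 256), nn.ReLU(), nn.Linear(256, 128))
topic_head = nn.Linear(128, 50)  # assume 50 topic labels

def contrastive_loss(x_a, x_b, temperature=0.05):
    """InfoNCE: the i-th row of x_a should match the i-th row of x_b."""
    z_a = F.normalize(encoder(x_a), dim=1)
    z_b = F.normalize(encoder(x_b), dim=1)
    logits = z_a @ z_b.T / temperature
    return F.cross_entropy(logits, torch.arange(len(z_a)))

def multilabel_loss(x, labels):
    """Binary cross-entropy over independent topic labels."""
    return F.binary_cross_entropy_with_logits(topic_head(encoder(x)), labels)

# Toy batch: paired "semantically related" documents plus topic labels.
x_a, x_b = torch.randn(8, 768), torch.randn(8, 768)
labels = (torch.rand(8, 50) > 0.9).float()
for step in range(2):  # alternate the two objectives across steps
    loss = contrastive_loss(x_a, x_b) if step % 2 == 0 else multilabel_loss(x_a, labels)
    print(step, loss.item())
```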

On Privacy Protection of Latent Dirichlet Allocation Model Training

Jun 04, 2019
Fangyuan Zhao, Xuebin Ren, Shusen Yang, Xinyu Yang

Latent Dirichlet Allocation (LDA) is a popular topic modeling technique for discovering the hidden semantic structure of text datasets, and it plays a fundamental role in many machine learning applications. However, like many other machine learning algorithms, the process of training an LDA model may leak sensitive information about the training datasets and pose significant privacy risks. To mitigate the privacy issues in LDA, we focus on studying privacy-preserving algorithms for LDA model training in this paper. In particular, we first develop a privacy monitoring algorithm to investigate the privacy guarantee obtained from the inherent randomness of the Collapsed Gibbs Sampling (CGS) process in a typical LDA training algorithm on centralized curated datasets. Then, we further propose a locally private LDA training algorithm on crowdsourced data to provide local differential privacy for individual data contributors. Experimental results on real-world datasets demonstrate the effectiveness of our proposed algorithms.

* 8 pages, 5 figures; published at the International Joint Conference on Artificial Intelligence (IJCAI)
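
For reference, here is a minimal collapsed Gibbs sampler for LDA, the training procedure whose inherent randomness the paper analyzes. The corpus and hyperparameters are toy values; the paper's locally private variant additionally perturbs the data that individual contributors report, which this sketch does not implement.

```python
# Minimal collapsed Gibbs sampling for LDA (toy corpus and sizes).
import numpy as np

rng = np.random.default_rng(0)
docs = [[0, 1, 2, 1], [2, 3, 3, 0], [1, 1, 2, 3]]  # word ids per document
V, K, alpha, beta = 4, 2, 0.1, 0.01

z = [[rng.integers(K) for _ in d] for d in docs]    # topic of each token
n_dk = np.zeros((len(docs), K)); n_kw = np.zeros((K, V)); n_k = np.zeros(K)
for d, doc in enumerate(docs):
    for i, w in enumerate(doc):
        n_dk[d, z[d][i]] += 1; n_kw[z[d][i], w] += 1; n_k[z[d][i]] += 1

for it in range(100):
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]  # remove this token's assignment from the counts
            n_dk[d, k] -= 1; n_kw[k, w] -= 1; n_k[k] -= 1
            # Full conditional p(z = k | rest); sampling from it is the
            # inherent randomness the privacy analysis exploits.
            p = (n_dk[d] + alpha) * (n_kw[:, w] + beta) / (n_k + V * beta)
            k = rng.choice(K, p=p / p.sum())
            z[d][i] = k
            n_dk[d, k] += 1; n_kw[k, w] += 1; n_k[k] += 1

print("topic-word counts:\n", n_kw)
```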