Julien Velcin

Properties of Reddit News Topical Interactions

Sep 16, 2022

Gaël Poux-Médard, Julien Velcin, Sabine Loudcher

Abstract: Most models of online information diffusion assume that pieces of information spread independently of each other. However, several works have pointed out the need to investigate the role of interactions in real-world processes and have highlighted the difficulties in doing so: interactions are sparse and brief. In response, recent models account for interactions in the underlying publication dynamics. In this article, we extend and apply one such model to determine whether interactions between news headlines on Reddit play a significant role in their underlying publication mechanisms. After conducting an in-depth case study on 100,000 news headlines from 2019, we recover state-of-the-art conclusions about interactions and conclude that they play a minor role in this dataset.

* 2022 Complex Networks and their Applications XI
* Published at the conference Complex Networks and their Applications

Serialized Interacting Mixed Membership Stochastic Block Model

Sep 16, 2022

Gaël Poux-Médard, Julien Velcin, Sabine Loudcher

Abstract: Recent years have seen renewed interest in the use of stochastic block modeling (SBM) in recommender systems. These models are seen as a flexible alternative to tensor decomposition techniques that can handle labeled data. Recent works proposed to tackle discrete recommendation problems via SBMs by considering larger contexts as input data and by adding second-order interactions between elements related to those contexts. In this work, we show that these models are all special cases of a single global framework: the Serialized Interacting Mixed membership Stochastic Block Model (SIMSBM). It allows modeling an arbitrarily large context as well as an arbitrarily high order of interactions. We demonstrate that SIMSBM generalizes several recent SBM-based baselines. We also demonstrate that our formulation yields increased predictive power on six real-world datasets.

* ICDM 2022 - IEEE International Conference on Data Mining 2022
* Published at ICDM 2022

Le Processus Powered Dirichlet-Hawkes comme A Priori Flexible pour Clustering Temporel de Textes

Jan 29, 2022

Gaël Poux-Médard, Julien Velcin, Sabine Loudcher

Abstract: The textual content of a document and its publication date are intertwined. For example, the publication of a news article on a topic is influenced by previous publications on similar issues, according to underlying temporal dynamics. However, it can be challenging to retrieve meaningful information when the text conveys little. Furthermore, the textual content of a document is not always correlated with its temporal dynamics. We develop a method to cluster textual documents according to both their content and publication time, the Powered Dirichlet-Hawkes process (PDHP). PDHP yields significantly better results than state-of-the-art models when temporal information or textual content is weakly informative. PDHP also relaxes the hypothesis that textual content and temporal dynamics are perfectly correlated. We demonstrate that PDHP generalizes previous work, such as DHP and UP. Finally, we illustrate a possible application using a real-world dataset from Reddit.

* in French

Monitoring geometrical properties of word embeddings for detecting the emergence of new topics

Nov 05, 2021

Clément Christophe, Julien Velcin, Jairo Cugliari, Manel Boumghar, Philippe Suignard

Abstract: Slowly emerging topic detection is a task that lies between event detection, where we aggregate the behaviors of different words over a short period of time, and language evolution, where we monitor their long-term evolution. In this work, we tackle the problem of early detection of slowly emerging new topics. To this end, we gather evidence of weak signals at the word level. We propose to monitor the behavior of word representations in an embedding space and use one of its geometrical properties to characterize the emergence of topics. As evaluation is typically hard for this kind of task, we present a framework for quantitative evaluation. We report positive results that outperform state-of-the-art methods on two public datasets of press and scientific articles.

Powered Hawkes-Dirichlet Process: Challenging Textual Clustering using a Flexible Temporal Prior

Sep 15, 2021

Gaël Poux-Médard, Julien Velcin, Sabine Loudcher

Abstract: The textual content of a document and its publication date are intertwined. For example, the publication of a news article on a topic is influenced by previous publications on similar issues, according to underlying temporal dynamics. However, it can be challenging to retrieve meaningful information when the text conveys little information or when temporal dynamics are hard to unveil. Furthermore, the textual content of a document is not always linked to its temporal dynamics. We develop a flexible method to cluster textual documents according to both their content and publication time, the Powered Dirichlet-Hawkes process (PDHP). We show that PDHP yields significantly better results than state-of-the-art models when temporal information or textual content is weakly informative. PDHP also relaxes the hypothesis that textual content and temporal dynamics are always perfectly correlated. When they are not, PDHP can retrieve textual clusters, temporal clusters, or a mixture of both with high accuracy. We demonstrate that PDHP generalizes previous work, such as the Dirichlet-Hawkes process (DHP) and the Uniform process (UP). Finally, we illustrate the changes induced by PDHP over DHP and UP in a real-world application using Reddit data.

Information Interaction Profile of Choice Adoption

Apr 28, 2021

Gaël Poux-Médard, Julien Velcin, Sabine Loudcher

Abstract: Interactions between pieces of information (entities) play a substantial role in the way an individual acts on them: adoption of a product, the spread of news, strategy choice, etc. However, the underlying interaction mechanisms are often unknown and have been little explored in the literature. We introduce an efficient method to infer both the entities' interaction network and its evolution according to the temporal distance separating interacting entities; together, they form the interaction profile. The interaction profile characterizes the mechanisms of the interaction processes. We approach this problem via a convex model based on recent advances in multi-kernel inference. We consider an ordered sequence of exposures to entities (URLs, ads, situations) and the actions the user takes on them (share, click, decision). We study how users behave differently depending on the combination of entities they have been exposed to. We show that the effect of a combination of exposures on a user is more than the sum of each exposure's independent effect; that is, there is an interaction. We reduce this modeling to a non-parametric convex optimization problem that can be solved in parallel. Our method recovers state-of-the-art results on interaction processes on three real-world datasets and outperforms baselines in the inference of the underlying data generation mechanisms. Finally, we show that interaction profiles can be visualized intuitively, easing the interpretation of the model.

* 18 pages, 4 figures

Powered Dirichlet Process for Controlling the Importance of "Rich-Get-Richer" Prior Assumptions in Bayesian Clustering

Apr 26, 2021

Gaël Poux-Médard, Julien Velcin, Sabine Loudcher

Abstract: One of the most widely used priors in Bayesian clustering is the Dirichlet prior. It can be expressed as a Chinese Restaurant Process. This process allows nonparametric estimation of the number of clusters when partitioning datasets. Its key feature is the "rich-get-richer" property, which assumes that the a priori probability of choosing a cluster depends linearly on its population. In this paper, we show that such a prior is not always the best choice to model data. To address this problem, we derive the Powered Chinese Restaurant process from a modified version of the Dirichlet-Multinomial distribution. We then derive some of its fundamental properties (expected number of clusters, convergence). Unlike state-of-the-art efforts in this direction, this new formulation allows for direct control of the importance of the "rich-get-richer" prior.

* 17 pages, 4 figures
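
The "rich-get-richer" prior described in this abstract can be illustrated with a small sketch. It assumes, as an illustration only, that an existing cluster with population n receives weight n ** r and that a new cluster receives weight alpha; the names `r`, `alpha`, `powered_crp_probs` and `sample_partition` are hypothetical and not taken from the paper. With r = 1 the weights are linear in the population, which is the standard Chinese Restaurant Process.

```python
import random
from typing import List, Optional


def powered_crp_probs(counts: List[int], alpha: float, r: float) -> List[float]:
    """Prior probabilities of joining each existing cluster, then a new one.

    Assumption (illustrative): cluster k has weight counts[k] ** r and a
    new cluster has weight alpha. r = 1 gives the usual linear
    "rich-get-richer" prior; r = 0 ignores cluster populations.
    """
    weights = [n ** r for n in counts] + [alpha]
    total = sum(weights)
    return [w / total for w in weights]


def sample_partition(n_points: int, alpha: float, r: float,
                     seed: Optional[int] = None) -> List[int]:
    """Sequentially assign n_points observations and return cluster sizes."""
    rng = random.Random(seed)
    counts: List[int] = []
    for _ in range(n_points):
        probs = powered_crp_probs(counts, alpha, r)
        k = rng.choices(range(len(probs)), weights=probs)[0]
        if k == len(counts):
            counts.append(1)   # open a new cluster
        else:
            counts[k] += 1     # join an existing cluster
    return counts
```

Under this assumed form, lowering `r` toward 0 weakens the advantage of large clusters, while raising it above 1 strengthens it; calling `sample_partition(1000, alpha=1.0, r=0.5)` and `sample_partition(1000, alpha=1.0, r=1.5)` is one way to compare the resulting cluster sizes.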

Interactions in information spread: quantification and interpretation using stochastic block models

Apr 09, 2020

Gaël Poux-Médard, Julien Velcin, Sabine Loudcher

Abstract: In most real-world applications, a given observable seldom evolves independently of its environment. In social networks, users' behavior results from the people they interact with, news in their feed, or trending topics. In natural language, the meaning of phrases emerges from the combination of words. In general medicine, a diagnosis is established on the basis of the interaction of symptoms. Here, we propose a new model, the Interactive Mixed Membership Stochastic Block Model (IMMSBM), which investigates the role of interactions between entities (hashtags, words, memes, etc.) and quantifies their importance within the aforementioned corpora. We find that interactions play an important role in those corpora. In inference tasks, taking them into account leads to average relative changes of up to 150% in the probability of an outcome with respect to non-interactive models. Furthermore, accounting for them greatly improves the predictive power of the model. Our findings suggest that neglecting interactions when modeling real-world phenomena might lead to incorrect conclusions.

* 17 pages, 3 figures, submitted to ECML-PKDD 2020

Document Network Projection in Pretrained Word Embedding Space

Jan 16, 2020

Antoine Gourru, Adrien Guille, Julien Velcin, Julien Jacques

Abstract: We present Regularized Linear Embedding (RLE), a novel method that projects a collection of linked documents (e.g., a citation network) into a pretrained word embedding space. In addition to the textual content, we leverage a matrix of pairwise similarities providing complementary information (e.g., the network proximity of two documents in a citation graph). We first build a simple word vector average for each document, and we use the similarities to alter this average representation. The document representations can help to solve many information retrieval tasks, such as recommendation, classification and clustering. We demonstrate that our approach outperforms or matches existing document network embedding methods on node classification and link prediction tasks. Furthermore, we show that it helps identify relevant keywords to describe document classes.
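
The two steps named in this abstract (a word vector average per document, then an adjustment driven by pairwise similarities) can be sketched as follows. The abstract does not give the actual RLE update, so the blending rule below, its mixing weight `lam`, and the function names are hypothetical stand-ins chosen only to make the idea concrete.

```python
from typing import Dict, List

Vector = List[float]


def average_embedding(tokens: List[str], word_vectors: Dict[str, Vector],
                      dim: int) -> Vector:
    """Mean of the pretrained vectors of the tokens found in the vocabulary."""
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        return [0.0] * dim  # no known word: fall back to the zero vector
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def smooth_with_similarities(doc_vectors: List[Vector],
                             similarities: List[List[float]],
                             lam: float) -> List[Vector]:
    """Blend each document vector with a similarity-weighted mean of all documents.

    Hypothetical rule (not the paper's formula):
        d_i' = (1 - lam) * d_i + lam * sum_j s_ij * d_j / sum_j s_ij
    """
    dim = len(doc_vectors[0])
    out: List[Vector] = []
    for d_i, row in zip(doc_vectors, similarities):
        total = sum(row)
        if total == 0:
            out.append(list(d_i))  # isolated document: keep its plain average
            continue
        neigh = [sum(s * d_j[k] for s, d_j in zip(row, doc_vectors)) / total
                 for k in range(dim)]
        out.append([(1 - lam) * a + lam * b for a, b in zip(d_i, neigh)])
    return out
```

Because the adjusted vectors stay in the same space as the word vectors in this sketch, they can be compared directly with individual word vectors, which is one way the keyword identification mentioned in the abstract could be approached.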

Inductive Document Network Embedding with Topic-Word Attention

Jan 10, 2020

Robin Brochier, Adrien Guille, Julien Velcin

Abstract: Document network embedding aims at learning representations for a structured text corpus, i.e., one in which documents are linked to each other. Recent algorithms extend network embedding approaches by incorporating the text content associated with the nodes in their formulations. In most cases, it is hard to interpret the learned representations. Moreover, little importance is given to generalization to new documents that are not observed within the network. In this paper, we propose an interpretable and inductive document network embedding method. We introduce a novel mechanism, the Topic-Word Attention (TWA), that generates document representations based on the interplay between word and topic representations. We train these word and topic vectors through our general model, Inductive Document Network Embedding (IDNE), by leveraging the connections in the document network. Quantitative evaluations show that our approach achieves state-of-the-art performance on various networks, and we show qualitatively that our model produces meaningful and interpretable representations of the words, topics and documents.
