"Topic": models, code, and papers

ReviewViz: Assisting Developers Perform Empirical Study on Energy Consumption Related Reviews for Mobile Applications

Sep 13, 2020
Mohammad Abdul Hadi, Fatemeh H Fard

Improving the energy efficiency of mobile applications is a topic that has gained a lot of attention recently. It has been addressed in a number of ways, such as identifying energy bugs and developing a catalog of energy patterns. Previous work shows that users discuss battery-related issues (energy inefficiency or energy consumption) of apps in their reviews. However, no existing work addresses the automatic extraction of battery-related issues from users' feedback. In this paper, we report on a visualization tool developed to empirically study which machine learning algorithms and text features identify energy-consumption-specific reviews with the highest accuracy. In addition to the common machine learning algorithms, we use deep learning models with different word embeddings and compare the results. Furthermore, to help developers extract the main topics discussed in the reviews, two state-of-the-art topic modeling algorithms are applied. The visualizations of the topics show the keywords extracted for each topic along with a comparison against the results of string matching. The interactive, web-browser-based visualization tool is a novel framework intended to give app developers insight into the running time and accuracy of machine learning and deep learning models, as well as the extracted topics. The tool makes it easier for developers to traverse the extensive result set generated by the text classification and topic modeling algorithms. The dynamic data structure used by the tool stores the baseline results of the discussed approaches and is updated when the approaches are applied to new datasets. The tool is open-sourced so that the research results can be replicated.

* 4 pages, 5 figures 
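
The string-matching baseline that the ReviewViz abstract compares its topic results against can be pictured with a minimal sketch. The keyword list and the function name `is_energy_review` are assumptions made for illustration; the abstract does not list the authors' actual terms, and this is not their code.

```python
import re

# Hypothetical keyword stems; the paper's real term list is not given in the abstract.
ENERGY_KEYWORDS = ("battery", "drain", "power", "energy", "overheat")

# A leading word boundary keeps "empower" from matching "power";
# the trailing \w* lets "drains" or "overheats" match their stems.
_PATTERN = re.compile(r"\b(?:" + "|".join(ENERGY_KEYWORDS) + r")\w*", re.IGNORECASE)


def is_energy_review(review: str) -> bool:
    """Return True if the review contains any energy-related keyword stem."""
    return _PATTERN.search(review) is not None
```

A matcher like this is cheap but will also flag words such as "powerful", which is one reason a learned classifier may be preferable.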


What Truly Matters? Using Linguistic Cues for Analyzing the #BlackLivesMatter Movement and its Counter Protests: 2013 to 2020

Sep 20, 2021
Jamell Dacon, Jiliang Tang

Since the fatal shooting of 17-year-old Black teenager Trayvon Martin in February 2012 by a White neighborhood watchman, George Zimmerman, in Sanford, Florida, there has been a significant increase in digital activism addressing police-brutality-related and racially motivated incidents in the United States. In this work, we carry out an innovative study of digital activism, using social media as an authoritative tool to examine and analyze the linguistic cues and thematic relationships in three movements: Black Lives Matter and its counter-protests, Blue Lives Matter and All Lives Matter. We conduct a multi-level text analysis of 36,984,559 tweets to investigate users' behaviors, examine the language used, and understand the impact of digital activism on social media within each social movement at the sentence level, word level, and topic level. Our results show that the counter-protests made excessive use of racially related or prejudicial hashtags, which portrays potential discriminatory tendencies. Our findings also highlight that the social activism of Black Lives Matter activists does not diverge from the social issues and topics involving police-brutality-related and racially motivated killings of Black individuals: in its topical graph, the topics and conversations encircling the largest component relate directly to the topic of Black Lives Matter. Finally, we see that both the Blue Lives Matter and All Lives Matter movements depict a different directive, as the topics of Blue Lives Matter or All Lives Matter do not reside in the center. These findings suggest that topics and conversations within each social movement are skewed, random, or possess racially related undertones, and thus deviate from the prominent social injustice issues.

* Under review 


Modeling Proficiency with Implicit User Representations

Oct 15, 2021
Kim Breitwieser, Allison Lahnala, Charles Welch, Lucie Flek, Martin Potthast

We introduce the problem of proficiency modeling: Given a user's posts on a social media platform, the task is to identify the subset of posts or topics for which the user has some level of proficiency. This enables the filtering and ranking of social media posts on a given topic according to user proficiency. Unlike experts on a given topic, proficient users may not have received formal training or possess years of practical experience, but may instead be autodidacts, hobbyists, and people with sustained interest, which enables them to make genuine and original contributions to discourse. While predicting whether a user is an expert on a given topic imposes strong constraints on who is a true positive, proficiency modeling implies a graded scoring, relaxing these constraints. Put another way, many active social media users can be assumed to possess, or eventually acquire, some level of proficiency on topics relevant to their community. We tackle proficiency modeling in an unsupervised manner by using user embeddings to model engagement with a given topic, as indicated by a user's preference for authoring related content. We investigate five alternative approaches to modeling proficiency, ranging from basic ones to an advanced, tailored user modeling approach, and evaluate them on two real-world benchmarks.
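
One way to picture graded, embedding-based scoring of a user against topics is the sketch below. It assumes that user and topic vectors live in the same space and that cosine similarity is the score; the abstract names neither choice, so both, along with the function names, are illustrative assumptions rather than the authors' method.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_topics_by_engagement(user_vec, topic_vecs):
    """Return (topic, score) pairs sorted from highest to lowest score.

    user_vec   -- hypothetical user embedding
    topic_vecs -- dict mapping topic name to a hypothetical topic embedding
    """
    scores = {topic: cosine(user_vec, vec) for topic, vec in topic_vecs.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

The output is a ranking with continuous scores rather than a yes/no expert label, which matches the graded notion of proficiency described above.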


A picture is worth a thousand words but how to organize thousands of pictures?

Mar 15, 2018
Stefan Lonn, Petia Radeva, Mariella Dimiccoli

We live in a society where the large majority of the population has a camera-equipped smartphone. In addition, hard drives and cloud storage are getting cheaper and cheaper, leading to tremendous growth in stored personal photos. Unlike photo collections captured by a digital camera, which are typically pre-processed by the user who organizes them into event-related folders, smartphone pictures are automatically stored in the cloud. As a consequence, photo collections captured by a smartphone are highly unstructured and, because smartphones are ubiquitous, they present greater variability than pictures captured by a digital camera. To address the need to organize large smartphone photo collections automatically, we propose a new methodology for hierarchical photo organization into topics and topic-related categories. Our approach estimates latent topics in the pictures by applying probabilistic Latent Semantic Analysis, and automatically assigns a name to each topic by relying on a lexical database. Topic-related categories are then estimated using a set of topic-specific Convolutional Neural Networks. To validate our approach, we assemble and make public a large dataset of more than 8,000 smartphone pictures from 10 persons. Experimental results demonstrate better user satisfaction with the resulting organization than state-of-the-art solutions.


Modeling User Behaviour in Research Paper Recommendation System

Jul 16, 2021
Arpita Chaudhuri, Debasis Samanta, Monalisa Sarma

User intention, which often changes dynamically, is considered an important factor for modeling users in the design of recommendation systems. Recent studies are starting to focus on predicting user intention (what users want) beyond user preference (what users like). In this work, a user intention model is proposed based on deep sequential topic analysis. The model predicts a user's intention in terms of the topic of interest. The Hybrid Topic Model (HTM), comprising Latent Dirichlet Allocation (LDA) and Word2Vec, is proposed to derive users' topics of interest and their history of preferences. HTM finds the true topics of papers by estimating a word-topic distribution that includes syntactic and semantic correlations among words. Next, to model user intention, a Long Short Term Memory (LSTM) based sequential deep learning model is proposed. This model takes into account temporal context, namely the time difference between clicks on two consecutive papers seen by a user. Extensive experiments with a real-world research paper dataset indicate that the proposed approach significantly outperforms state-of-the-art methods. Further, the proposed approach introduces a new road map for modeling user activity suitable for the design of a research paper recommendation system.

* 23 pages 
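
The temporal context mentioned in the abstract above, the time difference between clicks on consecutive papers, could be computed as in the following sketch. The function name and the choice of seconds as the unit are assumptions; the abstract does not say how the gaps are encoded before they reach the LSTM.

```python
from datetime import datetime


def click_time_gaps(click_times):
    """Seconds elapsed between each pair of consecutive clicks.

    click_times -- list of datetime objects, assumed to be in chronological order.
    Returns a list with one fewer element than the input.
    """
    return [
        (later - earlier).total_seconds()
        for earlier, later in zip(click_times, click_times[1:])
    ]
```

For example, clicks at 10:00, 10:05 and 10:35 on the same day give gaps of 300 and 1800 seconds, which could then accompany the topic of each clicked paper as an extra input feature.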


Confirmatory Aspect-based Opinion Mining Processes

Jul 30, 2019
Jongho Im, Taikgun Song, Youngsu Lee, Jewoo Kim

A new opinion extraction method is proposed to summarize unstructured, user-generated content (i.e., online customer reviews) in fixed topic domains. To differentiate the current approach from other opinion extraction approaches, which often suffer from a sparsity problem and a lack of sentiment scores, a confirmatory aspect-based opinion mining framework is introduced along with its practical algorithm, called DiSSBUS. In this procedure, 1) each customer review is disintegrated into a set of clauses; 2) each clause is summarized into a bi-term (a topic word and an evaluation word) using a part-of-speech (POS) tagger; and 3) each bi-term is matched to a pre-specified topic relevant to a specific domain. The proposed processes have two primary advantages over existing methods: 1) they can decompose a single review into a set of bi-terms related to pre-specified topics in the domain of interest and, therefore, 2) they allow identification of the reviewer's opinions on the topics via the evaluation words within the set of bi-terms. The proposed aspect-based opinion mining is applied to customer reviews of restaurants in Hawaii obtained from TripAdvisor, and the empirical findings validate the effectiveness of the method. Keywords: Clause-based sentiment analysis, Customer review, Opinion mining, Topic modeling, User-generated content.
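
The three steps described above (clauses, bi-terms, topic matching) can be sketched as a toy pipeline. This is not DiSSBUS itself: the word lists are hypothetical, and a small hand-written lexicon stands in for the POS tagger the authors use, purely to keep the example self-contained.

```python
import re

# Hypothetical pre-specified topics and their topic words (step 3).
TOPIC_LEXICON = {
    "food": {"food", "pizza", "dessert"},
    "service": {"service", "staff", "waiter"},
}
# Hypothetical evaluation words; the paper selects these with a POS tagger instead.
EVALUATION_WORDS = {"great", "slow", "friendly", "bland", "delicious"}


def split_clauses(review):
    """Step 1: break a review into clauses at punctuation and at 'and'/'but'."""
    parts = re.split(r"[.,;!?]|\b(?:and|but)\b", review.lower())
    return [part.strip() for part in parts if part.strip()]


def match_topic(topic_word):
    """Step 3: map a topic word to its pre-specified topic, or None."""
    for topic, words in TOPIC_LEXICON.items():
        if topic_word in words:
            return topic
    return None


def extract_biterm(clause):
    """Step 2: reduce a clause to (topic word, evaluation word), or None."""
    tokens = re.findall(r"[a-z']+", clause)
    topic_word = next((t for t in tokens if match_topic(t) is not None), None)
    eval_word = next((t for t in tokens if t in EVALUATION_WORDS), None)
    if topic_word is None or eval_word is None:
        return None
    return topic_word, eval_word


def mine_opinions(review):
    """Run all three steps; return (topic, topic word, evaluation word) triples."""
    results = []
    for clause in split_clauses(review):
        biterm = extract_biterm(clause)
        if biterm is not None:
            results.append((match_topic(biterm[0]), biterm[0], biterm[1]))
    return results
```

Because every bi-term is tied to a topic fixed in advance, the output can be aggregated per topic directly, which is the "confirmatory" aspect the abstract emphasizes.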


Variable Selection for Latent Dirichlet Allocation

May 04, 2012
Dongwoo Kim, Yeonseung Chung, Alice Oh

In latent Dirichlet allocation (LDA), topics are multinomial distributions over the entire vocabulary. However, the vocabulary usually contains many words that are not relevant in forming the topics. We adopt a variable selection method widely used in statistical modeling as a dimension reduction tool and combine it with LDA. In this variable selection model for LDA (vsLDA), topics are multinomial distributions over a subset of the vocabulary, and by excluding words that are not informative for finding the latent topic structure of the corpus, vsLDA finds topics that are more robust and discriminative. We compare three models, vsLDA, LDA with symmetric priors, and LDA with asymmetric priors, on held-out likelihood, MCMC chain consistency, and document classification. vsLDA performs better than symmetric LDA on likelihood and classification, better than asymmetric LDA on consistency and classification, and about the same in the other comparisons.
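
To make "a multinomial over a subset of the vocabulary" concrete, the sketch below restricts an existing topic distribution to a chosen set of words and renormalizes it. This only illustrates the resulting object: vsLDA infers the word selection jointly with the topics, whereas here the selected set is simply handed in, and the function name is hypothetical.

```python
def restrict_topic(topic_dist, selected_words):
    """Restrict a word->probability topic to selected_words and renormalize.

    topic_dist     -- dict mapping each vocabulary word to its probability
    selected_words -- set of words treated as informative (given, not inferred)
    """
    kept = {w: p for w, p in topic_dist.items() if w in selected_words}
    total = sum(kept.values())
    if total == 0:
        raise ValueError("no probability mass left on the selected words")
    return {w: p / total for w, p in kept.items()}
```

Dropping an uninformative high-frequency word such as "the" moves its mass onto the remaining words, so topics become easier to tell apart.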


Online Learning of Optimally Diverse Rankings

Sep 13, 2021
Stefan Magureanu, Alexandre Proutiere, Marcus Isaksson, Boxun Zhang

Search engines answer users' queries by listing relevant items (e.g., documents, songs, products, web pages). These engines rely on algorithms that learn to rank items so as to present an ordered list maximizing the probability that it contains a relevant item. The main challenge in the design of learning-to-rank algorithms stems from the fact that queries often have different meanings for different users. In the absence of any contextual information about the query, one often has to adhere to the diversity principle, i.e., to return a list covering the various possible topics or meanings of the query. To formalize this learning-to-rank problem, we propose a natural model where (i) items are categorized into topics, (ii) users find items relevant only if they match the topic of their query, and (iii) the engine is not aware of the topic of an arriving query, nor of the frequency at which queries related to various topics arrive, nor of the topic-dependent click-through rates of the items. For this problem, we devise LDR (Learning Diverse Rankings), an algorithm that efficiently learns the optimal list based on users' feedback only. We show that after $T$ queries, the regret of LDR scales as $O((N-L)\log(T))$ where $N$ is the number of all items. We further establish that this scaling cannot be improved, i.e., LDR is order optimal. Finally, using numerical experiments on both artificial and real-world data, we illustrate the superiority of LDR compared to existing learning-to-rank algorithms.

* Proceedings of the ACM on Measurement and Analysis of Computing Systems, Volume 1, Issue 2, December 2017, Article No 32 
* 26 pages, 4 Figures, accepted in ACM SIGMETRICS 2018 
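
The objective in the model above, the probability that a displayed list contains an item the user finds relevant, can be evaluated for known parameters as in this sketch. It is not the LDR algorithm, which has to learn these quantities from feedback. All names are hypothetical, and the sketch additionally assumes that clicks on different items are independent given the query topic, which the abstract does not state.

```python
def list_success_probability(ranked_items, topic_prob, item_topic, item_ctr):
    """Probability that at least one item in the list is clicked.

    ranked_items -- items shown to the user
    topic_prob   -- dict: topic -> probability that a query has this topic
    item_topic   -- dict: item -> its topic
    item_ctr     -- dict: item -> click probability when the query matches its topic
    """
    total = 0.0
    for topic, prob in topic_prob.items():
        miss = 1.0  # probability that no matching item in the list is clicked
        for item in ranked_items:
            if item_topic[item] == topic:
                miss *= 1.0 - item_ctr[item]
        total += prob * (1.0 - miss)
    return total
```

With two topics of probability 0.6 and 0.4 and every click probability equal to 0.5, a list of two items from the majority topic scores 0.45, while a list with one item per topic scores 0.5, which shows why covering several topics can be optimal.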


Assessing Sentiment of the Expressed Stance on Social Media

Aug 08, 2019
Abeer Aldayel, Walid Magdy

Stance detection is the task of inferring whether the viewpoint towards a given topic or entity is supportive or opposing. One may express a viewpoint towards a topic by using positive or negative language. This paper examines how stance is expressed in social media with respect to sentiment polarity. There has been a noticeable misconception that stance and sentiment are equivalent when it comes to viewpoint discovery, where negative sentiment is assumed to mean an against stance and positive sentiment an in-favour stance. To analyze the relation between stance and sentiment, we construct a new dataset with four topics and examine how people express their viewpoints with regard to these topics. We validate our results by carrying out a further analysis of the popular SemEval stance benchmark dataset. Our analyses reveal that sentiment and stance are not highly aligned, and hence simple sentiment polarity cannot be used on its own to denote a stance toward a given topic.

* Accepted as a full paper at Socinfo 2019. Please cite the Socinfo version 
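
The alignment question studied in the abstract above can be phrased as a simple agreement rate: how often does the naive mapping from sentiment to stance give the annotated stance? The sketch below assumes the label strings "positive", "negative", "favor" and "against"; these names and the function are illustrative and do not reproduce the paper's analysis.

```python
# Naive mapping that the abstract describes as a misconception.
NAIVE_STANCE = {"positive": "favor", "negative": "against"}


def sentiment_stance_agreement(pairs):
    """Fraction of (sentiment, stance) pairs where the naive mapping is right.

    Pairs whose sentiment is neither positive nor negative (e.g. "neutral")
    are skipped. Returns 0.0 if no comparable pair remains.
    """
    comparable = [(s, st) for s, st in pairs if s in NAIVE_STANCE]
    if not comparable:
        return 0.0
    hits = sum(1 for s, st in comparable if NAIVE_STANCE[s] == st)
    return hits / len(comparable)
```

A value close to 1.0 would mean sentiment is a good proxy for stance; the abstract reports that on real data the two are not highly aligned.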


Towards Autoencoding Variational Inference for Aspect-based Opinion Summary

Feb 16, 2019
Tai Hoang, Huy Le, Tho Quan

Aspect-based Opinion Summary (AOS), consisting of aspect discovery and sentiment classification steps, has recently emerged as one of the most crucial data mining tasks in e-commerce systems. In this direction, the LDA-based model is considered a notably suitable approach, since it offers both topic modeling and sentiment classification. However, unlike traditional topic modeling, aspect discovery often requires some initial seed words, and this prior knowledge is not easy to incorporate into LDA models. Moreover, LDA approaches rely on sampling methods, which need to load the whole corpus into memory, making them hard to scale. In this research, we study an alternative approach to the AOS problem, based on Autoencoding Variational Inference (AVI). First, we introduce the Autoencoding Variational Inference for Aspect Discovery (AVIAD) model, which extends the previous work on Autoencoding Variational Inference for Topic Models (AVITM) to embed prior knowledge of seed words. This work includes an enhancement of the previous AVI architecture and a modification of the loss function. Finally, we present the Autoencoding Variational Inference for Joint Sentiment/Topic (AVIJST) model. In this model, we substantially extend the AVI model to support the JST model, which performs topic modeling for the corresponding sentiment. The experimental results show that our proposed models achieve higher topic coherence, faster convergence, and better accuracy on sentiment classification than their LDA-based counterparts.

* 20 pages, 11 figures, under review at The Computer Journal 
