"Topic Modeling": models, code, and papers

Self-Reflective Risk-Aware Artificial Cognitive Modeling for Robot Response to Human Behaviors

May 16, 2016
Fei Han, Christopher Reardon, Lynne E. Parker, Hao Zhang

For cooperative robots ("co-robots") to respond to human behaviors accurately and efficiently in human-robot collaboration, interpretation of human actions, awareness of new situations, and appropriate decision making are all crucial abilities. To this end, co-robots should interpret human behaviors in the same manner as human peers do. To address this issue, a novel interpretability indicator is introduced so that robot actions are appropriate to the current human behaviors. In addition, completely enumerating all potential situations in a robot's environment is nearly impossible in real-world applications, making it difficult for a co-robot to act appropriately and safely in new scenarios, even when the pretrained model is highly accurate in known situations. For effective and safe teaming with humans, we introduce a new generalizability indicator that allows a co-robot to self-reflect and reason about when an observation falls outside its learned model. Based on topic modeling and these two novel indicators, we propose a new Self-reflective Risk-aware Artificial Cognitive (SRAC) model, which enables co-robots to consider action risks and identify new situations so that better decisions can be made. Experiments using both real-world datasets and physical robots suggest that our SRAC model significantly outperforms the traditional methodology and enables better decision making in response to human activities.

* 40 pages 
  
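As a rough illustration of the topic-modeling-plus-indicator idea (not the authors' SRAC implementation), the sketch below trains a topic model on toy bag-of-words "activity documents" and uses a length-normalized log-likelihood as a generalizability-style score for flagging unfamiliar observations. The data, the scoring rule, and the scikit-learn backend are all assumptions.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Toy "activity documents": bags of discretized observation words.
known_activities = [
    "wave wave hand raise hand",
    "walk step step walk turn",
    "sit chair lean sit still",
]
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(known_activities)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

def generalizability_score(doc):
    """Length-normalized approximate log-likelihood under the topic model;
    lower values suggest the observation falls outside the learned model."""
    x = vectorizer.transform([doc])
    return lda.score(x) / max(x.sum(), 1)

print(generalizability_score("wave hand raise hand"))   # familiar behavior
print(generalizability_score("chair turn wave still"))  # unfamiliar word mix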

Practical Text Classification With Large Pre-Trained Language Models

Dec 04, 2018
Neel Kant, Raul Puri, Nikolai Yakovenko, Bryan Catanzaro

Multi-emotion sentiment classification is a natural language processing (NLP) problem with valuable use cases on real-world data. We demonstrate that large-scale unsupervised language modeling combined with finetuning offers a practical solution to this task on difficult datasets, including those with label class imbalance and domain-specific context. By training an attention-based Transformer network (Vaswani et al. 2017) on 40GB of text (Amazon reviews) (McAuley et al. 2015) and finetuning on the training set, our model achieves a 0.69 F1 score on the SemEval Task 1: E-c multi-dimensional emotion classification problem (Mohammad et al. 2018), based on the Plutchik wheel of emotions (Plutchik 1979). These results are competitive with state-of-the-art models, including strong F1 scores on difficult (emotion) categories such as Fear (0.73), Disgust (0.77), and Anger (0.78), as well as competitive results on rare categories such as Anticipation (0.42) and Surprise (0.37). Furthermore, we demonstrate our application on a real-world text classification task: we create a narrowly collected dataset of real tweets on several topics and show that our finetuned model outperforms general-purpose, commercially available APIs for sentiment and multi-dimensional emotion classification on this dataset by a significant margin. We also perform a variety of additional studies, investigating properties of deep learning architectures, datasets, and algorithms for achieving practical multi-dimensional sentiment classification. Overall, we find that unsupervised language modeling followed by finetuning is a simple framework for achieving high-quality results on real-world sentiment classification.

* 8 pages, submitted to AAAI 2019 
  
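A minimal sketch of the finetune-for-multi-label-emotions recipe, using the Hugging Face transformers API. The paper's own model is a custom Transformer pretrained on Amazon reviews; the checkpoint name and example labels below are stand-ins.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

EMOTIONS = ["anger", "anticipation", "disgust", "fear", "joy", "love",
            "optimism", "pessimism", "sadness", "surprise", "trust"]  # SemEval E-c

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=len(EMOTIONS),
    problem_type="multi_label_classification",  # BCE loss, sigmoid outputs
)

batch = tokenizer(["I can't believe this happened, terrifying..."],
                  return_tensors="pt", truncation=True, padding=True)
labels = torch.tensor([[0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0]], dtype=torch.float)

outputs = model(**batch, labels=labels)
outputs.loss.backward()  # an optimizer step would follow in a real training loop
probs = torch.sigmoid(outputs.logits)
print({e: round(p.item(), 2) for e, p in zip(EMOTIONS, probs[0])})
```

The `problem_type="multi_label_classification"` flag makes the head treat each emotion as an independent binary decision, which matches the multi-emotion setting the abstract describes.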

Exploiting Syntactic Structure for Better Language Modeling: A Syntactic Distance Approach

May 12, 2020
Wenyu Du, Zhouhan Lin, Yikang Shen, Timothy J. O'Donnell, Yoshua Bengio, Yue Zhang

It is commonly believed that knowledge of syntactic structure should improve language modeling. However, incorporating syntactic structure into neural language models both effectively and efficiently has remained a challenge. In this paper, we use a multi-task objective: the model simultaneously predicts words and ground-truth parse trees encoded as "syntactic distances", with the two objectives sharing the same intermediate representation. Experimental results on the Penn Treebank and Chinese Treebank datasets show that when ground-truth parse trees are provided as additional training signals, the model achieves lower perplexity and induces trees of better quality.

* ACL20 
  
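A minimal PyTorch sketch of such a multi-task setup, assuming an LSTM encoder, illustrative dimensions, and an invented loss weight; the paper's actual architecture and distance formulation differ in detail.

```python
import torch
import torch.nn as nn

class SyntacticDistanceLM(nn.Module):
    def __init__(self, vocab_size, d_emb=128, d_hid=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_emb)
        self.encoder = nn.LSTM(d_emb, d_hid, batch_first=True)
        self.word_head = nn.Linear(d_hid, vocab_size)   # language modeling
        self.dist_head = nn.Linear(d_hid, 1)            # syntactic distance

    def forward(self, tokens):
        h, _ = self.encoder(self.embed(tokens))  # shared intermediate representation
        return self.word_head(h), self.dist_head(h).squeeze(-1)

model = SyntacticDistanceLM(vocab_size=10_000)
tokens = torch.randint(0, 10_000, (2, 12))      # toy batch
next_words = torch.randint(0, 10_000, (2, 12))
gold_distances = torch.rand(2, 12)              # derived from gold parse trees

logits, distances = model(tokens)
loss = (nn.functional.cross_entropy(logits.transpose(1, 2), next_words)
        + 0.5 * nn.functional.mse_loss(distances, gold_distances))
loss.backward()
```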

TopicViz: Semantic Navigation of Document Collections

Nov 04, 2011
Jacob Eisenstein, Duen Horng "Polo" Chau, Aniket Kittur, Eric P. Xing

When people explore and manage information, they think in terms of topics and themes. However, the software that supports information exploration sees text at only the surface level. In this paper we show how topic modeling -- a technique for identifying latent themes across large collections of documents -- can support semantic exploration. We present TopicViz, an interactive environment for information exploration. TopicViz combines traditional search and citation-graph functionality with a range of novel interactive visualizations, centered around a force-directed layout that links documents to the latent themes discovered by the topic model. We describe several use scenarios in which TopicViz supports rapid sensemaking on large document collections.

  
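A toy reconstruction of the layout idea: a bipartite document-topic graph whose edge weights are topic proportions, laid out with a force-directed algorithm. TopicViz itself is an interactive system; this only sketches the underlying graph structure, with invented documents and weights.

```python
import networkx as nx

doc_topics = {  # hypothetical topic-model output: document -> {topic: weight}
    "paper_A": {"topic_0": 0.8, "topic_1": 0.2},
    "paper_B": {"topic_1": 0.9},
    "paper_C": {"topic_0": 0.5, "topic_1": 0.5},
}

G = nx.Graph()
for doc, topics in doc_topics.items():
    for topic, weight in topics.items():
        G.add_edge(doc, topic, weight=weight)  # stronger ties pull docs closer

pos = nx.spring_layout(G, weight="weight", seed=42)  # force-directed layout
for node, (x, y) in pos.items():
    print(f"{node}: ({x:.2f}, {y:.2f})")
```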

Yoga-Veganism: Correlation Mining of Twitter Health Data

Jun 15, 2019
Tunazzina Islam

Social media is now a huge platform of data: people share their interests and thoughts via discussions, tweets, and status updates. It is not possible to go through all of this data manually; we need to mine it to uncover hidden patterns and unknown correlations, identify the dominant topics, and understand people's interests through their discussions. In this work, we explore Twitter data related to health. We extract the popular topics under different categories (e.g., diet, exercise) discussed on Twitter via topic modeling, observe model behavior on new tweets, and discover an interesting correlation (Yoga-Veganism). We evaluate accuracy by comparing against ground truth obtained via manual annotation of both the training and test data.

* In Proceedings of 8th KDD Workshop on Issues of Sentiment Discovery and Opinion Mining (WISDOM) @KDD 2019. arXiv admin note: substantial text overlap with arXiv:1906.02132 
  
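A minimal version of the pipeline the abstract describes: tokenize health tweets, fit LDA, and inspect the dominant topic of a new tweet. The tweets, topic count, and bare-bones tokenization are placeholders, not the paper's dataset or preprocessing.

```python
from gensim import corpora
from gensim.models import LdaModel

tweets = [
    "morning yoga and meditation for a healthy mind",
    "trying a vegan diet this month plant based meals",
    "yoga retreat vegan food and mindfulness",
    "gym workout and protein diet for muscle gain",
]
texts = [t.split() for t in tweets]        # real code would clean and normalize tokens
dictionary = corpora.Dictionary(texts)
corpus = [dictionary.doc2bow(t) for t in texts]

lda = LdaModel(corpus, num_topics=2, id2word=dictionary, passes=10, random_state=1)
for topic_id, words in lda.print_topics(num_words=4):
    print(topic_id, words)

# Dominant topic for a new, unseen tweet:
new = dictionary.doc2bow("vegan yoga brunch".split())
print(max(lda.get_document_topics(new), key=lambda p: p[1]))
```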

Analyzing Self-Driving Cars on Twitter

Apr 05, 2018
Rizwan Sadiq, Mohsin Khan

This paper studies users' perception of a controversial product, namely self-driving (autonomous) cars. To find people's opinions regarding this new technology, we used an annotated Twitter dataset and extracted the topics in positive and negative tweets using topic modeling, an unsupervised probabilistic approach. We then used the topics, as well as linguistic and Twitter-specific features, to classify the sentiment of the tweets. Regarding the opinions, our analysis shows that people are optimistic and excited about the future technology, but at the same time they find it dangerous and unreliable. For the classification task, we found Twitter-specific features, such as hashtags, as well as linguistic features, such as emphatic words, among the top attributes for classifying the sentiment of the tweets.

  
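A rough sketch of this classification setup: topic proportions plus a simple Twitter-specific feature (hashtag count) feeding a standard classifier. The tweets, labels, feature set, and classifier choice are illustrative stand-ins for the paper's annotated dataset and feature engineering.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

tweets = ["#selfdriving cars are the future, so exciting!",
          "autonomous cars are dangerous and not reliable",
          "love the idea of self driving taxis #tech",
          "would never trust a robot car, terrifying"]
labels = [1, 0, 1, 0]                    # 1 = positive, 0 = negative

vec = CountVectorizer()
X_counts = vec.fit_transform(tweets)
topics = LatentDirichletAllocation(n_components=2, random_state=0) \
    .fit_transform(X_counts)             # topic-proportion features

hashtags = np.array([[t.count("#")] for t in tweets])  # Twitter-specific feature
X = np.hstack([topics, hashtags])

clf = LogisticRegression().fit(X, labels)
print(clf.predict(X))
```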

Graph-Driven Generative Models for Heterogeneous Multi-Task Learning

Nov 20, 2019
Wenlin Wang, Hongteng Xu, Zhe Gan, Bai Li, Guoyin Wang, Liqun Chen, Qian Yang, Wenqi Wang, Lawrence Carin

We propose a novel graph-driven generative model that unifies multiple heterogeneous learning tasks into the same framework. The proposed model is based on the observation that heterogeneous learning tasks, which correspond to different generative processes, often rely on data with a shared graph structure. Accordingly, our model combines a graph convolutional network (GCN) with multiple variational autoencoders, embedding the nodes of the graph (i.e., the samples for the tasks) in a uniform manner while specializing their organization and usage to different tasks. With a focus on healthcare applications (tasks), including clinical topic modeling, procedure recommendation, and admission-type prediction, we demonstrate that our method successfully leverages information across different tasks, boosting performance in all tasks and outperforming existing state-of-the-art approaches.

* Accepted by AAAI-2020 
  
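A compressed PyTorch sketch of the architecture the abstract outlines: a graph convolution produces shared node embeddings, and a VAE head decodes them. Layer sizes, the identity adjacency, and the single decoder are illustrative; the actual model attaches several task-specific VAEs to the shared GCN.

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    def __init__(self, d_in, d_out):
        super().__init__()
        self.lin = nn.Linear(d_in, d_out)

    def forward(self, x, adj):               # adj: normalized adjacency matrix
        return torch.relu(self.lin(adj @ x))

class GraphVAE(nn.Module):
    def __init__(self, d_in, d_hid=64, d_z=16):
        super().__init__()
        self.gcn = GCNLayer(d_in, d_hid)      # shared across tasks
        self.mu = nn.Linear(d_hid, d_z)
        self.logvar = nn.Linear(d_hid, d_z)
        self.decoder = nn.Linear(d_z, d_in)   # one task head; the real model has several

    def forward(self, x, adj):
        h = self.gcn(x, adj)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization
        return self.decoder(z), mu, logvar

x, adj = torch.rand(5, 32), torch.eye(5)      # 5 toy nodes, identity adjacency
recon, mu, logvar = GraphVAE(32)(x, adj)
kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).mean()
loss = nn.functional.mse_loss(recon, x) + kl
loss.backward()
```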

Social-Media Activity Forecasting with Exogenous Information Signals

Sep 22, 2021
Kin Wai Ng, Sameera Horawalavithana, Adriana Iamnitchi

Due to their widespread adoption, social media platforms present an ideal environment for studying and understanding social behavior, especially information spread. Modeling social media activity has numerous practical implications, such as supporting efforts to analyze strategic information operations, designing intervention techniques to mitigate disinformation, or delivering critical information during disaster relief operations. In this paper we propose a modeling technique that forecasts topic-specific daily volume of social media activities using both exogenous signals, such as news or armed-conflict records, and endogenous data from the social media platform we model. Empirical evaluations with real datasets from two different platforms and two different contexts, each composed of multiple interrelated topics, demonstrate the effectiveness of our solution.

  
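A toy version of this forecasting setup: predict tomorrow's topic-specific activity volume from lagged endogenous volume plus an exogenous signal (here, a synthetic daily news count). The data generation and linear model are illustrative assumptions, not the paper's method.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
news = rng.poisson(5, 60).astype(float)         # exogenous: news articles per day
volume = 10 + 2 * news + rng.normal(0, 1, 60)   # endogenous: platform activity

# Features: yesterday's activity volume and yesterday's news count.
X = np.column_stack([volume[:-1], news[:-1]])
y = volume[1:]

model = LinearRegression().fit(X[:50], y[:50])
print("held-out R^2:", model.score(X[50:], y[50:]))
```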

Text Similarity in Vector Space Models: A Comparative Study

Sep 24, 2018
Omid Shahmirzadi, Adam Lugowski, Kenneth Younge

Automatic measurement of semantic text similarity is an important task in natural language processing. In this paper, we evaluate the performance of different vector space models on this task. We address the real-world problem of modeling patent-to-patent similarity and compare TFIDF (and related extensions), topic models (e.g., latent semantic indexing), and neural models (e.g., paragraph vectors). Contrary to expectations, the added computational cost of text embedding methods is justified only when 1) the target text is condensed and 2) the similarity comparison is trivial. TFIDF performs surprisingly well in other cases: in particular, for longer and more technical texts and for making finer-grained distinctions between nearest neighbors. Unexpectedly, extensions to the TFIDF method, such as adding noun phrases or calculating term weights incrementally, were not helpful in our context.

* 17 pages 
  
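A small sketch of the kind of comparison the paper runs: cosine similarity under TFIDF versus a topic model (LSI via truncated SVD on the TFIDF matrix). The patent snippets are invented; the study itself uses full patent texts and additional models such as paragraph vectors.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

patents = [
    "a battery electrode comprising lithium and a polymer binder",
    "lithium ion cell with improved polymer electrolyte",
    "a method for brewing coffee with a pressurized chamber",
]

tfidf = TfidfVectorizer().fit_transform(patents)
print("TFIDF sim:", cosine_similarity(tfidf[0], tfidf[1])[0, 0])

lsi = TruncatedSVD(n_components=2, random_state=0).fit_transform(tfidf)
print("LSI sim:  ", cosine_similarity(lsi[:1], lsi[1:2])[0, 0])
```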

Multi Sense Embeddings from Topic Models

Sep 17, 2019
Shobhit Jain, Sravan Babu Bodapati, Ramesh Nallapati, Anima Anandkumar

Distributed word embeddings have yielded state-of-the-art performance in many NLP tasks, mainly due to their success in capturing useful semantic information. These representations assign a single vector to each word, whereas a large number of words are polysemous (i.e., have multiple meanings). In this work, we approach this critical problem in lexical semantics: representing the various senses of polysemous words in vector spaces. We propose a topic-modeling-based skip-gram approach for learning multi-prototype word embeddings. We also introduce a method to prune the embeddings using the probabilistic representation of each word in each topic. We show that our embeddings capture context and word similarity well and outperform various state-of-the-art implementations.

* ACL, Year: 2019, Volume: 74, Page: 42 
* 8 pages, 1 figure, 7 tables 
  
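A sketch of the multi-prototype idea: annotate each word occurrence with a topic label (hand-assigned here; the paper infers these with a topic model), then run skip-gram so each word-topic pair receives its own vector. The corpus and hyperparameters are toy assumptions.

```python
from gensim.models import Word2Vec

sentences = [
    ["deposit#finance", "money#finance", "bank#finance", "loan#finance"],
    ["river#nature", "bank#nature", "water#nature", "fish#nature"],
    ["bank#finance", "interest#finance", "account#finance"],
    ["muddy#nature", "bank#nature", "river#nature"],
]
model = Word2Vec(sentences, vector_size=16, window=2, min_count=1, sg=1, seed=0)

# Two distinct prototypes for the polysemous word "bank":
print(model.wv.similarity("bank#finance", "loan#finance"))
print(model.wv.similarity("bank#nature", "river#nature"))
```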