"Topic": models, code, and papers

Tanbih: Get To Know What You Are Reading

Oct 04, 2019
Yifan Zhang, Giovanni Da San Martino, Alberto Barrón-Cedeño, Salvatore Romeo, Jisun An, Haewoon Kwak, Todor Staykovski, Israa Jaradat, Georgi Karadzhov, Ramy Baly, Kareem Darwish, James Glass, Preslav Nakov

We introduce Tanbih, a news aggregator with intelligent analysis tools to help readers understand what's behind a news story. Our system displays news grouped into events and generates media profiles that show the general factuality of reporting, the degree of propagandistic content, hyper-partisanship, leading political ideology, general frame of reporting, and stance with respect to various claims and topics of a news outlet. In addition, we automatically analyse each article to detect whether it is propagandistic and to determine its stance with respect to a number of controversial topics.

* EMNLP-2019 

Semantic Variation in Online Communities of Practice

Jun 15, 2018
Marco Del Tredici, Raquel Fernández

We introduce a framework for quantifying semantic variation of common words in Communities of Practice and in sets of topic-related communities. We show that while some meaning shifts are shared across related communities, others are community-specific, and therefore independent from the discussed topic. We propose such findings as evidence in favour of sociolinguistic theories of socially-driven semantic variation. Results are evaluated using an independent language modelling task. Furthermore, we investigate extralinguistic features and show that factors such as prominence and dissemination of words are related to semantic variation.

* 13 pages, Proceedings of the 12th International Conference on Computational Semantics (IWCS 2017) 

Search Personalization with Embeddings

Dec 12, 2016
Thanh Vu, Dat Quoc Nguyen, Mark Johnson, Dawei Song, Alistair Willis

Recent research has shown that the performance of search personalization depends on the richness of user profiles which normally represent the user's topical interests. In this paper, we propose a new embedding approach to learning user profiles, where users are embedded on a topical interest space. We then directly utilize the user profiles for search personalization. Experiments on query logs from a major commercial web search engine demonstrate that our embedding approach improves the performance of the search engine and also achieves better search performance than other strong baselines.

* In Proceedings of the 39th European Conference on Information Retrieval, ECIR 2017, to appear 

Catching the Drift: Probabilistic Content Models, with Applications to Generation and Summarization

May 12, 2004
Regina Barzilay, Lillian Lee

We consider the problem of modeling the content structure of texts within a specific domain, in terms of the topics the texts address and the order in which these topics appear. We first present an effective knowledge-lean method for learning content models from un-annotated documents, utilizing a novel adaptation of algorithms for Hidden Markov Models. We then apply our method to two complementary tasks: information ordering and extractive summarization. Our experiments show that incorporating content models in these applications yields substantial improvement over previously-proposed methods.

* HLT-NAACL 2004: Proceedings of the Main Conference, pp. 113-120 
* Best paper award 

Tensor Decompositions in Deep Learning

Feb 26, 2020
Davide Bacciu, Danilo P. Mandic

The paper surveys the topic of tensor decompositions in modern machine learning applications. It focuses on three active research topics of significant relevance for the community. After a brief review of consolidated works on multi-way data analysis, we consider the use of tensor decompositions in compressing the parameter space of deep learning models. Lastly, we discuss how tensor methods can be leveraged to yield richer adaptive representations of complex data, including structured information. The paper concludes with a discussion on interesting open research challenges.

Statistical Estimation: From Denoising to Sparse Regression and Hidden Cliques

Sep 19, 2014
Eric W. Tramel, Santhosh Kumar, Andrei Giurgiu, Andrea Montanari

These notes review six lectures given by Prof. Andrea Montanari on the topic of statistical estimation for linear models. The first two lectures cover the principles of signal recovery from linear measurements in terms of minimax risk. Subsequent lectures demonstrate the application of these principles to several practical problems in science and engineering. Specifically, these topics include denoising of error-laden signals, recovery of compressively sensed signals, reconstruction of low-rank matrices, and also the discovery of hidden cliques within large networks.

* Chapter of "Statistical Physics, Optimization, Inference, and Message-Passing Algorithms", Eds.: F. Krzakala, F. Ricci-Tersenghi, L. Zdeborova, R. Zecchina, E. W. Tramel, L. F. Cugliandolo (Oxford University Press, to appear) 

On a Guided Nonnegative Matrix Factorization

Oct 22, 2020
Joshua Vendrow, Jamie Haddock, Elizaveta Rebrova, Deanna Needell

Fully unsupervised topic models have found fantastic success in document clustering and classification. However, these models often suffer from the tendency to learn less-than-meaningful or even redundant topics when the data is biased towards a set of features. For this reason, we propose an approach based upon the nonnegative matrix factorization (NMF) model, deemed Guided NMF, that incorporates user-designed seed word supervision. Our experimental results demonstrate the promise of this model and illustrate that it is competitive with other methods of this ilk with only very little supervision information.

* 6 pages, 6 tables 

Talking to myself: self-dialogues as data for conversational agents

Sep 19, 2018
Joachim Fainberg, Ben Krause, Mihai Dobre, Marco Damonte, Emmanuel Kahembwe, Daniel Duma, Bonnie Webber, Federico Fancellu

Conversational agents are gaining popularity with the increasing ubiquity of smart devices. However, training agents in a data-driven manner is challenging due to a lack of suitable corpora. This paper presents a novel method for gathering topical, unstructured conversational data in an efficient way: self-dialogues through crowd-sourcing. Alongside this paper, we include a corpus of 3.6 million words across 23 topics. We argue the utility of the corpus by comparing self-dialogues with standard two-party conversations as well as data from other corpora.

* 5 pages, 5 pages appendix, 2 figures 

Thumbs up? Sentiment Classification using Machine Learning Techniques

May 28, 2002
Bo Pang, Lillian Lee, Shivakumar Vaithyanathan

We consider the problem of classifying documents not by topic, but by overall sentiment, e.g., determining whether a review is positive or negative. Using movie reviews as data, we find that standard machine learning techniques definitively outperform human-produced baselines. However, the three machine learning methods we employed (Naive Bayes, maximum entropy classification, and support vector machines) do not perform as well on sentiment classification as on traditional topic-based categorization. We conclude by examining factors that make the sentiment classification problem more challenging.

* To appear in EMNLP-2002 

Abstractive Meeting Summarization Using Dependency Graph Fusion

Sep 22, 2016
Siddhartha Banerjee, Prasenjit Mitra, Kazunari Sugiyama

Automatic summarization techniques on meeting conversations developed so far have been primarily extractive, resulting in poor summaries. To improve this, we propose an approach to generate abstractive summaries by fusing important content from several utterances. Any meeting is generally comprised of several discussion topic segments. For each topic segment within a meeting conversation, we aim to generate a one-sentence summary from the most important utterances using an integer linear programming-based sentence fusion approach. Experimental results show that our method can generate more informative summaries than the baselines.

* WWW '15 Companion Proceedings of the 24th International Conference on World Wide Web, Pages 5-6. arXiv admin note: substantial text overlap with arXiv:1609.07033 
