"Topic": models, code, and papers

Statistical Estimation: From Denoising to Sparse Regression and Hidden Cliques

Sep 19, 2014
Eric W. Tramel, Santhosh Kumar, Andrei Giurgiu, Andrea Montanari

These notes review six lectures given by Prof. Andrea Montanari on the topic of statistical estimation for linear models. The first two lectures cover the principles of signal recovery from linear measurements in terms of minimax risk. Subsequent lectures demonstrate the application of these principles to several practical problems in science and engineering. Specifically, these topics include denoising of error-laden signals, recovery of compressively sensed signals, reconstruction of low-rank matrices, and also the discovery of hidden cliques within large networks.

* Chapter of "Statistical Physics, Optimization, Inference, and Message-Passing Algorithms", Eds.: F. Krzakala, F. Ricci-Tersenghi, L. Zdeborova, R. Zecchina, E. W. Tramel, L. F. Cugliandolo (Oxford University Press, to appear) 

On a Guided Nonnegative Matrix Factorization

Oct 22, 2020
Joshua Vendrow, Jamie Haddock, Elizaveta Rebrova, Deanna Needell

Fully unsupervised topic models have found fantastic success in document clustering and classification. However, these models often suffer from the tendency to learn less-than-meaningful or even redundant topics when the data is biased towards a set of features. For this reason, we propose an approach based upon the nonnegative matrix factorization (NMF) model, deemed Guided NMF, that incorporates user-designed seed word supervision. Our experimental results demonstrate the promise of this model and illustrate that it is competitive with other methods of this ilk with only very little supervision information.

* 6 pages, 6 tables 

Talking to myself: self-dialogues as data for conversational agents

Sep 19, 2018
Joachim Fainberg, Ben Krause, Mihai Dobre, Marco Damonte, Emmanuel Kahembwe, Daniel Duma, Bonnie Webber, Federico Fancellu

Conversational agents are gaining popularity with the increasing ubiquity of smart devices. However, training agents in a data driven manner is challenging due to a lack of suitable corpora. This paper presents a novel method for gathering topical, unstructured conversational data in an efficient way: self-dialogues through crowd-sourcing. Alongside this paper, we include a corpus of 3.6 million words across 23 topics. We argue the utility of the corpus by comparing self-dialogues with standard two-party conversations as well as data from other corpora.

* 5 pages, 5 pages appendix, 2 figures 

Thumbs up? Sentiment Classification using Machine Learning Techniques

May 28, 2002
Bo Pang, Lillian Lee, Shivakumar Vaithyanathan

We consider the problem of classifying documents not by topic, but by overall sentiment, e.g., determining whether a review is positive or negative. Using movie reviews as data, we find that standard machine learning techniques definitively outperform human-produced baselines. However, the three machine learning methods we employed (Naive Bayes, maximum entropy classification, and support vector machines) do not perform as well on sentiment classification as on traditional topic-based categorization. We conclude by examining factors that make the sentiment classification problem more challenging.

* To appear in EMNLP-2002 

Abstractive Meeting Summarization Using Dependency Graph Fusion

Sep 22, 2016
Siddhartha Banerjee, Prasenjit Mitra, Kazunari Sugiyama

Automatic summarization techniques on meeting conversations developed so far have been primarily extractive, resulting in poor summaries. To improve this, we propose an approach to generate abstractive summaries by fusing important content from several utterances. Any meeting is generally comprised of several discussion topic segments. For each topic segment within a meeting conversation, we aim to generate a one sentence summary from the most important utterances using an integer linear programming-based sentence fusion approach. Experimental results show that our method can generate more informative summaries than the baselines.

* WWW '15 Companion Proceedings of the 24th International Conference on World Wide Web, Pages 5-6. arXiv admin note: substantial text overlap with arXiv:1609.07033 

A Comparative Study on Linguistic Feature Selection in Sentiment Polarity Classification

Nov 04, 2013
Zitao Liu

Sentiment polarity classification is perhaps the most widely studied topic. It classifies an opinionated document as expressing a positive or negative opinion. In this paper, using a movie review dataset, we perform a comparative study of individual kinds of linguistic features and combinations of these features. We find that the classic topic-based classifiers (Naive Bayes and Support Vector Machine) do not perform as well on sentiment polarity classification. We also find that with some combinations of different linguistic features, the classification accuracy can be boosted substantially. We give some reasonable explanations for these improvements.

* arXiv admin note: text overlap with arXiv:cs/0205070 by other authors 

Weakly Supervised Learning of Nuanced Frames for Analyzing Polarization in News Media

Sep 21, 2020
Shamik Roy, Dan Goldwasser

In this paper we suggest a minimally-supervised approach for identifying nuanced frames in news article coverage of politically divisive topics. We suggest to break the broad policy frames suggested by Boydstun et al., 2014 into fine-grained subframes which can capture differences in political ideology in a better way. We evaluate the suggested subframes and their embedding, learned using minimal supervision, over three topics, namely, immigration, gun-control and abortion. We demonstrate the ability of the subframes to capture ideological differences and analyze political discourse in news media.

* 19 pages, 6 figures, Will appear in EMNLP 2020 

Autonomy, Authenticity, Authorship and Intention in computer generated art

Mar 06, 2019
Jon McCormack, Toby Gifford, Patrick Hutchings

This paper examines five key questions surrounding computer generated art. Driven by the recent public auction of a work of `AI Art' we selectively summarise many decades of research and commentary around topics of autonomy, authenticity, authorship and intention in computer generated art, and use this research to answer contemporary questions often asked about art made by computers that concern these topics. We additionally reflect on whether current techniques in deep learning and Generative Adversarial Networks significantly change the answers provided by many decades of prior research.

* Accepted for EvoMUSART 2019: 8th International Conference on Computational Intelligence in Music, Sound, Art and Design. April 2019, Leipzig, Germany 

Incremental Natural Language Processing: Challenges, Strategies, and Evaluation

Jun 14, 2018
Arne Köhn

Incrementality is ubiquitous in human-human interaction and beneficial for human-computer interaction. It has been a topic of research in different parts of the NLP community, mostly with focus on the specific topic at hand even though incremental systems have to deal with similar challenges regardless of domain. In this survey, I consolidate and categorize the approaches, identifying similarities and differences in the computation and data, and show trade-offs that have to be considered. A focus lies on evaluating incremental systems because the standard metrics often fail to capture the incremental properties of a system and coming up with a suitable evaluation scheme is non-trivial.

* COLING 2018 (accepted), camera-ready version 

Learning Better Context Characterizations: An Intelligent Information Retrieval Approach

Apr 27, 2010
Carlos M. Lorenzetti, Ana G. Maguitman

This paper proposes an incremental method that can be used by an intelligent system to learn better descriptions of a thematic context. The method starts with a small number of terms selected from a simple description of the topic under analysis and uses this description as the initial search context. Using these terms, a set of queries are built and submitted to a search engine. New documents and terms are used to refine the learned vocabulary. Evaluations performed on a large number of topics indicate that the learned vocabulary is much more effective than the original one at the time of constructing queries to retrieve relevant material.

* XXXIV Conferencia Latinoamericana de Informática, pp. 200-209, 2008 
* 10 pages, 3 figures, CLEI 2008 
