
"Topic": models, code, and papers

WikiDataSets : Standardized sub-graphs from WikiData

Jun 11, 2019
Armand Boschin

Developing new ideas and algorithms in the fields of graph processing and relational learning requires datasets to work with, and WikiData is the largest open-source knowledge graph, involving more than fifty million entities. It is larger than needed in many cases, and even too large to be processed easily, but it is still a goldmine of relevant facts and subgraphs. Using this graph is time-consuming and prone to task-specific tuning, which can affect the reproducibility of results. Providing a unified framework to extract topic-specific subgraphs solves this problem and allows researchers to evaluate algorithms on common datasets. This paper presents various topic-specific subgraphs of WikiData along with the generic Python code used to extract them. These datasets can help develop new methods of knowledge graph processing and relational learning.
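
To make "topic-specific subgraph" concrete, here is a minimal sketch of one plausible extraction rule, assuming the graph is available as in-memory (head, relation, tail) triples with WikiData-style IDs. It is not the WikiDataSets API: the function name and the rule (select entities that are instances of a topic class via P31, "instance of", then keep only facts between two selected entities) are illustrative assumptions.

```python
from typing import Iterable, List, Tuple

# (head, relation, tail) with WikiData-style IDs; entity and class IDs here are placeholders.
Triple = Tuple[str, str, str]


def topic_subgraph(triples: Iterable[Triple], topic_class: str,
                   instance_of: str = "P31") -> List[Triple]:
    """Hypothetical helper: keep facts linking two instances of `topic_class`."""
    triples = list(triples)
    # Step 1: the node set is every entity declared an instance of the topic class.
    nodes = {h for h, r, t in triples if r == instance_of and t == topic_class}
    # Step 2: keep edges whose both endpoints are in the node set,
    # dropping the typing edges themselves.
    return [(h, r, t) for h, r, t in triples
            if h in nodes and t in nodes and r != instance_of]
```

Edges to entities outside the topic (for example a person's birthplace) are dropped here; a real extractor might instead keep them as node attributes.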

Winning on the Merits: The Joint Effects of Content and Style on Debate Outcomes

May 15, 2017
Lu Wang, Nick Beauchamp, Sarah Shugars, Kechen Qin

Debate and deliberation play essential roles in politics and government, but most models presume that debates are won mainly via superior style or agenda control. Ideally, however, debates would be won on the merits, as a function of which side has the stronger arguments. We propose a predictive model of debate that estimates the effects of linguistic features and the latent persuasive strengths of different topics, as well as the interactions between the two. Using a dataset of 118 Oxford-style debates, our model's combination of content (as latent topics) and style (as linguistic features) allows us to predict audience-adjudicated winners with 74% accuracy, significantly outperforming linguistic features alone (66%). Our model finds that winning sides employ stronger arguments, and allows us to identify the linguistic features associated with strong or weak arguments.

* Accepted by TACL, 14 pages 
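
The core idea of combining content and style can be illustrated with a toy logistic model: side A's win probability grows with how much more it uses latently strong topics and with weighted differences in linguistic features. This is a sketch under assumed inputs (per-side topic shares, per-side style features, and already-estimated weights); it omits the paper's content/style interaction terms and is not its estimator.

```python
import math


def win_probability(topic_share_a, topic_share_b, style_a, style_b,
                    topic_strength, style_weight):
    """Toy logistic model: P(side A wins). All weights are assumed given."""
    # Content term: topic-share differences weighted by latent topic strength.
    content = sum(s * (a - b) for s, a, b in
                  zip(topic_strength, topic_share_a, topic_share_b))
    # Style term: linguistic-feature differences weighted by style coefficients.
    style = sum(w * (a - b) for w, a, b in zip(style_weight, style_a, style_b))
    return 1.0 / (1.0 + math.exp(-(content + style)))
```

Because only differences between the sides enter the score, swapping A and B gives exactly the complementary probability.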

Using Multiple Samples to Learn Mixture Models

Nov 28, 2013
Jason D Lee, Ran Gilad-Bachrach, Rich Caruana

In the mixture models problem it is assumed that there are $K$ distributions $\theta_{1},\ldots,\theta_{K}$ and one gets to observe a sample from a mixture of these distributions with unknown coefficients. The goal is to associate instances with their generating distributions, or to identify the parameters of the hidden distributions. In this work we assume that we have access to several samples drawn from the same $K$ underlying distributions, but with different mixing weights. As with topic modeling, having multiple samples is often a reasonable assumption. Instead of pooling the data into one sample, we prove that it is possible to use the differences between the samples to better recover the underlying structure. We present algorithms that recover the underlying structure under milder assumptions than the current state of the art when either the dimensionality or the separation is high. When applied to topic modeling, the methods allow generalization to words not present in the training data.

* Published in Neural Information Processing Systems (NIPS) 2013 
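
Why the differences between samples help can be seen in the simplest case, $K = 2$: if sample $a$ has weight $w_a$ on $\theta_1$ (mean $\mu_1$) and sample $b$ has weight $w_b$, then $\mathbb{E}[a] - \mathbb{E}[b] = (w_a - w_b)(\mu_1 - \mu_2)$, so the difference of sample means points along the direction separating the components without knowing either mean. The sketch below (NumPy, hypothetical function name) only illustrates this intuition; it is not the paper's algorithm.

```python
import numpy as np


def separating_direction(sample_a: np.ndarray, sample_b: np.ndarray) -> np.ndarray:
    """Unit vector along mu1 - mu2 (up to sign), estimated from two samples
    of shape (n, d) that mix the same two components with different weights."""
    # E[a] - E[b] = (w_a - w_b) * (mu1 - mu2), so the mean difference
    # is parallel to the line joining the component means.
    diff = sample_a.mean(axis=0) - sample_b.mean(axis=0)
    return diff / np.linalg.norm(diff)
```

Projecting the pooled points onto this direction reduces the problem to clustering on a line; the estimate degrades as $w_a - w_b$ approaches zero.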

Fine-grained Financial Opinion Mining: A Survey and Research Agenda

May 20, 2020
Chung-Chi Chen, Hen-Hsen Huang, Hsin-Hsi Chen

Opinion mining is a prevalent research issue in many domains. In the financial domain, however, it is still in its early stages. Most research on this topic focuses only on coarse-grained market sentiment analysis, i.e., 2-way bullish/bearish classification. Thanks to recent developments in financial technology (FinTech), some interdisciplinary researchers have started to engage in in-depth analysis of investors' opinions. In this position paper, we first define financial opinions from both coarse-grained and fine-grained points of view, and then provide an overview of the issues already tackled. In addition to listing research issues of the existing topics, we propose a road map of fine-grained financial opinion mining for future research and point out several challenges yet to be explored. Moreover, we provide possible directions for dealing with the proposed research issues.

Applications of deep learning in stock market prediction: recent progress

Feb 29, 2020
Weiwei Jiang

Stock market prediction has been a classical yet challenging problem, attracting attention from both economists and computer scientists. With the purpose of building an effective prediction model, both linear and machine learning tools have been explored over the past couple of decades. Lately, deep learning models have been introduced as new frontiers for this topic, and the field is developing too rapidly to keep up with. Hence, our motivation for this survey is to give an up-to-date review of recent works on deep learning models for stock market prediction. We not only categorize the different data sources, various neural network structures, and commonly used evaluation metrics, but also discuss implementation and reproducibility. Our goal is to help interested researchers keep up with the latest progress and easily reproduce previous studies as baselines. Based on the summary, we also highlight some future research directions in this topic.

* 97 pages, 12 figures, 14 tables 
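
As an illustration of the kind of evaluation metrics such surveys catalogue, here is a minimal sketch of three that are widely reported for price prediction: RMSE, MAE, and directional accuracy. This is not the survey's own list. Directional accuracy is defined here, as one common convention, by comparing the sign of the actual move with the sign of the predicted move relative to the previous true price.

```python
import math


def rmse(y_true, y_pred):
    # Root mean squared error over all time steps.
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    # Mean absolute error over all time steps.
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def _sign(x):
    return (x > 0) - (x < 0)


def directional_accuracy(y_true, y_pred):
    # Fraction of steps t >= 1 where the predicted move y_pred[t] - y_true[t-1]
    # has the same sign as the actual move y_true[t] - y_true[t-1].
    hits = sum(_sign(y_true[t] - y_true[t - 1]) == _sign(y_pred[t] - y_true[t - 1])
               for t in range(1, len(y_true)))
    return hits / (len(y_true) - 1)
```

A model can score well on RMSE while barely beating chance on direction, which is why both kinds of metric are usually reported together.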

Intweetive Text Summarization

Jan 16, 2020
Jean Valère Cossu, Juan-Manuel Torres-Moreno, Eric SanJuan, Marc El-Bèze

The amount of user-generated content on various social media platforms gives analysts a wide view of conversations on several topics related to their business. Nevertheless, keeping up to date with this amount of information is not humanly feasible. Automatic summarization therefore provides a useful means of digesting the dynamics and the sheer volume of content. In this paper, we address tweet summarization, which remains scarcely explored. We propose to automatically generate summaries of microblog conversations dealing with the e-reputation of public figures. These summaries are generated from keyword queries or sample tweets and offer a focused view of the whole microblog network. Since the state of the art is lacking on this point, we conduct and evaluate our experiments on the multilingual CLEF RepLab Topic-Detection dataset following an experimental evaluation process.

* International Journal of Computational Linguistics and Applications vol. 7, no. 1, 2016, pp. 67-83 
* 8 pages, 4 tables 
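
Query-focused summarization from a keyword query can be sketched as a simple extractive baseline: rank tweets by how many query keywords they contain, then greedily keep the best ones while skipping near-duplicates (Jaccard word overlap above a threshold). This is an assumed baseline for illustration, not the authors' method; `summarize` and `max_overlap` are hypothetical names.

```python
import re


def tokens(text):
    # Lowercased word set; a real system would also handle hashtags, mentions, URLs.
    return set(re.findall(r"\w+", text.lower()))


def summarize(tweets, query, k=3, max_overlap=0.5):
    """Greedy extractive summary of up to k tweets relevant to a keyword query."""
    q = tokens(query)
    # Rank by number of query keywords matched (sorted is stable, so ties keep input order).
    ranked = sorted(tweets, key=lambda t: len(tokens(t) & q), reverse=True)
    summary = []
    for tweet in ranked:
        tw = tokens(tweet)
        if not tw & q:
            break  # remaining tweets match no query keyword
        # Skip tweets that mostly repeat one already selected (Jaccard similarity).
        if any(len(tw & tokens(s)) / len(tw | tokens(s)) > max_overlap for s in summary):
            continue
        summary.append(tweet)
        if len(summary) == k:
            break
    return summary
```

The redundancy check matters for Twitter in particular, where retweets and lightly edited copies would otherwise fill the summary.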

Towards Effective Rebuttal: Listening Comprehension using Corpus-Wide Claim Mining

Jul 27, 2019
Tamar Lavee, Matan Orbach, Lili Kotlerman, Yoav Kantor, Shai Gretz, Lena Dankin, Shachar Mirkin, Michal Jacovi, Yonatan Bilu, Ranit Aharonov, Noam Slonim

Engaging in a live debate requires, among other things, the ability to effectively rebut arguments made by your opponent. In particular, this requires identifying these arguments. Here, we suggest doing so by automatically mining claims from a corpus of news articles containing billions of sentences, and searching for them in a given speech. This raises the question of whether such claims indeed correspond to those made in spoken speeches. To this end, we collected a large dataset of $400$ speeches in English discussing $200$ controversial topics, mined claims for each topic, and asked annotators to identify the mined claims mentioned in each speech. Results show that in the vast majority of speeches debaters indeed make use of such claims. In addition, we present several baselines for the automatic detection of mined claims in speeches, forming the basis for future work. All collected data is freely available for research.

* 6th Argument Mining Workshop @ ACL 2019 
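
Detecting mined claims in a speech can be illustrated with a lexical-overlap baseline: a claim counts as mentioned if some speech sentence covers at least a given fraction of the claim's words. The paper's actual baselines are not described in this abstract; the function name, tokenization, and threshold below are assumptions.

```python
import re


def _tokens(text):
    return set(re.findall(r"[a-z']+", text.lower()))


def detect_claims(claims, speech_sentences, threshold=0.5):
    """Return the mined claims judged to be mentioned in the speech (toy baseline)."""
    detected = []
    for claim in claims:
        c = _tokens(claim)
        if not c:
            continue
        # Best fraction of the claim's words covered by any single speech sentence.
        best = max((len(c & _tokens(s)) / len(c) for s in speech_sentences), default=0.0)
        if best >= threshold:
            detected.append(claim)
    return detected
```

Spoken paraphrases ("cuts" for "reduces") are exactly where such a baseline fails, which is why semantic matching is the natural next step.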

Coherent Comment Generation for Chinese Articles with a Graph-to-Sequence Model

Jun 04, 2019
Wei Li, Jingjing Xu, Yancheng He, Shengli Yan, Yunfang Wu, Xu Sun

Automatic article commenting is helpful in encouraging user engagement and interaction on online news platforms. However, news documents are usually too long for traditional encoder-decoder based models, which often results in generic and irrelevant comments. In this paper, we propose to generate comments with a graph-to-sequence model that models the input news as a topic interaction graph. By organizing the article into a graph structure, our model can better understand the internal structure of the article and the connections between topics, which makes it better able to understand the story. We collect and release a large-scale news-comment corpus from a popular Chinese online news platform, Tencent Kuaibao. Extensive experimental results show that our model can generate much more coherent and informative comments than several strong baseline models.

* Accepted by ACL 2019 
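
A topic interaction graph can be sketched as follows, assuming topics are represented by given keywords: each sentence is attached to every keyword vertex it mentions, and two vertices are linked with a weight equal to the number of sentences mentioning both. The paper's own construction (keyword extraction, vertex merging, edge weighting) is not given in this abstract, so this co-occurrence rule is an illustrative assumption; a graph encoder would then read each vertex's attached text.

```python
import re
from collections import defaultdict
from itertools import combinations


def topic_graph(sentences, keywords):
    """Hypothetical builder: vertex -> attached sentences, (u, v) -> co-occurrence count.
    Keywords are assumed to be lowercase single words."""
    vertex_text = defaultdict(list)
    edges = defaultdict(int)
    for sent in sentences:
        words = set(re.findall(r"\w+", sent.lower()))
        present = sorted(k for k in keywords if k in words)
        for k in present:
            vertex_text[k].append(sent)  # sentence becomes part of this topic's content
        for a, b in combinations(present, 2):
            edges[(a, b)] += 1  # topics interact when they share a sentence
    return dict(vertex_text), dict(edges)
```

Organizing a long article this way lets the encoder attend over a handful of topic vertices instead of one very long token sequence.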

Attribute Alignment: Controlling Text Generation from Pre-trained Language Models

Mar 20, 2021
Dian Yu, Kenji Sagae, Zhou Yu

Large language models benefit from training with a large amount of unlabeled text, which gives them increasingly fluent and diverse generation capabilities. However, using these models for text generation that takes into account target attributes, such as sentiment polarity or specific topics, remains a challenge. We propose a simple and flexible method for controlling text generation by aligning disentangled attribute representations. In contrast to recent efforts on training a discriminator to perturb the token-level distribution for an attribute, we use the same data to learn an alignment function to guide the pre-trained, non-controlled language model to generate texts with the target attribute without changing the original language model parameters. We evaluate our method on sentiment- and topic-controlled generation, and show large performance gains over previous methods while retaining fluency and diversity.

  Access Paper or Ask Questions