"Sentiment": models, code, and papers

MojiTalk: Generating Emotional Responses at Scale

May 12, 2018
Xianda Zhou, William Yang Wang

Generating emotional language is a key step towards building empathetic natural language processing agents. However, a major challenge for this line of research is the lack of large-scale labeled training data, and previous studies are limited to only small sets of human annotated sentiment labels. Additionally, explicitly controlling the emotion and sentiment of generated text is also difficult. In this paper, we take a more radical approach: we exploit the idea of leveraging Twitter data that are naturally labeled with emojis. More specifically, we collect a large corpus of Twitter conversations that include emojis in the response, and assume the emojis convey the underlying emotions of the sentence. We then introduce a reinforced conditional variational encoder approach to train a deep generative model on these conversations, which allows us to use emojis to control the emotion of the generated text. Experimentally, we show in our quantitative and qualitative analyses that the proposed models can successfully generate high-quality abstractive conversation responses in accordance with designated emotions.

Which Clustering Do You Want? Inducing Your Ideal Clustering with Minimal Feedback

Jan 16, 2014
Sajib Dasgupta, Vincent Ng

While traditional research on text clustering has largely focused on grouping documents by topic, it is conceivable that a user may want to cluster documents along other dimensions, such as the author's mood, gender, age, or sentiment. Without knowing the user's intention, a clustering algorithm will only group documents along the most prominent dimension, which may not be the one the user desires. To address the problem of clustering documents along the user-desired dimension, previous work has focused on learning a similarity metric from data manually annotated with the user's intention or having a human construct a feature space in an interactive manner during the clustering process. With the goal of reducing reliance on human knowledge for fine-tuning the similarity function or selecting the relevant features required by these approaches, we propose a novel active clustering algorithm, which allows a user to easily select the dimension along which she wants to cluster the documents by inspecting only a small number of words. We demonstrate the viability of our algorithm on a variety of commonly used sentiment datasets.

* Journal Of Artificial Intelligence Research, Volume 39, pages 581-632, 2010 

Does BERT Learn as Humans Perceive? Understanding Linguistic Styles through Lexica

Sep 06, 2021
Shirley Anugrah Hayati, Dongyeop Kang, Lyle Ungar

People convey their intention and attitude through linguistic styles of the text that they write. In this study, we investigate lexicon usages across styles through two lenses: human perception and machine word importance, since words differ in the strength of the stylistic cues that they provide. To collect labels of human perception, we curate a new dataset, Hummingbird, on top of benchmarking style datasets. We have crowd workers highlight the representative words in the text that make them think the text has the following styles: politeness, sentiment, offensiveness, and five emotion types. We then compare these human word labels with word importance derived from a popular fine-tuned style classifier such as BERT. Our results show that BERT often finds content words not relevant to the target style as important words used in style prediction, but humans do not perceive them the same way, even though for some styles (e.g., positive sentiment and joy) human- and machine-identified words share significant overlap.

* Accepted at EMNLP 2021 Main Conference 

Analyzing Team Performance with Embeddings from Multiparty Dialogues

Jan 23, 2021
Ayesha Enayet, Gita Sukthankar

Good communication is indubitably the foundation of effective teamwork. Over time teams develop their own communication styles and often exhibit entrainment, a conversational phenomenon in which humans synchronize their linguistic choices. This paper examines the problem of predicting team performance from embeddings learned from multiparty dialogues such that teams with similar conflict scores lie close to one another in vector space. Embeddings were extracted from three types of features: 1) dialogue acts, 2) sentiment polarity, and 3) syntactic entrainment. Although all of these features can be used to effectively predict team performance, their utility varies by the teamwork phase. We separate the dialogues of players playing a cooperative game into stages: 1) early (knowledge building), 2) middle (problem-solving), and 3) late (culmination). Unlike syntactic entrainment, both dialogue act and sentiment embeddings are effective for classifying team performance, even during the initial phase. This finding has potential ramifications for the development of conversational agents that facilitate teaming.

* To be published in the 15th IEEE International Conference on Semantic Computing 

Image Inspired Poetry Generation in XiaoIce

Aug 09, 2018
Wen-Feng Cheng, Chao-Chung Wu, Ruihua Song, Jianlong Fu, Xing Xie, Jian-Yun Nie

Vision is a common source of inspiration for poetry. The objects and the sentimental imprints that one perceives from an image may lead to various feelings depending on the reader. In this paper, we present a system of poetry generation from images to mimic the process. Given an image, we first extract a few keywords representing objects and sentiments perceived from the image. These keywords are then expanded to related ones based on their associations in human-written poems. Finally, verses are generated gradually from the keywords using recurrent neural networks trained on existing poems. Our approach is evaluated by human assessors and compared to other generation baselines. The results show that our method can generate poems that are more artistic than the baseline methods. This is one of the few attempts to generate poetry from images. By deploying our proposed approach, XiaoIce has already generated more than 12 million poems for users since its release in July 2017. A book of its poems has been published by Cheers Publishing, which claimed that the book is the first-ever poetry collection written by an AI in human history.

On the impact of publicly available news and information transfer to financial markets

Oct 22, 2020
Metod Jazbec, Barna Pásztor, Felix Faltings, Nino Antulov-Fantulin, Petter N. Kolm

We quantify the propagation and absorption of large-scale publicly available news articles from the World Wide Web to financial markets. To extract publicly available information, we use the news archives from the Common Crawl, a nonprofit organization that crawls a large part of the web. We develop a processing pipeline to identify news articles associated with the constituent companies in the S&P 500 index, an equity market index that measures the stock performance of U.S. companies. Using machine learning techniques, we extract sentiment scores from the Common Crawl News data and employ tools from information theory to quantify the information transfer from public news articles to the U.S. stock market. Furthermore, we analyze and quantify the economic significance of the news-based information with a simple sentiment-based portfolio trading strategy. Our findings support the view that information in publicly available news on the World Wide Web has a statistically and economically significant impact on events in financial markets.
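
The abstract does not specify the "simple sentiment-based portfolio trading strategy". As a rough illustration only, the sketch below assumes one common form: go long tickers whose daily sentiment score exceeds a threshold, short those below its negative, and weight the active positions equally. The function names, the threshold, and the toy numbers are all hypothetical and are not taken from the paper.

```python
from typing import Dict


def sentiment_positions(sentiment: Dict[str, float],
                        threshold: float = 0.1) -> Dict[str, int]:
    """Map each ticker's sentiment score to +1 (long), -1 (short), or 0 (flat).

    The threshold is an assumed parameter, not a value from the paper.
    """
    positions = {}
    for ticker, score in sentiment.items():
        if score > threshold:
            positions[ticker] = 1
        elif score < -threshold:
            positions[ticker] = -1
        else:
            positions[ticker] = 0
    return positions


def strategy_return(positions: Dict[str, int],
                    next_day_returns: Dict[str, float]) -> float:
    """Equal-weighted return over all tickers with a non-zero position."""
    active = [t for t, p in positions.items() if p != 0]
    if not active:
        return 0.0
    return sum(positions[t] * next_day_returns[t] for t in active) / len(active)


# Toy data: sentiment observed today, returns realized the next day.
sentiment = {"AAA": 0.6, "BBB": -0.4, "CCC": 0.0}
next_day_returns = {"AAA": 0.02, "BBB": -0.01, "CCC": 0.05}

positions = sentiment_positions(sentiment)
portfolio_return = strategy_return(positions, next_day_returns)
```

Using next-day returns keeps the toy example free of look-ahead: the position is set from sentiment available before the return is realized.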

Extracting Feelings of People Regarding COVID-19 by Social Network Mining

Oct 12, 2021
Hamed Vahdat-Nejad, Fatemeh Salmani, Mahdi Hajiabadi, Faezeh Azizi, Sajedeh Abbasi, Mohadese Jamalian, Reyhane Mosafer, Hamideh Hajiabadi

In 2020, COVID-19 became the chief concern of the world and is still reflected widely in all social networks. Each day, users post millions of tweets and comments on this subject, which contain significant implicit information about public opinion. In this regard, a dataset of more than two million English-language COVID-related tweets, posted from March 23 to June 23, 2020, is collected to extract the feelings of people in various countries in the early stages of this outbreak. To this end, first, we use a lexicon-based approach in conjunction with the GeoNames geographic database to label the tweets with their locations. Next, a method based on the recently introduced and widely cited RoBERTa model is proposed to analyze their sentimental content. After that, trend graphs of the frequency of tweets as well as sentiments are produced for the world and the nations that were more engaged with COVID-19. Graph analysis shows that the tweet frequency graphs for the majority of nations are significantly correlated with the official statistics on daily cases in those nations. Moreover, several pieces of implicit knowledge are extracted and discussed.
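
The lexicon-based location labeling step can be pictured as a gazetteer lookup. The sketch below is a minimal illustration under stated assumptions, not the authors' pipeline: the tiny `GAZETTEER` dictionary is a hypothetical stand-in for place names drawn from GeoNames, and `label_location` is an invented helper name.

```python
import re
from typing import Optional

# Hypothetical toy gazetteer mapping lowercase place names to ISO country
# codes. A real system would load these entries from the GeoNames database.
GAZETTEER = {
    "new york": "US",
    "london": "GB",
    "tehran": "IR",
    "italy": "IT",
}


def label_location(tweet: str) -> Optional[str]:
    """Return the country code of the first gazetteer entry found, else None.

    Longer names are tried first so multi-word places are matched before
    any shorter name.
    """
    text = tweet.lower()
    for name in sorted(GAZETTEER, key=len, reverse=True):
        # Word boundaries avoid matching a place name inside a longer word.
        if re.search(rf"\b{re.escape(name)}\b", text):
            return GAZETTEER[name]
    return None


label_london = label_location("Lockdown in London starts today")
label_none = label_location("Stay home and wash your hands")
```

Tweets that mention no known place are left unlabeled here; how the paper handles such tweets is not stated in the abstract.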

SmokEng: Towards Fine-grained Classification of Tobacco-related Social Media Text

Oct 12, 2019
Kartikey Pant, Venkata Himakar Yanamandra, Alok Debnath, Radhika Mamidi

Contemporary datasets on tobacco consumption focus on one of two topics, either public health mentions and disease surveillance, or sentiment analysis on topical tobacco products and services. However, two primary considerations are not accounted for: the language of the demographic affected and a combination of the topics mentioned above in a fine-grained classification mechanism. In this paper, we create a dataset of 3144 tweets, which are selected based on the presence of colloquial slang related to smoking, and analyze it based on the semantics of the tweets. Each class is created and annotated based on the content of the tweets such that further hierarchical methods can be easily applied. Further, we prove the efficacy of standard text classification methods on this dataset by designing experiments that perform both binary and multi-class classification. Our experiments tackle the identification of either a specific topic (such as tobacco product promotion), a general mention (cigarettes and related products) or a more fine-grained classification. This methodology paves the way for further analysis, such as understanding sentiment or style, which makes this dataset a vital contribution to both disease surveillance and tobacco use research.

* Accepted at the Workshop on Noisy User-generated Text (W-NUT) at EMNLP-IJCNLP 2019 

Persona-Aware Tips Generation

Mar 13, 2019
Piji Li, Zihao Wang, Lidong Bing, Wai Lam

Tips, a compact and concise form of reviews, have received less attention from researchers. In this paper, we investigate the task of tips generation by considering the 'persona' information which captures the intrinsic language style of the users or the different characteristics of the product items. In order to exploit the persona information, we propose a framework based on adversarial variational auto-encoders (aVAE) for persona modeling from the historical tips and reviews of users and items. The latent variables from aVAE are regarded as persona embeddings. Besides representing persona using the latent embeddings, we design a persona memory for storing the persona-related words for users and items. A Pointer Network is used to retrieve persona wordings from the memory when generating tips. Moreover, the persona embeddings are used as latent factors by a rating prediction component to predict the sentiment of a user over an item. Finally, the persona embeddings and the sentiment information are incorporated into a recurrent neural network-based tips generation component. Extensive experimental results are reported and discussed to elaborate the peculiarities of our framework.

* Accepted to WWW'2019, 11 pages 

Enhancing Decision Making Capacity in Tourism Domain Using Social Media Analytics

Dec 19, 2018
Supun Abeysinghe, Isura Manchanayake, Chamod Samarajeewa, Prabod Rathnayaka, Malaka J. Walpola, Rashmika Nawaratne, Tharindu Bandaragoda, Damminda Alahakoon

Social media has gained immense popularity over the last decade. People tend to express opinions about their daily encounters on social media freely. These daily encounters include the places they have traveled to, hotels or restaurants they have tried, and aspects related to tourism in general. Since people usually express their true experiences on social media, the expressed opinions contain valuable information that can be used to generate business value and aid in decision-making processes. Due to the large volume of data, it is not feasible to manually go through every item and extract the information. Hence, we propose a social media analytics platform which has the capability to identify discussion pathways and aspects with their corresponding sentiment and deeper emotions using machine learning techniques, and a visualization tool which shows the extracted insights in a comprehensible and concise manner. Identified topic pathways and aspects will give a decision maker insight into the most discussed topics about the entity, whereas associated sentiments and emotions will help to identify the feedback.

* To Appear in Proceedings of International Conference on Advances in ICT for Emerging Regions, Colombo, LK 
