COVID-19 Twitter Dataset with Latent Topics, Sentiments and Emotions Attributes

Jul 14, 2020
Raj Kumar Gupta, Ajay Vishwanath, Yinping Yang


This resource paper describes a large dataset covering over 63 million coronavirus-related Twitter posts from more than 13 million unique users from 28 January to 1 July 2020. Because the tweets express strong concerns and emotions, we analyzed their content using natural language processing techniques and machine learning-based algorithms, and inferred seventeen latent semantic attributes associated with each tweet: 1) ten attributes indicating the tweet's relevance to ten detected topics, 2) five quantitative attributes indicating the intensity of valence (i.e., unpleasantness/pleasantness) and of four primary emotions (fear, anger, sadness and joy), and 3) two qualitative attributes indicating the sentiment category and the most dominant emotion category, respectively. To illustrate how the dataset can be used, we present descriptive statistics on the topic, sentiment and emotion attributes and their temporal distributions, and discuss possible applications in communication, psychology, public health, economics and epidemiology.
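
The seventeen per-tweet attributes described above can be pictured as a single record. The following is a minimal sketch only: the class name, field names, value types and the example values are all hypothetical, chosen for illustration, and may not match the column names, scales or file format of the released dataset.

```python
from dataclasses import dataclass
from typing import Tuple

NUM_TOPICS = 10  # the abstract states ten detected topics


@dataclass(frozen=True)
class TweetAttributes:
    """Hypothetical container for the 17 inferred attributes of one tweet."""

    # 1) ten attributes: relevance of the tweet to each detected topic
    topic_relevance: Tuple[float, ...]
    # 2) five quantitative attributes: valence plus four emotion intensities
    valence: float
    fear: float
    anger: float
    sadness: float
    joy: float
    # 3) two qualitative attributes
    sentiment_category: str
    dominant_emotion: str

    def __post_init__(self) -> None:
        # Guard the one structural fact the abstract gives: exactly ten topics.
        if len(self.topic_relevance) != NUM_TOPICS:
            raise ValueError(f"expected {NUM_TOPICS} topic-relevance values")

    def attribute_count(self) -> int:
        # 10 topic attributes + 5 quantitative + 2 qualitative
        return len(self.topic_relevance) + 5 + 2


# Made-up placeholder values, not taken from the dataset.
example = TweetAttributes(
    topic_relevance=(0.0,) * NUM_TOPICS,
    valence=0.5, fear=0.1, anger=0.1, sadness=0.1, joy=0.7,
    sentiment_category="positive",
    dominant_emotion="joy",
)
print(example.attribute_count())
```

Grouping the ten topic-relevance values in one tuple keeps the record compact; a flat layout with one column per topic would serve equally well for tabular analysis.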

* 20 pages, 5 figures, 9 tables 

