Public response to COVID-19 vaccines is the key success factor to control the COVID-19 pandemic. To understand the public response, there is a need to explore public opinion. Traditional surveys are expensive and time-consuming, address limited health topics, and obtain small-scale data. Twitter can provide a great opportunity to understand public opinion regarding COVID-19 vaccines. The current study proposes an approach using computational and human coding methods to collect and analyze a large number of tweets to provide a wider perspective on the COVID-19 vaccine. This study identifies the sentiment of tweets and their temporal trend, discovers major topics, compares topics of negative and non-negative tweets, and discloses top topics of negative and non-negative tweets. Our findings show that the negative sentiment regarding the COVID-19 vaccine had a decreasing trend between November 2020 and February 2021. We found Twitter users have discussed a wide range of topics from vaccination sites to the 2020 U.S. election between November 2020 and February 2021. The findings show that there was a significant difference between negative and non-negative tweets regarding the weight of most topics. Our results also indicate that the negative and non-negative tweets had different topic priorities and focuses.
NLP research has attained high performances in abusive language detection as a supervised classification task. While in research settings, training and test datasets are usually obtained from similar data samples, in practice systems are often applied on data that are different from the training set in topic and class distributions. Also, the ambiguity in class definitions inherited in this task aggravates the discrepancies between source and target datasets. We explore the topic bias and the task formulation bias in cross-dataset generalization. We show that the benign examples in the Wikipedia Detox dataset are biased towards platform-specific topics. We identify these examples using unsupervised topic modeling and manual inspection of topics' keywords. Removing these topics increases cross-dataset generalization, without reducing in-domain classification performance. For a robust dataset design, we suggest applying inexpensive unsupervised methods to inspect the collected data and downsize the non-generalizable content before manually annotating for class labels.
In the last decade, political debates have progressively shifted to social media. Rhetorical devices employed by online actors and factions that operate in these debating arenas can be captured and analysed to conduct a statistical reading of societal controversies and their argumentation dynamics. In this paper, we propose a five-step methodology, to extract, categorize and explore the latent argumentation structures of online debates. Using Twitter data about a "no-deal" Brexit, we focus on the expected effects in case of materialisation of this event. First, we extract cause-effect claims contained in tweets using RegEx that exploit verbs related to Creation, Destruction and Causation. Second, we categorise extracted "no-deal" effects using a Structural Topic Model estimated on unigrams and bigrams. Third, we select controversial effect topics and explore within-topic argumentation differences between self-declared partisan user factions. We hence type topics using estimated covariate effects on topic propensities, then, using the topics correlation network, we study the topological structure of the debate to identify coherent topical constellations. Finally, we analyse the debate time dynamics and infer lead/follow relations among factions. Results show that the proposed methodology can be employed to perform a statistical rhetorics analysis of debates, and map the architecture of controversies across time. In particular, the "no-deal" Brexit debate is shown to have an assortative argumentation structure heavily characterized by factional constellations of arguments, as well as by polarized narrative frames invoked through verbs related to Creation and Destruction. Our findings highlight the benefits of implementing a systemic approach to the analysis of debates, which allows the unveiling of topical and factional dependencies between arguments employed in online debates.
Novel information retrieval methods to identify citations relevant to a clinical topic can overcome the knowledge gap existing between the primary literature (MEDLINE) and online clinical knowledge resources such as UpToDate. Searching the MEDLINE database directly or with query expansion methods returns a large number of citations that are not relevant to the query. The current study presents a citation retrieval system that retrieves citations for evidence-based clinical knowledge summarization. This approach combines query expansion, concept-based screening algorithm, and concept-based vector similarity. We also propose an information extraction framework for automated concept (Population, Intervention, Comparison, and Disease) extraction. We evaluated our proposed system on all topics (as queries) available from UpToDate for two diseases, heart failure (HF) and atrial fibrillation (AFib). The system achieved an overall F-score of 41.2% on HF topics and 42.4% on AFib topics on a gold standard of citations available in UpToDate. This is significantly high when compared to a query-expansion based baseline (F-score of 1.3% on HF and 2.2% on AFib) and a system that uses query expansion with disease hyponyms and journal names, concept-based screening, and term-based vector similarity system (F-score of 37.5% on HF and 39.5% on AFib). Evaluating the system with top K relevant citations, where K is the number of citations in the gold standard achieved a much higher overall F-score of 69.9% on HF topics and 75.1% on AFib topics. In addition, the system retrieved up to 18 new relevant citations per topic when tested on ten HF and six AFib clinical topics.
Proactive dialogue system is able to lead the conversation to a goal topic and has advantaged potential in bargain, persuasion and negotiation. Current corpus-based learning manner limits its practical application in real-world scenarios. To this end, we contribute to advance the study of the proactive dialogue policy to a more natural and challenging setting, i.e., interacting dynamically with users. Further, we call attention to the non-cooperative user behavior -- the user talks about off-path topics when he/she is not satisfied with the previous topics introduced by the agent. We argue that the targets of reaching the goal topic quickly and maintaining a high user satisfaction are not always converge, because the topics close to the goal and the topics user preferred may not be the same. Towards this issue, we propose a new solution named I-Pro that can learn Proactive policy in the Interactive setting. Specifically, we learn the trade-off via a learned goal weight, which consists of four factors (dialogue turn, goal completion difficulty, user satisfaction estimation, and cooperative degree). The experimental results demonstrate I-Pro significantly outperforms baselines in terms of effectiveness and interpretability.
Social media is commonly used by the public during election campaigns to express their opinions regarding different issues. Among various social media channels, Twitter provides an efficient platform for researchers and politicians to explore public opinion regarding a wide range of topics such as economy and foreign policy. Current literature mainly focuses on analyzing the content of tweets without considering the gender of users. This research collects and analyzes a large number of tweets and uses computational, human coding, and statistical analyses to identify topics in more than 300,000 tweets posted during the 2020 U.S. presidential election and to compare female and male users regarding the average weight of the topics. Our findings are based upon a wide range of topics, such as tax, climate change, and the COVID-19 pandemic. Out of the topics, there exists a significant difference between female and male users for more than 70% of topics. Our research approach can inform studies in the areas of informatics, politics, and communication, and it can be used by political campaigns to obtain a gender-based understanding of public opinion.
To what extent user's stance towards a given topic could be inferred? Most of the studies on stance detection have focused on analysing user's posts on a given topic to predict the stance. However, the stance in social media can be inferred from a mixture of signals that might reflect user's beliefs including posts and online interactions. This paper examines various online features of users to detect their stance towards different topics. We compare multiple set of features, including on-topic content, network interactions, user's preferences, and online network connections. Our objective is to understand the online signals that can reveal the users' stance. Experimentation is applied on tweets dataset from the SemEval stance detection task, which covers five topics. Results show that stance of a user can be detected with multiple signals of user's online activity, including their posts on the topic, the network they interact with or follow, the websites they visit, and the content they like. The performance of the stance modelling using different network features are comparable with the state-of-the-art reported model that used textual content only. In addition, combining network and content features leads to the highest reported performance to date on the SemEval dataset with F-measure of 72.49%. We further present an extensive analysis to show how these different set of features can reveal stance. Our findings have distinct privacy implications, where they highlight that stance is strongly embedded in user's online social network that, in principle, individuals can be profiled from their interactions and connections even when they do not post about the topic.
Improving the energy efficiency of mobile applications is a topic that has gained a lot of attention recently. It has been addressed in a number of ways such as identifying energy bugs and developing a catalog of energy patterns. Previous work shows that users discuss the battery-related issues (energy inefficiency or energy consumption) of the apps in their reviews. However, there is no work that addresses the automatic extraction of battery-related issues from users' feedback. In this paper, we report on a visualization tool that is developed to empirically study machine learning algorithms and text features to automatically identify the energy consumption specific reviews with the highest accuracy. Other than the common machine learning algorithms, we utilize deep learning models with different word embeddings to compare the results. Furthermore, to help the developers extract the main topics that are discussed in the reviews, two states of the art topic modeling algorithms are applied. The visualizations of the topics represent the keywords that are extracted for each topic along with a comparison with the results of string matching. The developed web-browser based interactive visualization tool is a novel framework developed with the intention of giving the app developers insights about running time and accuracy of machine learning and deep learning models as well as extracted topics. The tool makes it easier for the developers to traverse through the extensive result set generated by the text classification and topic modeling algorithms. The dynamic-data structure used for the tool stores the baseline-results of the discussed approaches and is updated when applied on new datasets. The tool is open-sourced to replicate the research results.
Since the fatal shooting of 17-year old Black teenager Trayvon Martin in February 2012 by a White neighborhood watchman, George Zimmerman in Sanford, Florida, there has been a significant increase in digital activism addressing police-brutality related and racially-motivated incidents in the United States. In this work, we administer an innovative study of digital activism by exploiting social media as an authoritative tool to examine and analyze the linguistic cues and thematic relationships in these three mediums. We conduct a multi-level text analysis on 36,984,559 tweets to investigate users' behaviors to examine the language used and understand the impact of digital activism on social media within each social movement on a sentence-level, word-level, and topic-level. Our results show that excessive use of racially-related or prejudicial hashtags were used by the counter protests which portray potential discriminatory tendencies. Consequently, our findings highlight that social activism done by Black Lives Matter activists does not diverge from the social issues and topics involving police-brutality related and racially-motivated killings of Black individuals due to the shape of its topical graph that topics and conversations encircling the largest component directly relate to the topic of Black Lives Matter. Finally, we see that both Blue Lives Matter and All Lives Matter movements depict a different directive, as the topics of Blue Lives Matter or All Lives Matter do not reside in the center. These findings suggest that topics and conversations within each social movement are skewed, random or possessed racially-related undertones, and thus, deviating from the prominent social injustice issues.