The Coronavirus pandemic has taken the world by storm as also the social media. As the awareness about the ailment increased, so did messages, videos and posts acknowledging its presence. The social networking site, Twitter, demonstrated similar effect with the number of posts related to coronavirus showing an unprecedented growth in a very short span of time. This paper presents a statistical analysis of the twitter messages related to this disease posted since January 2020. Two types of empirical studies have been performed. The first is on word frequency and the second on sentiments of the individual tweet messages. Inspection of the word frequency is useful in characterizing the patterns or trends in the words used on the site. This would also reflect on the psychology of the twitter users at this critical juncture. Unigram, bigram and trigram frequencies have been modeled by power law distribution. The results have been validated by Sum of Square Error (SSE), R2 and Root Mean Square Error (RMSE). High values of R2 and low values of SSE and RMSE lay the grounds for the goodness of fit of this model. Sentiment analysis has been conducted to understand the general attitudes of the twitter users at this time. Both tweets by general public and WHO were part of the corpus. The results showed that the majority of the tweets had a positive polarity and only about 15% were negative.
As an important task in sentiment analysis, Multimodal Aspect-Based Sentiment Analysis (MABSA) has attracted increasing attention in recent years. However, previous approaches either (i) use separately pre-trained visual and textual models, which ignore the crossmodal alignment or (ii) use vision-language models pre-trained with general pre-training tasks, which are inadequate to identify finegrained aspects, opinions, and their alignments across modalities. To tackle these limitations, we propose a task-specific Vision-Language Pre-training framework for MABSA (VLPMABSA), which is a unified multimodal encoder-decoder architecture for all the pretraining and downstream tasks. We further design three types of task-specific pre-training tasks from the language, vision, and multimodal modalities, respectively. Experimental results show that our approach generally outperforms the state-of-the-art approaches on three MABSA subtasks. Further analysis demonstrates the effectiveness of each pretraining task. The source code is publicly released at https://github.com/NUSTM/VLP-MABSA.
Social media data can be a very salient source of information during crises. User-generated messages provide a window into people's minds during such times, allowing us insights about their moods and opinions. Due to the vast amounts of such messages, a large-scale analysis of population-wide developments becomes possible. In this paper, we analyze Twitter messages (tweets) collected during the first months of the COVID-19 pandemic in Europe with regard to their sentiment. This is implemented with a neural network for sentiment analysis using multilingual sentence embeddings. We separate the results by country of origin, and correlate their temporal development with events in those countries. This allows us to study the effect of the situation on people's moods. We see, for example, that lockdown announcements correlate with a deterioration of mood in almost all surveyed countries, which recovers within a short time span.
Sentiment Analysis and other semantic tasks are commonly used for social media textual analysis to gauge public opinion and make sense from the noise on social media. The language used on social media not only commonly diverges from the formal language, but is compounded by codemixing between languages, especially in large multilingual societies like India. Traditional methods for learning semantic NLP tasks have long relied on end to end task specific training, requiring expensive data creation process, even more so for deep learning methods. This challenge is even more severe for resource scarce texts like codemixed language pairs, with lack of well learnt representations as model priors, and task specific datasets can be few and small in quantities to efficiently exploit recent deep learning approaches. To address above challenges, we introduce curriculum learning strategies for semantic tasks in code-mixed Hindi-English (Hi-En) texts, and investigate various training strategies for enhancing model performance. Our method outperforms the state of the art methods for Hi-En codemixed sentiment analysis by 3.31% accuracy, and also shows better model robustness in terms of convergence, and variance in test performance.
This paper describes a large dataset covering over 63 million coronavirus-related Twitter posts from more than 13 million unique users since 28 January to 1 July 2020. As strong concerns and emotions are expressed in the tweets, we analysed the tweets content using natural language processing techniques and machine-learning based algorithms, and inferred seventeen latent semantic attributes associated with each tweet, including 1) ten attributes indicating the tweet's relevance to ten detected topics, 2) five quantitative attributes indicating the degree of intensity in the valence (i.e., unpleasantness/pleasantness) and emotional intensities across four primary emotions of fear, anger, sadness and joy, and 3) two qualitative attributes indicating the sentiment category and the most dominant emotion category, respectively. To illustrate how the dataset can be used, we present descriptive statistics around the topics, sentiments and emotions attributes and their temporal distributions, and discuss possible usage in communication, psychology, public health, economics and epidemiology research.
Sentiment analysis is a text mining task that determines the polarity of a given text, i.e., its positiveness or negativeness. Recently, it has received a lot of attention given the interest in opinion mining in micro-blogging platforms. These new forms of textual expressions present new challenges to analyze text given the use of slang, orthographic and grammatical errors, among others. Along with these challenges, a practical sentiment classifier should be able to handle efficiently large workloads. The aim of this research is to identify which text transformations (lemmatization, stemming, entity removal, among others), tokenizers (e.g., words $n$-grams), and tokens weighting schemes impact the most the accuracy of a classifier (Support Vector Machine) trained on two Spanish corpus. The methodology used is to exhaustively analyze all the combinations of the text transformations and their respective parameters to find out which characteristics the best performing classifiers have in common. Furthermore, among the different text transformations studied, we introduce a novel approach based on the combination of word based $n$-grams and character based $q$-grams. The results show that this novel combination of words and characters produces a classifier that outperforms the traditional word based combination by $11.17\%$ and $5.62\%$ on the INEGI and TASS'15 dataset, respectively.
The most important goal of customer services is to keep the customer satisfied. However, service resources are always limited and must be prioritized. Therefore, it is important to identify customers who potentially become unsatisfied and might lead to escalations. Today this prioritization of customers is often done manually. Data science on IoT data (esp. log data) for machine health monitoring, as well as analytics on enterprise data for customer relationship management (CRM) have mainly been researched and applied independently. In this paper, we present a framework for a data-driven decision support system which combines IoT and enterprise data to model customer sentiment. Such decision support systems can help to prioritize customers and service resources to effectively troubleshoot problems or even avoid them. The framework is applied in a real-world case study with a major medical device manufacturer. This includes a fully automated and interpretable machine learning pipeline designed to meet the requirements defined with domain experts and end users. The overall framework is currently deployed, learns and evaluates predictive models from terabytes of IoT and enterprise data to actively monitor the customer sentiment for a fleet of thousands of high-end medical devices. Furthermore, we provide an anonymized industrial benchmark dataset for the research community.
One of the pillars to build a country's economy is the stock market. Over the years, people are investing in stock markets to earn as much profit as possible from the amount of money that they possess. Hence, it is vital to have a prediction model which can accurately predict future stock prices. With the help of machine learning, it is not an impossible task as the various machine learning techniques if modeled properly may be able to provide the best prediction values. This would enable the investors to decide whether to buy, sell or hold the share. The aim of this paper is to predict the future of the financial stocks of a company with improved accuracy. In this paper, we have proposed the use of historical as well as sentiment data to efficiently predict stock prices by applying LSTM. It has been found by analyzing the existing research in the area of sentiment analysis that there is a strong correlation between the movement of stock prices and the publication of news articles. Therefore, in this paper, we have integrated these factors to predict the stock prices more accurately.