What is Sentiment Analysis? Sentiment analysis is the process of determining the emotional tone or polarity (positive, negative, or neutral) expressed in a piece of text, such as a tweet or a product review.
Papers and Code
Dec 03, 2024
Abstract: In social media, neural network models have been applied to hate speech detection, sentiment analysis, and similar tasks, but such models are susceptible to adversarial attacks. For instance, in a text classification task, an attacker carefully introduces perturbations that barely alter the original semantics in order to trick the model into making a different prediction. Studying textual adversarial attack methods makes it possible to evaluate, and then improve, the robustness of language models. Most research in this field currently focuses on English, with a certain amount of work on Chinese; however, there is little research targeting Chinese minority languages. With the rapid development of artificial intelligence technology and the emergence of Chinese minority language models, textual adversarial attacks have become a new challenge for the information processing of Chinese minority languages. In response, we propose TSTricker, a multi-granularity Tibetan textual adversarial attack method based on masked language models. We use masked language models to generate candidate substitution syllables or words, adopt a scoring mechanism to determine the substitution order, and then apply the attack to several fine-tuned victim models. The experimental results show that TSTricker reduces the accuracy of the classification models by more than 28.70% and changes the models' predictions on more than 90.60% of the samples, an evidently higher attack effect than the baseline method.
* Companion Proceedings of the ACM Web Conference 2024
* Revised Version; Accepted at WWW 2024 Workshop on SocialNLP
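To make the attack pipeline concrete, here is a minimal, generic sketch of masked-LM-based word substitution against a victim classifier, using Hugging Face pipelines. It is not the authors' TSTricker: the model names are stand-ins (TSTricker targets Tibetan syllables and words with Tibetan language models), the whitespace tokenization is a simplification, and the deletion-based importance score is only one possible scoring mechanism.

```python
from transformers import pipeline

# The masked LM proposes candidate substitutions; the victim classifier is
# probed to see whether its prediction flips. Both model names are stand-ins.
mlm = pipeline("fill-mask", model="bert-base-multilingual-cased")
victim = pipeline("text-classification",
                  model="distilbert-base-uncased-finetuned-sst-2-english")

def attack(text: str, max_subs: int = 3) -> str:
    orig_label = victim(text)[0]["label"]

    def confidence(txt: str) -> float:
        out = victim(txt)[0]
        # Confidence in the original label (negative once the label has flipped).
        return out["score"] if out["label"] == orig_label else -out["score"]

    tokens = text.split()
    base = confidence(text)

    # Rank positions by how much deleting the token hurts the original prediction.
    def importance(i: int) -> float:
        return base - confidence(" ".join(t for j, t in enumerate(tokens) if j != i))

    order = sorted(range(len(tokens)), key=importance, reverse=True)

    for i in order[:max_subs]:
        masked = " ".join(mlm.tokenizer.mask_token if j == i else t
                          for j, t in enumerate(tokens))
        candidates = mlm(masked, top_k=5)
        for cand in candidates:
            trial = tokens[:i] + [cand["token_str"]] + tokens[i + 1:]
            if victim(" ".join(trial))[0]["label"] != orig_label:
                return " ".join(trial)           # prediction flipped: attack succeeded
        tokens[i] = candidates[0]["token_str"]   # keep the top candidate and continue
    return " ".join(tokens)
```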
Dec 02, 2024
Abstract: As a fine-grained task, multimodal aspect-based sentiment analysis (MABSA) mainly focuses on identifying aspect-level sentiment information in text-image pairs. However, we observe that it is difficult to recognize aspect sentiment in low-quality samples, such as those with low-resolution, noisy images. In the real world, data quality usually varies across samples; such noise is called data uncertainty. Previous works on the MABSA task, however, treat samples of different quality with the same importance and ignore the influence of data uncertainty. In this paper, we propose UA-MABSA, a novel data uncertainty-aware multimodal aspect-based sentiment analysis approach that weights the loss of each sample by its data quality and difficulty. UA-MABSA adopts a novel quality assessment strategy that takes into account both image quality and aspect-based cross-modal relevance, enabling the model to pay more attention to high-quality and challenging samples. Extensive experiments show that our method achieves state-of-the-art (SOTA) performance on the Twitter-2015 dataset. Further analysis demonstrates the effectiveness of the quality assessment strategy.
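A minimal sketch of the core idea of quality-weighted training, assuming a per-sample quality score derived from image quality and cross-modal relevance. The weighting formula below is illustrative; UA-MABSA's actual quality assessment and loss weighting may differ.

```python
import torch
import torch.nn.functional as F

def quality_weighted_loss(logits, labels, image_quality, relevance):
    """Cross-entropy weighted per sample by an assumed quality score."""
    per_sample = F.cross_entropy(logits, labels, reduction="none")
    # Hypothetical quality score: product of normalized image quality and
    # aspect-image relevance, both assumed to lie in [0, 1].
    quality = (image_quality * relevance).clamp(0.0, 1.0)
    # Up-weight higher-quality samples; the exact weighting in UA-MABSA may differ.
    weights = 0.5 + quality
    return (weights * per_sample).mean()

# Usage with random tensors standing in for model outputs and metadata.
logits = torch.randn(8, 3)
labels = torch.randint(0, 3, (8,))
loss = quality_weighted_loss(logits, labels, torch.rand(8), torch.rand(8))
```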
Dec 02, 2024
Abstract: Deep learning has achieved remarkable success in processing and managing unstructured data. However, its "black box" nature imposes significant limitations, particularly in sensitive application domains. While existing interpretable machine learning methods address some of these issues, they often fail to adequately consider feature correlations and provide insufficient evaluation of model decision paths. To overcome these challenges, this paper introduces Real Explainer (RealExp), an interpretability computation method that decouples the Shapley Value into individual feature importance and feature correlation importance. By incorporating feature similarity computations, RealExp enhances interpretability by precisely quantifying both individual feature contributions and their interactions, leading to more reliable and nuanced explanations. Additionally, this paper proposes a novel interpretability evaluation criterion focused on elucidating the decision paths of deep learning models, going beyond traditional accuracy-based metrics. Experimental validations on two unstructured data tasks -- image classification and text sentiment analysis -- demonstrate that RealExp significantly outperforms existing methods in interpretability. Case studies further illustrate its practical value: in image classification, RealExp aids in selecting suitable pre-trained models for specific tasks from an interpretability perspective; in text classification, it enables the optimization of models and approximates the performance of a fine-tuned GPT-Ada model using traditional bag-of-words approaches.
* 10 pages
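The abstract does not spell out how RealExp decouples the Shapley value, so the snippet below is only a guess at the spirit of the idea: given per-feature attributions and a feature-correlation matrix, split each attribution into an individual part and a correlation part. The splitting rule is hypothetical, not the paper's formulation.

```python
import numpy as np

def split_attributions(phi, corr):
    """phi: per-feature attributions (e.g. Shapley values);
    corr: features x features correlation matrix."""
    corr = np.abs(corr.copy())
    np.fill_diagonal(corr, 0.0)
    # Hypothetical rule: the share of each feature's attribution that is
    # "explained by correlation" grows with its total similarity to other features.
    coupling = corr.sum(axis=1) / (corr.sum(axis=1) + 1.0)
    correlation_part = phi * coupling
    individual_part = phi - correlation_part
    return individual_part, correlation_part

# Toy usage: three features with attributions and an empirical correlation matrix.
phi = np.array([0.4, -0.1, 0.25])
corr = np.corrcoef(np.random.rand(100, 3), rowvar=False)
individual, correlated = split_attributions(phi, corr)
```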
Dec 01, 2024
Abstract: Recently, generative pre-training based models have demonstrated remarkable results on the Aspect-based Sentiment Analysis (ABSA) task. However, previous works overemphasize crafting various templates to paraphrase training targets for enhanced decoding, ignoring internal optimizations of the generative models themselves. Despite the notable results achieved by these target-oriented optimization methods, they struggle with complicated long texts, since implicit long-distance relations, e.g., aspect-opinion relations, are difficult to extract under the position embedding mechanism of generative models. Thus, in this paper, we first clarify the causes of the problem and introduce two sequence optimization strategies: rule-based static optimization and score-based dynamic optimization. The rule-based approach relies on handcrafted priorities over dependency relations to reorder the context, while the score-based algorithm dynamically regulates the contextual sequence by calculating word position scores with a neural network. Building on the dynamic optimization structure, we further propose a unified Prompt-based Generative Sequence Optimization network (PGSO), which jointly optimizes the training target and the generative model. Specifically, PGSO contains two components: prompt construction and a sequence regulator. The former constructs a task-specific prompt based on unsupervised training objectives to fully utilize the pre-trained model. The latter jointly leverages semantic, syntactic, and original-sequence information to dynamically regulate the contextual sequence. Our experiments on four ABSA tasks across multiple benchmarks indicate that PGSO outperforms state-of-the-art methods, with an average improvement of 3.52% in F1 score.
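As a rough illustration of score-based dynamic sequence optimization, the toy module below scores token embeddings with a small neural scorer and reorders the context by those scores. PGSO's actual sequence regulator also folds in semantic, syntactic, and original-sequence information, which is omitted here; the layer sizes are arbitrary.

```python
import torch
import torch.nn as nn

class PositionScorer(nn.Module):
    """Toy stand-in for a sequence regulator: score tokens, then reorder."""
    def __init__(self, dim: int):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, token_embeddings):                # (seq_len, dim)
        scores = self.score(token_embeddings).squeeze(-1)
        order = torch.argsort(scores, descending=True)  # new contextual order
        return token_embeddings[order], order

scorer = PositionScorer(dim=32)
reordered, order = scorer(torch.randn(10, 32))
```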
Dec 02, 2024
Abstract: Social media platforms, particularly Reddit's r/Epilepsy community, offer a unique perspective into the experiences of individuals with epilepsy (PWE) and their caregivers. This study analyzes 57k posts and 533k comments to explore key themes across demographics such as age, gender, and relationships. Our findings highlight significant discussions on epilepsy-related challenges, including depression (with 39.75% of posts indicating severe symptoms), driving restrictions, workplace concerns, and pregnancy-related issues in women with epilepsy. We introduce a novel engagement metric, F(P), which incorporates post length, sentiment scores, and readability to quantify community interaction. This analysis underscores the importance of integrated care addressing both neurological and mental health challenges faced by PWE. The insights from this study inform strategies for targeted support and awareness interventions.
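The paper's exact F(P) formula is not given in the abstract, so the sketch below only shows the shape of such a metric: a weighted combination of normalized post length, sentiment strength (VADER), and readability (Flesch reading ease). The weights and normalization constants are made up for illustration.

```python
import textstat
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()

def engagement_score(post: str, w_len=0.3, w_sent=0.4, w_read=0.3) -> float:
    # Normalized post length (capped at 500 words) -- hypothetical scaling.
    length = min(len(post.split()) / 500.0, 1.0)
    # Strength of sentiment, regardless of polarity.
    sentiment = abs(analyzer.polarity_scores(post)["compound"])
    # Flesch reading ease squashed into [0, 1].
    readability = max(min(textstat.flesch_reading_ease(post) / 100.0, 1.0), 0.0)
    return w_len * length + w_sent * sentiment + w_read * readability

print(engagement_score("I finally found a medication routine that controls my seizures."))
```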
Nov 29, 2024
Abstract: Sentiment analysis (SA) is the process of identifying the emotional tone or polarity of a given text and aims to uncover the user's complex emotions and inner feelings. While sentiment analysis has been extensively studied for languages like English, research in Bengali remains limited, particularly for fine-grained sentiment categorization. This work aims to bridge this gap by developing a novel approach that integrates rule-based algorithms with pre-trained language models. We built a dataset from scratch, comprising over 15,000 manually labeled reviews, and constructed a Lexicon Data Dictionary assigning polarity scores to the reviews. We then developed a novel rule-based algorithm, the Bangla Sentiment Polarity Score (BSPS), capable of generating sentiment scores and classifying reviews into nine distinct sentiment categories. To assess the performance of this method, we evaluated the classified sentiments using BanglaBERT, a pre-trained transformer-based language model. We also performed sentiment classification directly with BanglaBERT on the original data and evaluated that model's results. Our analysis revealed that the BSPS + BanglaBERT hybrid approach outperformed the standalone BanglaBERT model, achieving higher accuracy, precision, and more nuanced classification across the nine sentiment categories. These results emphasize the value and effectiveness of combining rule-based and pre-trained language model approaches for sentiment analysis in Bengali and suggest pathways for future research and application in languages with similar linguistic complexities.
* 13 pages, 12 figures
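A minimal sketch in the spirit of BSPS: sum lexicon polarity scores over the tokens of a review and bucket the total into nine categories. The toy lexicon entries, score range, and category thresholds below are all hypothetical; the paper's actual dictionary and rules are not reproduced.

```python
# Toy lexicon entries (word -> polarity score); the real dictionary is far larger.
LEXICON = {"ভালো": 2.0, "চমৎকার": 3.0, "খারাপ": -2.0, "জঘন্য": -3.0}

# Illustrative nine-way bucketing: (lower bound, label), checked from top down.
CATEGORIES = [
    (6, "extremely positive"), (4, "very positive"), (2, "positive"),
    (0.5, "slightly positive"), (-0.5, "neutral"), (-2, "slightly negative"),
    (-4, "negative"), (-6, "very negative"), (float("-inf"), "extremely negative"),
]

def bsps_score(review: str) -> tuple[float, str]:
    # Sum polarity scores of known words; unknown words contribute nothing.
    score = sum(LEXICON.get(token, 0.0) for token in review.split())
    label = next(lab for bound, lab in CATEGORIES if score >= bound)
    return score, label

print(bsps_score("খাবারটা চমৎকার আর সার্ভিস ভালো"))  # toy review
```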
Nov 30, 2024
Abstract: In the contemporary educational landscape, the advent of Generative Artificial Intelligence (AI) presents unprecedented opportunities for personalised learning, fundamentally challenging the traditional paradigms of education. This research explores the emerging trend in which high school students, empowered by the tailored educational experiences Generative AI provides, opt to forgo traditional university degrees and pursue entrepreneurial ventures at a younger age. To understand and predict the future of education in the age of Generative AI, we employ a comprehensive methodology for analysing social media data. Our approach includes sentiment analysis to gauge public opinion, topic modelling to identify key themes and emerging trends, and user demographic analysis to understand the engagement of different age groups and regions. We also perform influencer analysis to identify key figures shaping the discourse, and we compute engagement metrics to measure the level of interest and interaction with AI-related educational content. Content analysis helps us determine the types of content being shared and the prevalent narratives, while hashtag analysis reveals the connectivity of discussions. Temporal analysis tracks changes over time and identifies event-based spikes in discussions. The insights derived from this analysis include the acceptance and adoption of Generative AI in education, its impact on traditional education models, its influence on students' entrepreneurial ambitions, and the educational outcomes associated with AI-driven personalised learning. Additionally, we explore public sentiment towards policies and regulations and use predictive modelling to forecast future trends. This comprehensive social media analysis provides a nuanced understanding of the evolving educational landscape, offering valuable perspectives on the role of Generative AI in shaping the future of education.
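For readers who want a starting point, here is a small, generic pipeline combining VADER sentiment scoring with LDA topic modelling over a toy corpus. The example posts and the number of topics are placeholders, and the study's full methodology (influencer, hashtag, and temporal analysis) is not covered.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

posts = [
    "Generative AI tutors helped me skip lectures and launch a startup",
    "I still think a university degree matters for deep foundations",
]

# Sentiment: VADER compound score per post.
analyzer = SentimentIntensityAnalyzer()
sentiments = [analyzer.polarity_scores(p)["compound"] for p in posts]

# Topic modelling: bag-of-words counts fed to LDA.
vectorizer = CountVectorizer(stop_words="english")
doc_term = vectorizer.fit_transform(posts)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(doc_term)
topics = lda.transform(doc_term)   # per-post topic distribution
```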
Nov 29, 2024
Abstract: Accurate prediction of stock market trends is crucial for informed investment decisions and effective portfolio management, ultimately leading to enhanced wealth creation and risk mitigation. This study proposes a novel approach for predicting stock prices by integrating Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) networks with sentiment analysis of social network data and candlestick (price) data. The proposed methodology consists of two primary components: sentiment analysis of social network data and candlestick data. By combining candlestick data with insights gleaned from Twitter, the approach enables a more detailed and accurate examination of market trends and patterns, ultimately leading to more effective stock price predictions. A Random Forest algorithm is used to classify tweets as either positive or negative, allowing for a more nuanced and informed assessment of market sentiment. The CNN extracts short-term features, while the LSTM models long-term dependencies; integrating both networks enables a more comprehensive analysis of market trends and patterns, leading to more accurate stock price predictions.
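A rough sketch of the described architecture: a 1-D CNN over candlestick features feeding an LSTM, with a per-sample sentiment score (e.g. from the tweet classifier) appended before the prediction head. The layer sizes and the way sentiment is fused are arbitrary choices, not the paper's exact design.

```python
import torch
import torch.nn as nn

class CnnLstmPredictor(nn.Module):
    def __init__(self, n_features: int = 5, hidden: int = 64):
        super().__init__()
        # 1-D convolution extracts short-term patterns from candlestick features.
        self.conv = nn.Conv1d(n_features, 32, kernel_size=3, padding=1)
        # LSTM models longer-term dependencies over the convolved sequence.
        self.lstm = nn.LSTM(32, hidden, batch_first=True)
        # A daily sentiment score is concatenated before the regression head.
        self.head = nn.Linear(hidden + 1, 1)

    def forward(self, candles, sentiment):
        # candles: (batch, seq_len, n_features); sentiment: (batch, 1)
        x = torch.relu(self.conv(candles.transpose(1, 2))).transpose(1, 2)
        _, (h, _) = self.lstm(x)
        return self.head(torch.cat([h[-1], sentiment], dim=1))

model = CnnLstmPredictor()
pred = model(torch.randn(8, 30, 5), torch.rand(8, 1))   # next-day price estimate
```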
Nov 27, 2024
Abstract: Thirteen years after the Fukushima Daiichi nuclear power plant accident, Japan's nuclear energy accounts for only approximately 6% of electricity production, as most nuclear plants remain shut down. To revitalize the nuclear industry and achieve sustainable development goals, effective communication with Japanese citizens, grounded in an accurate understanding of public sentiment, is of paramount importance. While nationwide surveys have traditionally been used to gauge public views, the rise of social media in recent years has provided a promising new avenue for understanding public sentiment. To explore domestic sentiment on nuclear energy-related issues expressed online, we analyzed the content and comments of over 3,000 YouTube videos covering topics related to nuclear energy. Topic modeling was used to extract the main topics from the videos, and sentiment analysis with large language models classified user sentiments towards each topic. Additionally, word co-occurrence network analysis was performed to examine the shift in online discussions during August and September 2023 regarding the release of treated water. Overall, our results provide valuable insights into the online discourse on nuclear energy and contribute to a more comprehensive understanding of public sentiment in Japan.
* 15 pages, 9 figures, 4 tables
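Of the three analysis steps, the word co-occurrence network is the easiest to sketch. The snippet below builds one with networkx, with toy English comments standing in for the Japanese YouTube comments, counting how often two words appear in the same comment; the topic-modelling and LLM sentiment steps are omitted.

```python
from itertools import combinations
import networkx as nx

comments = [
    "treated water release is safe says report",
    "release of treated water worries local fishermen",
]

G = nx.Graph()
for text in comments:
    words = set(text.split())
    for a, b in combinations(sorted(words), 2):
        # Edge weight counts how often two words co-occur in the same comment.
        prev = G.get_edge_data(a, b, {"weight": 0})["weight"]
        G.add_edge(a, b, weight=prev + 1)

central = nx.degree_centrality(G)   # most connected terms in the discussion
```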
Nov 26, 2024
Abstract: Low-resource languages face significant challenges due to the lack of sufficient linguistic data, resources, and tools for tasks such as supervised learning, annotation, and classification. This shortage hinders the development of accurate models and datasets, making it difficult to perform critical NLP tasks like sentiment analysis or hate speech detection. To bridge this gap, Large Language Models (LLMs) present an opportunity as potential annotators, capable of generating datasets and resources for these underrepresented languages. In this paper, we focus on Marathi, a low-resource language, and evaluate the performance of both closed-source and open-source LLMs as annotators. We assess models such as GPT-4o, Gemini 1.0 Pro, Gemma 2 (2B and 9B), and Llama 3.1 (8B) on classification tasks including sentiment analysis, news classification, and hate speech detection. Our findings reveal that while LLMs excel at annotation tasks for high-resource languages like English, they still fall short when applied to Marathi. Even advanced closed models like Gemini and GPT underperform compared to BERT-based baselines, highlighting the limitations of LLMs as annotators for low-resource languages.
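A hedged sketch of using an LLM as an annotator for sentiment labels, in the style of the evaluation described above. The prompt wording, label set, and fallback behaviour are illustrative assumptions, and GPT-4o via the OpenAI client is just one of the models the paper evaluates.

```python
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment
LABELS = ["positive", "negative", "neutral"]

def annotate(sentence: str) -> str:
    """Ask the model for a single sentiment label for a Marathi sentence."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system",
             "content": f"Label the Marathi sentence with one of: {', '.join(LABELS)}. "
                        "Reply with the label only."},
            {"role": "user", "content": sentence},
        ],
        temperature=0,
    )
    label = response.choices[0].message.content.strip().lower()
    return label if label in LABELS else "neutral"   # fall back on unexpected output
```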