Sentiment analysis is the process of determining the sentiment of a piece of text, such as a tweet or a review.
Conversational agents have made significant progress since ELIZA, expanding their role across various domains, including healthcare, education, and customer service. As these agents become increasingly integrated into daily human interactions, the need for emotional intelligence, particularly empathetic listening, becomes increasingly essential. In this study, we explore how Large Language Models (LLMs) respond when tasked with generating emotionally rich interactions. Starting from a small dataset manually crafted by an expert to reflect empathic behavior, we extended the conversations using two LLMs: ChatGPT and Gemini. We analyzed the emotional progression of the dialogues using both sentiment analysis (via VADER) and expert assessments. While the generated conversations often mirrored the intended emotional structure, human evaluation revealed important differences in the perceived empathy and coherence of the responses. These findings suggest that emotion modeling in dialogues requires not only structural alignment in the expressed emotions but also qualitative depth, highlighting the importance of combining automated and humancentered methods in the development of emotionally competent agents.
Bias and stereotypes in language models can cause harm, especially in sensitive areas like content moderation and decision-making. This paper addresses bias and stereotype detection by exploring how jointly learning these tasks enhances model performance. We introduce StereoBias, a unique dataset labeled for bias and stereotype detection across five categories: religion, gender, socio-economic status, race, profession, and others, enabling a deeper study of their relationship. Our experiments compare encoder-only models and fine-tuned decoder-only models using QLoRA. While encoder-only models perform well, decoder-only models also show competitive results. Crucially, joint training on bias and stereotype detection significantly improves bias detection compared to training them separately. Additional experiments with sentiment analysis confirm that the improvements stem from the connection between bias and stereotypes, not multi-task learning alone. These findings highlight the value of leveraging stereotype information to build fairer and more effective AI systems.
In this paper, we investigate the transferability of pre-trained language models to low-resource Indonesian local languages through the task of sentiment analysis. We evaluate both zero-shot performance and adapter-based transfer on ten local languages using models of different types: a monolingual Indonesian BERT, multilingual models such as mBERT and XLM-R, and a modular adapter-based approach called MAD-X. To better understand model behavior, we group the target languages into three categories: seen (included during pre-training), partially seen (not included but linguistically related to seen languages), and unseen (absent and unrelated in pre-training data). Our results reveal clear performance disparities across these groups: multilingual models perform best on seen languages, moderately on partially seen ones, and poorly on unseen languages. We find that MAD-X significantly improves performance, especially for seen and partially seen languages, without requiring labeled data in the target language. Additionally, we conduct a further analysis on tokenization and show that while subword fragmentation and vocabulary overlap with Indonesian correlate weakly with prediction quality, they do not fully explain the observed performance. Instead, the most consistent predictor of transfer success is the model's prior exposure to the language, either directly or through a related language.
Understanding covert narratives and implicit messaging is essential for analyzing bias and sentiment. Traditional NLP methods struggle with detecting subtle phrasing and hidden agendas. This study tackles two key challenges: (1) multi-label classification of narratives and sub-narratives in news articles, and (2) generating concise, evidence-based explanations for dominant narratives. We fine-tune a BERT model with a recall-oriented approach for comprehensive narrative detection, refining predictions using a GPT-4o pipeline for consistency. For narrative explanation, we propose a ReACT (Reasoning + Acting) framework with semantic retrieval-based few-shot prompting, ensuring grounded and relevant justifications. To enhance factual accuracy and reduce hallucinations, we incorporate a structured taxonomy table as an auxiliary knowledge base. Our results show that integrating auxiliary knowledge in prompts improves classification accuracy and justification reliability, with applications in media analysis, education, and intelligence gathering.
Multimodal sarcasm understanding is a high-order cognitive task. Although large language models (LLMs) have shown impressive performance on many downstream NLP tasks, growing evidence suggests that they struggle with sarcasm understanding. In this paper, we propose Commander-GPT, a modular decision routing framework inspired by military command theory. Rather than relying on a single LLM's capability, Commander-GPT orchestrates a team of specialized LLM agents where each agent will be selectively assigned to a focused sub-task such as context modeling, sentiment analysis, etc. Their outputs are then routed back to the commander, which integrates the information and performs the final sarcasm judgment. To coordinate these agents, we introduce three types of centralized commanders: (1) a trained lightweight encoder-based commander (e.g., multi-modal BERT); (2) four small autoregressive language models, serving as moderately capable commanders (e.g., DeepSeek-VL); (3) two large LLM-based commander (Gemini Pro and GPT-4o) that performs task routing, output aggregation, and sarcasm decision-making in a zero-shot fashion. We evaluate Commander-GPT on the MMSD and MMSD 2.0 benchmarks, comparing five prompting strategies. Experimental results show that our framework achieves 4.4% and 11.7% improvement in F1 score over state-of-the-art (SoTA) baselines on average, demonstrating its effectiveness.
Recurrent neural networks (RNNs), particularly LSTMs, are effective for time-series tasks like sentiment analysis and short-term stock prediction. However, their computational complexity poses challenges for real-time deployment in resource constrained environments. While FPGAs offer a promising platform for energy-efficient AI acceleration, existing tools mainly target feed-forward networks, and LSTM acceleration typically requires full custom implementation. In this paper, we address this gap by leveraging the open-source and extensible FINN framework to enable the generalized deployment of LSTMs on FPGAs. Specifically, we leverage the Scan operator from the Open Neural Network Exchange (ONNX) specification to model the recurrent nature of LSTM computations, enabling support for mixed quantisation within them and functional verification of LSTM-based models. Furthermore, we introduce custom transformations within the FINN compiler to map the quantised ONNX computation graph to hardware blocks from the HLS kernel library of the FINN compiler and Vitis HLS. We validate the proposed tool-flow by training a quantised ConvLSTM model for a mid-price stock prediction task using the widely used dataset and generating a corresponding hardware IP of the model using our flow, targeting the XCZU7EV device. We show that the generated quantised ConvLSTM accelerator through our flow achieves a balance between performance (latency) and resource consumption, while matching (or bettering) inference accuracy of state-of-the-art models with reduced precision. We believe that the generalisable nature of the proposed flow will pave the way for resource-efficient RNN accelerator designs on FPGAs.
This work proposes that a vast majority of classical technical indicators in financial analysis are, in essence, special cases of neural networks with fixed and interpretable weights. It is shown that nearly all such indicators, such as moving averages, momentum-based oscillators, volatility bands, and other commonly used technical constructs, can be reconstructed topologically as modular neural network components. Technical Indicator Networks (TINs) are introduced as a general neural architecture that replicates and structurally upgrades traditional indicators by supporting n-dimensional inputs such as price, volume, sentiment, and order book data. By encoding domain-specific knowledge into neural structures, TINs modernize the foundational logic of technical analysis and propel algorithmic trading into a new era, bridging the legacy of proven indicators with the potential of contemporary AI systems.
Forecasting stock and cryptocurrency prices is challenging due to high volatility and non-stationarity, influenced by factors like economic changes and market sentiment. Previous research shows that Echo State Networks (ESNs) can effectively model short-term stock market movements, capturing nonlinear patterns in dynamic data. To the best of our knowledge, this work is among the first to explore ESNs for cryptocurrency forecasting, especially during extreme volatility. We also conduct chaos analysis through the Lyapunov exponent in chaotic periods and show that our approach outperforms existing machine learning methods by a significant margin. Our findings are consistent with the Lyapunov exponent analysis, showing that ESNs are robust during chaotic periods and excel under high chaos compared to Boosting and Na\"ive methods.
Large Language Models (LLMs) have started to demonstrate the ability to persuade humans, yet our understanding of how this dynamic transpires is limited. Recent work has used linear probes, lightweight tools for analyzing model representations, to study various LLM skills such as the ability to model user sentiment and political perspective. Motivated by this, we apply probes to study persuasion dynamics in natural, multi-turn conversations. We leverage insights from cognitive science to train probes on distinct aspects of persuasion: persuasion success, persuadee personality, and persuasion strategy. Despite their simplicity, we show that they capture various aspects of persuasion at both the sample and dataset levels. For instance, probes can identify the point in a conversation where the persuadee was persuaded or where persuasive success generally occurs across the entire dataset. We also show that in addition to being faster than expensive prompting-based approaches, probes can do just as well and even outperform prompting in some settings, such as when uncovering persuasion strategy. This suggests probes as a plausible avenue for studying other complex behaviours such as deception and manipulation, especially in multi-turn settings and large-scale dataset analysis where prompting-based methods would be computationally inefficient.
Public product launches in Artificial Intelligence can serve as focusing events for collective attention, surfacing how societies react to technological change. Social media provide a window into the sensemaking around these events, surfacing hopes and fears and showing who chooses to engage in the discourse and when. We demonstrate that public sensemaking about AI is shaped by economic interests and cultural values of those involved. We analyze 3.8 million tweets posted by 1.6 million users across 117 countries in response to the public launch of ChatGPT in 2022. Our analysis shows how economic self-interest, proxied by occupational skill types in writing, programming, and mathematics, and national cultural orientations, as measured by Hofstede's individualism, uncertainty avoidance, and power distance dimensions, shape who speaks, when they speak, and their stance towards ChatGPT. Roles requiring more technical skills, such as programming and mathematics, tend to engage earlier and express more positive stances, whereas writing-centric occupations join later with greater skepticism. At the cultural level, individualism predicts both earlier engagement and a more negative stance, and uncertainty avoidance reduces the prevalence of positive stances but does not delay when users first engage with ChatGPT. Aggregate sentiment trends mask the dynamics observed in our study. The shift toward a more critical stance towards ChatGPT over time stems primarily from the entry of more skeptical voices rather than a change of heart among early adopters. Our findings underscore the importance of both the occupational background and cultural context in understanding public reactions to AI.