Recent advances in finance-specific language models such as FinBERT have enabled the quantification of public sentiment into index-based measures, yet compressing diverse linguistic signals into single metrics overlooks contextual nuances and limits interpretability. To address this limitation, explainable AI techniques, particularly SHAP (SHapley Additive Explanations), have been employed to identify influential features. However, SHAP's computational cost grows exponentially with input features, making it impractical for large-scale text-based financial data. This study introduces a GRU-based forecasting framework enhanced with GroupSHAP, which quantifies contributions of semantically related keyword groups rather than individual tokens, substantially reducing computational burden while preserving interpretability. We employed FinBERT to embed news articles from 2015 to 2024, clustered them into coherent semantic groups, and applied GroupSHAP to measure each group's contribution to stock price movements. The resulting group-level SHAP variables across multiple topics were used as input features for the prediction model. Empirical results from one-day-ahead forecasting of the S&P 500 index throughout 2024 demonstrate that our approach achieves a 32.2% reduction in MAE and a 40.5% reduction in RMSE compared with benchmark models without the GroupSHAP mechanism. This research presents the first application of GroupSHAP in news-driven financial forecasting, showing that grouped sentiment representations simultaneously enhance interpretability and predictive performance.
Cantonese, although spoken by millions, remains under-resourced due to policy and diglossia. To address this scarcity of evaluation frameworks for Cantonese, we introduce \textsc{\textbf{CantoNLU}}, a benchmark for Cantonese natural language understanding (NLU). This novel benchmark spans seven tasks covering syntax and semantics, including word sense disambiguation, linguistic acceptability judgment, language detection, natural language inference, sentiment analysis, part-of-speech tagging, and dependency parsing. In addition to the benchmark, we provide model baseline performance across a set of models: a Mandarin model without Cantonese training, two Cantonese-adapted models obtained by continual pre-training a Mandarin model on Cantonese text, and a monolingual Cantonese model trained from scratch. Results show that Cantonese-adapted models perform best overall, while monolingual models perform better on syntactic tasks. Mandarin models remain competitive in certain settings, indicating that direct transfer may be sufficient when Cantonese domain data is scarce. We release all datasets, code, and model weights to facilitate future research in Cantonese NLP.
The financial market is known to be highly sensitive to news. Therefore, effectively incorporating news data into quantitative trading remains an important challenge. Existing approaches typically rely on manually designed rules and/or handcrafted features. In this work, we directly use the news sentiment scores derived from large language models, together with raw price and volume data, as observable inputs for reinforcement learning. These inputs are processed by sequence models such as recurrent neural networks or Transformers to make end-to-end trading decisions. We conduct experiments using the cryptocurrency market as an example and evaluate two representative reinforcement learning algorithms, namely Double Deep Q-Network (DDQN) and Group Relative Policy Optimization (GRPO). The results demonstrate that our news-aware approach, which does not depend on handcrafted features or manually designed rules, can achieve performance superior to market benchmarks. We further highlight the critical role of time-series information in this process.
Simulations constitute a fundamental component of medical and nursing education and traditionally employ standardized patients (SP) and high-fidelity manikins to develop clinical reasoning and communication skills. However, these methods require substantial resources, limiting accessibility and scalability. In this study, we introduce CLiVR, a Conversational Learning system in Virtual Reality that integrates large language models (LLMs), speech processing, and 3D avatars to simulate realistic doctor-patient interactions. Developed in Unity and deployed on the Meta Quest 3 platform, CLiVR enables trainees to engage in natural dialogue with virtual patients. Each simulation is dynamically generated from a syndrome-symptom database and enhanced with sentiment analysis to provide feedback on communication tone. Through an expert user study involving medical school faculty (n=13), we assessed usability, realism, and perceived educational impact. Results demonstrated strong user acceptance, high confidence in educational potential, and valuable feedback for improvement. CLiVR offers a scalable, immersive supplement to SP-based training.
We introduce a novel high-frequency daily panel dataset of both markets and news-based indicators -- including Geopolitical Risk, Economic Policy Uncertainty, Trade Policy Uncertainty, and Political Sentiment -- for 42 countries across both emerging and developed markets. Using this dataset, we study how sentiment dynamics shape sovereign risk, measured by Credit Default Swap (CDS) spreads, and evaluate their forecasting value relative to traditional drivers such as global monetary policy and market volatility. Our horse-race analysis of forecasting models demonstrates that incorporating news-based indicators significantly enhances predictive accuracy and enriches the analysis, with non-linear machine learning methods -- particularly Random Forests -- delivering the largest gains. Our analysis reveals that while global financial variables remain the dominant drivers of sovereign risk, geopolitical risk and economic policy uncertainty also play a meaningful role. Crucially, their effects are amplified through non-linear interactions with global financial conditions. Finally, we document pronounced regional heterogeneity, as certain asset classes and emerging markets exhibit heightened sensitivity to shocks in policy rates, global financial volatility, and geopolitical risk.




This paper presents a Multi Agent Bitcoin Trading system that utilizes Large Lan- guage Models (LLMs) for alpha generation and portfolio management in the cryptocur- rencies market. Unlike equities, cryptocurrencies exhibit extreme volatility and are heavily influenced by rapidly shifting market sentiments and regulatory announcements, making them difficult to model using static regression models or neural networks trained solely on historical data [53]. The proposed framework overcomes this by structuring LLMs into specialised agents for technical analysis, sentiment evaluation, decision-making, and performance reflection. The system improves over time through a novel verbal feedback mechanism where a Reflect agent provides daily and weekly natural-language critiques of trading decisions. These textual evaluations are then injected into future prompts, al- lowing the system to adjust indicator priorities, sentiment weights, and allocation logic without parameter updates or finetuning. Back-testing on Bitcoin price data from July 2024 to April 2025 shows consistent outperformance across market regimes: the Quantita- tive agent delivered over 30% higher returns in bullish phases and 15% overall gains versus buy-and-hold, while the sentiment-driven agent turned sideways markets from a small loss into a gain of over 100%. Adding weekly feedback further improved total performance by 31% and reduced bearish losses by 10%. The results demonstrate that verbal feedback represents a new, scalable, and low-cost method of tuning LLMs for financial goals.




To address the need for a more comprehensive evaluation of French Natural Language Understanding (NLU), we introduce COLE, a new benchmark composed of 23 diverse task covering a broad range of NLU capabilities, including sentiment analysis, paraphrase detection, grammatical judgment, and reasoning, with a particular focus on linguistic phenomena relevant to the French language. We benchmark 94 large language models (LLM), providing an extensive analysis of the current state of French NLU. Our results highlight a significant performance gap between closed- and open-weights models and identify key challenging frontiers for current LLMs, such as zero-shot extractive question-answering (QA), fine-grained word sense disambiguation, and understanding of regional language variations. We release COLE as a public resource to foster further progress in French language modelling.
Sentiment analysis is a key task in Natural Language Processing (NLP), enabling the extraction of meaningful insights from user opinions across various domains. However, performing sentiment analysis in Persian remains challenging due to the scarcity of labeled datasets, limited preprocessing tools, and the lack of high-quality embeddings and feature extraction methods. To address these limitations, we propose a hybrid approach that integrates machine learning (ML) and deep learning (DL) techniques for Persian aspect-based sentiment analysis (ABSA). In particular, we utilize polarity scores from multilingual BERT as additional features and incorporate them into a decision tree classifier, achieving an accuracy of 93.34%-surpassing existing benchmarks on the Pars-ABSA dataset. Additionally, we introduce a Persian synonym and entity dictionary, a novel linguistic resource that supports text augmentation through synonym and named entity replacement. Our results demonstrate the effectiveness of hybrid modeling and feature augmentation in advancing sentiment analysis for low-resource languages such as Persian.
We propose the Causal Sphere Hypergraph Transformer (CSHT), a novel architecture for interpretable financial time-series forecasting that unifies \emph{Granger-causal hypergraph structure}, \emph{Riemannian geometry}, and \emph{causally masked Transformer attention}. CSHT models the directional influence of financial news and sentiment on asset returns by extracting multivariate Granger-causal dependencies, which are encoded as directional hyperedges on the surface of a hypersphere. Attention is constrained via angular masks that preserve both temporal directionality and geometric consistency. Evaluated on S\&P 500 data from 2018 to 2023, including the 2020 COVID-19 shock, CSHT consistently outperforms baselines across return prediction, regime classification, and top-asset ranking tasks. By enforcing predictive causal structure and embedding variables in a Riemannian manifold, CSHT delivers both \emph{robust generalisation across market regimes} and \emph{transparent attribution pathways} from macroeconomic events to stock-level responses. These results suggest that CSHT is a principled and practical solution for trustworthy financial forecasting under uncertainty.




Multimodal sentiment analysis (MSA) leverages information fusion from diverse modalities (e.g., text, audio, visual) to enhance sentiment prediction. However, simple fusion techniques often fail to account for variations in modality quality, such as those that are noisy, missing, or semantically conflicting. This oversight leads to suboptimal performance, especially in discerning subtle emotional nuances. To mitigate this limitation, we introduce a simple yet efficient \textbf{A}daptive \textbf{G}ated \textbf{F}usion \textbf{N}etwork that adaptively adjusts feature weights via a dual gate fusion mechanism based on information entropy and modality importance. This mechanism mitigates the influence of noisy modalities and prioritizes informative cues following unimodal encoding and cross-modal interaction. Experiments on CMU-MOSI and CMU-MOSEI show that AGFN significantly outperforms strong baselines in accuracy, effectively discerning subtle emotions with robust performance. Visualization analysis of feature representations demonstrates that AGFN enhances generalization by learning from a broader feature distribution, achieved by reducing the correlation between feature location and prediction error, thereby decreasing reliance on specific locations and creating more robust multimodal feature representations.