Abstract:By capturing the prevailing sentiment and market mood, textual data has become increasingly vital for forecasting commodity prices, particularly in metal markets. However, the effectiveness of lightweight, finetuned large language models (LLMs) in extracting predictive signals for aluminum prices, and the specific market conditions under which these signals are most informative, remains under-explored. This study generates monthly sentiment scores from English and Chinese news headlines (Reuters, Dow Jones Newswires, and China News Service) and integrates them with traditional tabular data, including base metal indices, exchange rates, inflation rates, and energy prices. We evaluate the predictive performance and economic utility of these models through long-short simulations on the Shanghai Metal Exchange from 2007 to 2024. Our results demonstrate that during periods of high volatility, Long Short-Term Memory (LSTM) models incorporating sentiment data from a finetuned Qwen3 model (Sharpe ratio 1.04) significantly outperform baseline models using tabular data alone (Sharpe ratio 0.23). Subsequent analysis elucidates the nuanced roles of news sources, topics, and event types in aluminum price forecasting.




Abstract:In presence of spatial heterogeneity, models applied to geographic data face a trade-off between producing general results and capturing local variations. Modelling at a regional scale may allow the identification of solutions that optimize both accuracy and generality. However, most current regionalization algorithms assume homogeneity in the attributes to delineate regions without considering the processes that generate the attributes. In this paper, we propose a generalized regionalization framework based on a two-item objective function which favors solutions with the highest overall accuracy while minimizing the number of regions. We introduce three regionalization algorithms, which extend previous methods that account for spatially constrained clustering. The effectiveness of the proposed framework is examined in regression experiments on both simulated and real data. The results show that a spatially implicit algorithm extended with an automatic post-processing procedure outperforms spatially explicit approaches. Our suggested framework contributes to better capturing the processes associated with spatial heterogeneity with potential applications in a wide range of geographical models.