Abstract:Financial markets are critical to global economic stability, yet trade-based manipulation (TBM) often undermines their fairness. Spoofing, a particularly deceptive TBM strategy, exhibits multilevel anomaly patterns that have not been adequately modeled. These patterns are usually concealed within the rich, hierarchical information of the Limit Order Book (LOB), which is challenging to leverage due to high dimensionality and noise. To address this, we propose a representation learning framework combining a cascaded LOB representation pipeline with supervised contrastive learning. Extensive experiments demonstrate that our framework consistently improves detection performance across diverse models, with Transformer-based architectures achieving state-of-the-art results. In addition, we conduct systematic analyses and ablation studies to investigate multilevel anomalies and the contributions of key components, offering broader insights into representation learning and anomaly detection for complex sequential data. Our code will be released later at this URL.
Abstract:The Limit Order Book (LOB), the mostly fundamental data of the financial market, provides a fine-grained view of market dynamics while poses significant challenges in dealing with the esteemed deep models due to its strong autocorrelation, cross-feature constrains, and feature scale disparity. Existing approaches often tightly couple representation learning with specific downstream tasks in an end-to-end manner, failed to analyze the learned representations individually and explicitly, limiting their reusability and generalization. This paper conducts the first systematic comparative study of LOB representation learning, aiming to identify the effective way of extracting transferable, compact features that capture essential LOB properties. We introduce LOBench, a standardized benchmark with real China A-share market data, offering curated datasets, unified preprocessing, consistent evaluation metrics, and strong baselines. Extensive experiments validate the sufficiency and necessity of LOB representations for various downstream tasks and highlight their advantages over both the traditional task-specific end-to-end models and the advanced representation learning models for general time series. Our work establishes a reproducible framework and provides clear guidelines for future research. Datasets and code will be publicly available at https://github.com/financial-simulation-lab/LOBench.