This study investigates whether Topological Data Analysis (TDA) can provide additional insights beyond traditional statistical methods in clustering currency behaviours. We focus on the foreign exchange (FX) market, a complex system often exhibiting non-linear and high-dimensional dynamics that classical techniques may not fully capture. We compare clustering results based on TDA-derived features versus classical statistical features using monthly logarithmic returns of 13 major currency exchange rates (all against the euro). Two widely used clustering algorithms, \(k\)-means and hierarchical clustering, are applied to both types of features, and cluster quality is evaluated via the Silhouette score and the Calinski-Harabasz index. Our findings show that TDA-based feature clustering produces more compact and well-separated clusters than clustering on traditional statistical features, particularly achieving substantially higher Calinski-Harabasz scores. However, all clustering approaches yield modest Silhouette scores, underscoring the inherent difficulty of grouping FX time series. The differing cluster compositions under TDA vs. classical features suggest that TDA captures structural patterns in currency co-movements that conventional methods might overlook. These results highlight TDA as a valuable complementary tool for analysing financial time series, with potential applications in risk management where understanding structural co-movements is crucial.
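As a minimal sketch of this kind of comparison (not the paper's exact pipeline), the fragment below derives illustrative TDA features from a delay embedding via the ripser library, derives classical moment-based features, and scores a \(k\)-means clustering of each feature set with the Silhouette and Calinski-Harabasz criteria; the embedding dimension, the particular persistence summaries, and the cluster count are assumptions for illustration.

import numpy as np
from ripser import ripser
from scipy.stats import skew, kurtosis
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score, calinski_harabasz_score

def takens_embedding(x, dim=3, tau=1):
    """Delay embedding of a univariate series into R^dim."""
    n = len(x) - (dim - 1) * tau
    return np.column_stack([x[i * tau : i * tau + n] for i in range(dim)])

def tda_features(x):
    """Illustrative summaries of the H1 persistence diagram."""
    dgm = ripser(takens_embedding(x))['dgms'][1]      # H1 birth/death pairs
    if len(dgm) == 0:
        return np.zeros(3)
    life = dgm[:, 1] - dgm[:, 0]                      # bar lifetimes
    return np.array([life.max(), life.sum(), len(life)])

def stat_features(x):
    """Classical per-currency summaries: mean, std, skewness, kurtosis."""
    return np.array([x.mean(), x.std(), skew(x), kurtosis(x)])

def evaluate(features, k=3):
    """Cluster the feature matrix and report both quality indices."""
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(features)
    return silhouette_score(features, labels), calinski_harabasz_score(features, labels)

# returns: array of shape (n_months, 13), one column of log-returns per currency vs. EUR
# tda = np.array([tda_features(returns[:, j]) for j in range(returns.shape[1])])
# cls = np.array([stat_features(returns[:, j]) for j in range(returns.shape[1])])
# print(evaluate(tda), evaluate(cls))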
The rapid ascent of artificial intelligence (AI) is often portrayed as a revolution born from computer science and engineering. This narrative, however, obscures a fundamental truth: the theoretical and methodological core of AI is, and has always been, statistical. This paper systematically argues that the field of statistics provides the indispensable foundation for machine learning and modern AI. We deconstruct AI into nine foundational pillars (Inference, Density Estimation, Sequential Learning, Generalization, Representation Learning, Interpretability, Causality, Optimization, and Unification), demonstrating that each is built upon century-old statistical principles. From the inferential frameworks of hypothesis testing and estimation that underpin model evaluation, to the density estimation roots of clustering and generative AI; from the time-series analysis inspiring recurrent networks to the causal models that promise true understanding, we trace an unbroken statistical lineage. While celebrating the computational engines that power modern AI, we contend that statistics provides the brain (the theoretical frameworks, uncertainty quantification, and inferential goals) while computer science provides the brawn (the scalable algorithms and hardware). Recognizing this statistical backbone is not merely an academic exercise, but a necessary step for developing more robust, interpretable, and trustworthy intelligent systems. We issue a call to action for education, research, and practice to re-embrace this statistical foundation. Ignoring these roots risks building a fragile future; embracing them is the path to truly intelligent machines. There is no machine learning without statistical learning; no artificial intelligence without statistical thought.
We present a method that models the evolution of an unbounded number of time series clusters by switching among an unknown number of regimes with linear dynamics. We develop a Bayesian non-parametric approach using a hierarchical Dirichlet process as a prior on the parameters of a Switching Linear Dynamical System, together with a Gaussian process prior that models the statistical variations in amplitude and temporal alignment within each cluster. By modeling the evolution of time series patterns, the method avoids unnecessary proliferation of clusters in a principled manner. We perform inference by formulating a variational lower bound for off-line and on-line scenarios, enabling efficient learning through optimization. We illustrate the versatility and effectiveness of the approach through several case studies of electrocardiogram analysis using publicly available databases.
Time series data often contain latent temporal structure (transitions between locally stationary regimes, repeated motifs, and bursts of variability) that is rarely leveraged in standard representation learning pipelines. Existing models typically operate on raw or fixed-window sequences, treating all time steps as equally informative, which leads to inefficiencies, poor robustness, and limited scalability in long or noisy sequences. We propose STaTS, a lightweight, unsupervised framework for Structure-Aware Temporal Summarization that adaptively compresses both univariate and multivariate time series into compact, information-preserving token sequences. STaTS detects change points across multiple temporal resolutions using a BIC-based statistical divergence criterion, then summarizes each segment using simple functions such as the mean or generative models such as GMMs. This process achieves up to 30x sequence compression while retaining core temporal dynamics. STaTS operates as a model-agnostic preprocessor and can be integrated with existing unsupervised time series encoders without retraining. Extensive experiments on 150+ datasets, including classification tasks on the UCR-85, UCR-128, and UEA-30 archives and forecasting on ETTh1, ETTh2, ETTm1, and Electricity, demonstrate that STaTS retains 85-90\% of the full-model performance while offering dramatic reductions in computational cost. Moreover, STaTS improves robustness under noise and preserves discriminative structure, outperforming uniform and clustering-based compression baselines. These results position STaTS as a principled, general-purpose solution for efficient, structure-aware time series modeling.
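A minimal sketch of the underlying idea, assuming a single temporal resolution, a Gaussian model per segment, and mean-only summarization (the multi-resolution criterion and GMM summaries described above are not reproduced):

import numpy as np

def gaussian_bic(x):
    """BIC of a single Gaussian fitted to a 1-D segment (two parameters)."""
    n = len(x)
    var = x.var() + 1e-12
    log_lik = -0.5 * n * (np.log(2 * np.pi * var) + 1)
    return -2 * log_lik + 2 * np.log(n)

def split_points(x, lo, hi, min_len=20, out=None):
    """Recursively add a split wherever two Gaussians beat one (lower total BIC)."""
    if out is None:
        out = []
    seg = x[lo:hi]
    if len(seg) < 2 * min_len:
        return out
    best_gain, best_t = 0.0, None
    for t in range(min_len, len(seg) - min_len):
        gain = gaussian_bic(seg) - (gaussian_bic(seg[:t]) + gaussian_bic(seg[t:]))
        if gain > best_gain:
            best_gain, best_t = gain, t
    if best_t is not None:
        split_points(x, lo, lo + best_t, min_len, out)
        out.append(lo + best_t)
        split_points(x, lo + best_t, hi, min_len, out)
    return out

def summarize(x, min_len=20):
    """Compress a series to one mean token per detected segment."""
    cps = [0] + split_points(x, 0, len(x), min_len) + [len(x)]
    return np.array([x[a:b].mean() for a, b in zip(cps[:-1], cps[1:])])

# x = np.concatenate([np.random.normal(0, 1, 500), np.random.normal(3, 1, 500)])
# print(summarize(x))   # roughly two tokens, near 0 and near 3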
We present a novel framework that leverages time series clustering to improve internet traffic matrix (TM) prediction using deep learning (DL) models. Traffic flows within a TM often exhibit diverse temporal behaviors, which can hinder prediction accuracy when training a single model across all flows. To address this, we propose two clustering strategies, source clustering and histogram clustering, that group flows with similar temporal patterns prior to model training. Clustering creates more homogeneous data subsets, enabling models to capture underlying patterns more effectively and generalize better than global prediction approaches that fit a single model to the entire TM. Compared to existing TM prediction methods, our method reduces RMSE by up to 92\% for Abilene and 75\% for G\'EANT. In routing scenarios, our clustered predictions also reduce maximum link utilization (MLU) bias by 18\% and 21\%, respectively, demonstrating the practical benefits of clustering when TMs are used for network optimization.
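As an illustration of the histogram-clustering idea (the bin count, log scaling, and cluster count are assumptions, and the per-cluster training call is a hypothetical placeholder, not the paper's DL models):

import numpy as np
from sklearn.cluster import KMeans

def cluster_flows(tm, n_clusters=4, bins=32):
    """tm: array of shape (n_timesteps, n_flows); returns a cluster label per flow."""
    logs = np.log1p(np.maximum(tm, 0))                     # log-scale traffic volumes
    lo, hi = logs.min(), logs.max()                        # shared range for comparable histograms
    sigs = np.array([np.histogram(logs[:, j], bins=bins, range=(lo, hi))[0]
                     for j in range(tm.shape[1])], dtype=float)
    sigs /= np.maximum(sigs.sum(axis=1, keepdims=True), 1) # normalized histogram signatures
    return KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(sigs)

# labels = cluster_flows(tm)
# for c in np.unique(labels):                # train one forecaster per homogeneous group
#     train_model(tm[:, labels == c])        # train_model(): hypothetical training routine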
Directional forecasting in financial markets requires both accuracy and interpretability. Before the advent of deep learning, interpretable approaches based on human-defined patterns were prevalent, but their structural vagueness and scale ambiguity hindered generalization. In contrast, deep learning models can effectively capture complex dynamics, yet often offer limited transparency. To bridge this gap, we propose a two-stage framework that integrates unsupervised pattern extraction with interpretable forecasting. (i) SIMPC segments and clusters multivariate time series, extracting recurrent patterns that are invariant to amplitude scaling and temporal distortion, even under varying window sizes. (ii) JISC-Net is a shapelet-based classifier that uses the initial part of an extracted pattern as input and forecasts the subsequent partial sequence to predict short-term directional movement. Experiments on Bitcoin and three S&P 500 equities demonstrate that our method ranks first or second in 11 out of 12 metric--dataset combinations, consistently outperforming baselines. Unlike conventional deep learning models that output buy-or-sell signals without interpretable justification, our approach enables transparent decision-making by revealing the underlying pattern structures that drive predictive outcomes.
To address the issues of low sensitivity and high time consumption in time series anomaly detection, we propose an anomaly detection method based on cross-modal deep metric learning. A cross-modal deep metric learning feature clustering model is constructed, composed of an input layer, a triplet selection layer, and a loss function computation layer. The squared Euclidean distances between cluster centers are calculated, and a stochastic gradient descent strategy is employed to optimize the model and classify different time series features. The inner product of principal component direction vectors is used as the anomaly measure. The von Mises-Fisher (vMF) distribution is applied to describe the directional characteristics of time series data, and historical data are used to estimate the evaluation parameters. Anomalies are then detected by comparing the principal component direction vector of the observed time series against the resulting threshold. Experimental results demonstrate that the proposed method accurately classifies time series data with different attributes, exhibits high sensitivity to anomalies, and achieves high detection accuracy, fast detection speed, and strong robustness.
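A simplified sketch of the direction-comparison step, assuming a plain SVD-based principal direction per window and a fixed cosine threshold; the vMF-based calibration of that threshold from historical data is not shown.

import numpy as np

def principal_direction(window):
    """First right-singular vector (principal component direction) of a window."""
    centered = window - window.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    v = vt[0]
    return v if v[0] >= 0 else -v              # fix sign so directions are comparable

def reference_direction(history, win=64):
    """Mean principal direction over non-overlapping windows of historical data."""
    dirs = [principal_direction(history[i:i + win])
            for i in range(0, len(history) - win, win)]
    mu = np.mean(dirs, axis=0)
    return mu / np.linalg.norm(mu)

def is_anomalous(window, mu, threshold=0.9):
    """Flag a window when its direction cosine falls below the threshold."""
    return float(principal_direction(window) @ mu) < threshold

# mu = reference_direction(historical_data)    # historical_data: array of shape (T, d)
# flag = is_anomalous(new_window, mu)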
Stochastic forecasting is critical for efficient decision-making in uncertain systems, such as energy markets and finance, where estimating the full distribution of future scenarios is essential. We propose Diffusion Scenario Tree (DST), a general framework for constructing scenario trees for multivariate prediction tasks using diffusion-based probabilistic forecasting models. DST recursively samples future trajectories and organizes them into a tree via clustering, ensuring non-anticipativity (decisions depending only on observed history) at each stage. We evaluate the framework on the optimization task of energy arbitrage in New York State's day-ahead electricity market. Experimental results show that our approach consistently outperforms the same optimization algorithms when they use scenario trees from more conventional models, as well as model-free reinforcement learning baselines. Furthermore, using DST for stochastic optimization yields more efficient decision policies, achieving higher performance by handling uncertainty better than deterministic and stochastic Model Predictive Control (MPC) variants that use the same diffusion-based forecaster.
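A simplified sketch of the recursive sample-then-cluster construction, where sample_paths stands in for the diffusion-based forecaster (a placeholder, not the paper's model) and \(k\)-means picks the branch representatives at each stage; branching only on the history of the current node is what keeps the tree non-anticipative.

import numpy as np
from sklearn.cluster import KMeans

def build_tree(history, sample_paths, depth=3, branches=3, n_samples=200):
    """Return a nested dict {value, prob, children} rooted at the given history."""
    node = {"value": history[-1], "prob": 1.0, "children": []}
    if depth == 0:
        return node
    samples = sample_paths(history, n_samples)         # shape (n_samples, horizon, d)
    next_step = samples[:, 0, :]                       # next-stage values only
    km = KMeans(n_clusters=branches, n_init=10, random_state=0).fit(next_step)
    for c in range(branches):
        child_hist = np.vstack([history, km.cluster_centers_[c]])
        child = build_tree(child_hist, sample_paths, depth - 1, branches, n_samples)
        child["prob"] = np.mean(km.labels_ == c)       # empirical branch probability
        node["children"].append(child)
    return node

# tree = build_tree(observed_prices, diffusion_sampler)   # diffusion_sampler: hypothetical forecaster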
TimeCluster is a visual analytics technique for discovering structure in long multivariate time series by projecting overlapping windows of data into a low-dimensional space. We show that, when Principal Component Analysis (PCA) is chosen as the dimensionality reduction technique, this procedure is mathematically equivalent to classical linear subspace identification (a block-Hankel matrix followed by Singular Value Decomposition (SVD)). In both approaches, the same low-dimensional linear subspace is extracted from the time series data. We first review the TimeCluster method and the theory of subspace system identification. Then we show that forming the sliding-window matrix of a time series yields a Hankel matrix, so applying PCA (via SVD) to this matrix recovers the same principal directions as subspace identification. Thus the cluster coordinates from TimeCluster coincide with those produced by subspace identification methods. We present experiments on synthetic and real dynamical signals confirming that the two embeddings coincide. Finally, we explore and discuss future opportunities enabled by this equivalence, including forecasting from the identified state space, streaming/online extensions, incorporating and visualising external inputs, and robust techniques for displaying underlying trends in corrupted data.
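The stated equivalence is straightforward to check numerically: stacking overlapping windows gives a Hankel-structured matrix, and the PCA projection of that matrix equals the coordinates obtained from its SVD factors. A small self-contained check (window length, signal, and rank are arbitrary illustrative choices):

import numpy as np

def hankel_windows(x, window):
    """Rows are overlapping length-`window` slices: a Hankel-structured matrix."""
    return np.array([x[i:i + window] for i in range(len(x) - window + 1)])

x = np.sin(0.1 * np.arange(1000)) + 0.05 * np.random.default_rng(0).normal(size=1000)
H = hankel_windows(x, window=50)
Hc = H - H.mean(axis=0)                       # centring, as PCA does

U, S, Vt = np.linalg.svd(Hc, full_matrices=False)
pca_coords = Hc @ Vt[:2].T                    # 2-D TimeCluster-style PCA embedding
svd_coords = U[:, :2] * S[:2]                 # coordinates from the SVD factors

print(np.allclose(pca_coords, svd_coords))    # True: the two embeddings coincide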
Federated Learning (FL) has become a powerful technique for training Machine Learning (ML) models in a decentralized manner, preserving the privacy of the training datasets involved. However, the decentralized nature of FL limits the visibility of the training process and relies heavily on the honesty of participating clients. This assumption opens the door to malicious third parties, known as Byzantine clients, which can poison the training process by submitting false model updates. Such malicious clients may engage in poisoning attacks, manipulating either the dataset or the model parameters to induce misclassification. In response, this study introduces FLAegis, a two-stage defensive framework designed to identify Byzantine clients and improve the robustness of FL systems. Our approach leverages the Symbolic Aggregate approXimation (SAX) time series transformation to amplify the differences between benign and malicious models, and spectral clustering, which enables accurate detection of adversarial behavior. Furthermore, we incorporate a robust FFT-based aggregation function as a final layer to mitigate the impact of those Byzantine clients that manage to evade the prior defenses. We rigorously evaluate our method against five poisoning attacks, ranging from simple label flipping to adaptive optimization-based strategies. Notably, our approach outperforms state-of-the-art defenses in both detection precision and final model accuracy, maintaining consistently high performance even under strong adversarial conditions.
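As an illustrative sketch of the SAX-plus-spectral-clustering step (not the FLAegis implementation), assuming each client's update is available as a flattened parameter vector; the segment count, alphabet size, and RBF affinity are arbitrary choices.

import numpy as np
from sklearn.cluster import SpectralClustering

BREAKPOINTS = {3: [-0.43, 0.43], 4: [-0.67, 0.0, 0.67]}   # Gaussian quantile cut points

def sax(vec, n_segments=32, alphabet=4):
    """Z-normalize, piecewise-aggregate, and discretize a vector into symbols."""
    z = (vec - vec.mean()) / (vec.std() + 1e-12)
    paa = z[: n_segments * (len(z) // n_segments)].reshape(n_segments, -1).mean(axis=1)
    return np.digitize(paa, BREAKPOINTS[alphabet])

def detect_byzantine(updates, n_clusters=2):
    """updates: (n_clients, n_params). Returns a cluster label per client."""
    words = np.array([sax(u) for u in updates])
    dist = np.array([[np.sum(a != b) for b in words] for a in words])   # symbol mismatches
    affinity = np.exp(-(dist ** 2) / (2 * (dist.std() + 1e-12) ** 2))   # RBF similarity
    return SpectralClustering(n_clusters=n_clusters, affinity="precomputed",
                              random_state=0).fit_predict(affinity)

# labels = detect_byzantine(np.stack([flatten(m) for m in client_models]))  # flatten(): hypothetical helper
# A downstream rule (e.g., treating the minority cluster as suspicious) would follow here.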