Time series analysis comprises statistical methods for analyzing a sequence of data points collected over an interval of time to identify interesting patterns and trends.
We propose a novel computational framework for analyzing electroencephalography (EEG) time series using methods from stringology, the study of efficient algorithms for string processing, to systematically identify and characterize recurrent temporal patterns in neural signals. The primary aim is to introduce quantitative measures to understand neural signal dynamics, with the present findings serving as a proof-of-concept. The framework adapts order-preserving matching (OPM) and Cartesian tree matching (CTM) to detect temporal motifs that preserve relative ordering and hierarchical structure while remaining invariant to amplitude scaling. This approach provides a temporally precise representation of EEG dynamics that complements traditional spectral and global complexity analyses. To evaluate its utility, we applied the framework to multichannel EEG recordings from individuals with attention-deficit/hyperactivity disorder (ADHD) and matched controls using a publicly available dataset. Highly recurrent, group-specific motifs were extracted and quantified using both OPM and CTM. The ADHD group exhibited significantly higher motif frequencies, suggesting increased repetitiveness in neural activity. OPM analysis revealed shorter motif lengths and greater gradient instability in ADHD, reflected in larger mean and maximal inter-sample amplitude changes. CTM analysis further demonstrated reduced hierarchical complexity in ADHD, characterized by shallower tree structures and fewer hierarchical levels despite comparable motif lengths. These findings suggest that ADHD-related EEG alterations involve systematic differences in the structure, stability, and hierarchical organization of recurrent temporal patterns. The proposed stringology-based motif framework provides a complementary computational tool with potential applications for objective biomarker development in neurodevelopmental disorders.
This study investigates a data-driven machine learning approach to predict membrane fouling in critically ill patients undergoing Continuous Renal Replacement Therapy (CRRT). Using time-series data from an ICU, 16 clinically selected features were identified to train predictive models. To ensure interpretability and enable reliable counterfactual analysis, the researchers adopted a tabular data approach rather than modeling temporal dependencies directly. Given the imbalance between fouling and non-fouling cases, the ADASYN oversampling technique was applied to improve minority class representation. Random Forest, XGBoost, and LightGBM models were tested, achieving balanced performance with 77.6% sensitivity and 96.3% specificity at a 10% rebalancing rate. Results remained robust across different forecasting horizons. Notably, the tabular approach outperformed LSTM recurrent neural networks, suggesting that explicit temporal modeling was not necessary for strong predictive performance. Feature selection further reduced the model to five key variables, improving simplicity and interpretability with minimal loss of accuracy. A Shapley value-based counterfactual analysis was applied to the best-performing model, successfully identifying minimal input changes capable of reversing fouling predictions. Overall, the findings support the viability of interpretable machine learning models for predicting membrane fouling during CRRT. The integration of prediction and counterfactual analysis offers practical clinical value, potentially guiding therapeutic adjustments to reduce fouling risk and improve patient management.
This study presents a Normal Behavior Model (NBM) developed to forecast monitoring time-series data from the ASTRI-Horn Cherenkov telescope under normal operating conditions. The analysis focused on 15 physical variables acquired by the Telescope Control Unit between September 2022 and July 2024, representing sensor measurements from the Azimuth and Elevation motors. After data cleaning, resampling, feature selection, and correlation analysis, the dataset was segmented into fixed-length intervals, in which the first I samples represented the input sequence provided to the model, while the forecast length, T, indicated the number of future time steps to be predicted. A sliding-window technique was then applied to increase the number of intervals. A Multi-Layer Perceptron (MLP) was trained to perform multivariate forecasting across all features simultaneously. Model performance was evaluated using the Mean Squared Error (MSE) and the Normalized Median Absolute Deviation (NMAD), and it was also benchmarked against a Long Short-Term Memory (LSTM) network. The MLP model demonstrated consistent results across different features and I-T configurations, and matched the performance of the LSTM while converging faster. It achieved an MSE of 0.019+/-0.003 and an NMAD of 0.032+/-0.009 on the test set under its best configuration (4 hidden layers, 720 units per layer, and I-T lengths of 300 samples each, corresponding to 5 hours at 1-minute resolution). Extending the forecast horizon up to 6.5 hours-the maximum allowed by this configuration-did not degrade performance, confirming the model's effectiveness in providing reliable hour-scale predictions. The proposed NBM provides a powerful tool for enabling early anomaly detection in online ASTRI-Horn monitoring time series, offering a basis for the future development of a prognostics and health management system that supports predictive maintenance.
Normalization and scaling are fundamental preprocessing steps in time series modeling, yet their role in Transformer-based models remains underexplored from a theoretical perspective. In this work, we present the first formal analysis of how different normalization strategies, specifically instance-based and global scaling, impact the expressivity of Transformer-based architectures for time series representation learning. We propose a novel expressivity framework tailored to time series, which quantifies a model's ability to distinguish between similar and dissimilar inputs in the representation space. Using this framework, we derive theoretical bounds for two widely used normalization methods: Standard and Min-Max scaling. Our analysis reveals that the choice of normalization strategy can significantly influence the model's representational capacity, depending on the task and data characteristics. We complement our theory with empirical validation on classification and forecasting benchmarks using multiple Transformer-based models. Our results show that no single normalization method consistently outperforms others, and in some cases, omitting normalization entirely leads to superior performance. These findings highlight the critical role of preprocessing in time series learning and motivate the need for more principled normalization strategies tailored to specific tasks and datasets.
Affine frequency division multiplexing (AFDM) has recently emerged as a resilient waveform candidate for high-mobility next-generation wireless systems. However, current literature mostly focuses on discrete time (DT) models, often overlooking effects and hardware non-idealities of actual continuous time (CT) signal generation. In this paper, we bridge this gap by developing a CT-analytical framework based on the affine Fourier series (AFS) representation, which allows us to demonstrate that strictly bandlimited pulses and subcarrier suppression strategies are essential to maintain the multicarrier structure of the signal. In addition, we derive the analytical power spectral density of AFDM and evaluate its spectral characteristics in comparison with other multicarrier schemes, considering the impact of realistic truncated pulse-shaping. Furthermore, we analyze the sensitivity of the CT model to phase noise, carrier frequency offset, and sampling jitter, providing a theoretical analysis of communication performance. Finally, we derive closed-form Cramér-Rao bounds for channel parameter estimation, showing that the chirped modulation peculiar of AFDM increases the estimation variance but enables the resolution of Doppler ambiguities. Our findings provide the necessary theoretical and practical foundations for the implementation of AFDM in realistic wireless transceivers.
In many applications, weighted networks are constructed based on time series data: each time series is associated to a vertex and edge weights are given by pairwise correlations. The result is a network whose edge dependency structure violates the assumptions of most common network models. Nonetheless, it is common to analyze these "correlation networks" using embedding methods derived from edge-independent network models, based on a belief that the edges are approximately independent. In this work, we put this modeling choice on firm theoretical ground. We show that when the time series are expressible in terms of a small number of Fourier basis elements (or in some other suitably-chosen basis), correlation networks correspond to latent space networks with dependent edge noise in which the vertex-level latent variables encode the basis coefficients. Further, we show that when time series are observed subject to noise, spectral embedding of the resulting noisy correlation network still recovers these true vertex-level latent representations under suitable assumptions. This characterization of embeddings as learning Fourier coefficients appears to be folklore in the signal processing community in the context of principal component analysis, but is, to the best of our knowledge, new to the statistical network analysis literature.
Deep learning has significantly improved time series classification, yet the lack of explainability in these models remains a major challenge. While Explainable AI (XAI) techniques aim to make model decisions more transparent, their effectiveness is often hindered by the high dimensionality and noise present in raw time series data. In this work, we investigate whether transforming time series into discrete latent representations-using methods such as Vector Quantized Variational Autoencoders (VQ-VAE) and Discrete Variational Autoencoders (DVAE)-not only preserves but enhances explainability by reducing redundancy and focusing on the most informative patterns. We show that applying XAI methods to these compressed representations leads to concise and structured explanations that maintain faithfulness without sacrificing classification performance. Additionally, we propose Similar Subsequence Accuracy (SSA), a novel metric that quantitatively assesses the alignment between XAI-identified salient subsequences and the label distribution in the training data. SSA provides a systematic way to validate whether the features highlighted by XAI methods are truly representative of the learned classification patterns. Our findings demonstrate that discrete latent representations not only retain the essential characteristics needed for classification but also offer a pathway to more compact, interpretable, and computationally efficient explanations in time series analysis.
Learning-based signal processing systems increasingly support high-stakes medical decisions using heterogeneous biomedical signals, including medical images, physiological time series, and clinical records. Despite strong predictive performance, many models rely on statistical correlations that are unstable across acquisition settings, patient populations, and institutional practices, limiting robustness, interpretability, and clinical trust. We advocate a causal signal processing perspective in which biomedical signals are treated as effects of latent generative mechanisms rather than as isolated predictive inputs. Using clinical risk prediction as a motivating example, we show how disease-related factors generate observable biomarkers, while acquisition processes act as confounders influencing signal appearance. In clinical disease risk prediction from chest CT scans and patient risk factors, correlational models may fail under scanner changes, whereas causal abstractions remain invariant. Building on this view, we propose a unifying conceptual framework integrating causal modeling with learning-based signal processing and neuro-symbolic reasoning. Statistical models extract multimodal representations that are mapped to interpretable causal abstractions and combined with symbolic knowledge encoding clinical risk factors and guidelines. This structure enables clinically grounded explanations, counterfactual reasoning about hypothetical interventions, and improved robustness to distribution shifts arising from changes in acquisition conditions or screening policies. Rather than introducing a specific algorithm, this article presents schematic causal structures and a comparative analysis of correlation-based, causal, and neuro-symbolic approaches to guide the design of robust and interpretable medical decision-support systems.
Root cause analysis (RCA) in networked industrial systems, such as supply chains and power networks, is notoriously difficult due to unknown and dynamically evolving interdependencies among geographically distributed clients. These clients represent heterogeneous physical processes and industrial assets equipped with sensors that generate large volumes of nonlinear, high-dimensional, and heterogeneous IoT data. Classical RCA methods require partial or full knowledge of the system's dependency graph, which is rarely available in these complex networks. While federated learning (FL) offers a natural framework for decentralized settings, most existing FL methods assume homogeneous feature spaces and retrainable client models. These assumptions are not compatible with our problem setting. Different clients have different data features and often run fixed, proprietary models that cannot be modified. This paper presents a federated cross-client interdependency learning methodology for feature-partitioned, nonlinear time-series data, without requiring access to raw sensor streams or modifying proprietary client models. Each proprietary local client model is augmented with a Machine Learning (ML) model that encodes cross-client interdependencies. These ML models are coordinated via a global server that enforces representation consistency while preserving privacy through calibrated differential privacy noise. RCA is performed using model residuals and anomaly flags. We establish theoretical convergence guarantees and validate our approach on extensive simulations and a real-world industrial cybersecurity dataset.
Cognitive load, the mental effort required during working memory, is central to neuroscience, psychology, and human-computer interaction. Accurate assessment is vital for adaptive learning, clinical monitoring, and brain-computer interfaces. Physiological signals such as pupillometry and electroencephalography are established biomarkers of cognitive load, but their comparative utility and practical integration as lightweight, wearable monitoring solutions remain underexplored. EEG provides high temporal resolution of neural activity. Although non-invasive, it is technologically demanding and limited in wearability and cost due to its resource-intensive nature, whereas pupillometry is non-invasive, portable, and scalable. Existing studies often rely on deep learning models with limited interpretability and substantial computational expense. This study integrates feature-based and model-driven approaches to advance time-series analysis. Using the OpenNeuro 'Digit Span Task' dataset, this study investigates cognitive load classification from EEG and pupillometry. Feature-based approaches using Catch-22 features and classical machine learning models outperform deep learning in both binary and multiclass tasks. The findings demonstrate that pupillometry alone can compete with EEG, serving as a portable and practical proxy for real-world applications. These results challenge the assumption that EEG is necessary for load detection, showing that pupil dynamics combined with interpretable models and SHAP based feature analysis provide physiologically meaningful insights. This work supports the development of wearable, affordable cognitive monitoring systems for neuropsychiatry, education, and healthcare.