Topic:Time Series Forecasting
What is Time Series Forecasting? Time series forecasting is the task of fitting a model to historical, time-stamped data in order to predict future values. Traditional approaches include moving average, exponential smoothing, and ARIMA, though models as various as RNNs, Transformers, or XGBoost can also be applied. The most popular benchmark is the ETTh1 dataset. Models are typically evaluated using the Mean Square Error (MSE) or Root Mean Square Error (RMSE).
Papers and Code
Mar 06, 2025
Abstract:We present TimeFound, an encoder-decoder transformer-based time series foundation model for out-of-the-box zero-shot forecasting. To handle time series data from various domains, TimeFound employs a multi-resolution patching strategy to capture complex temporal patterns at multiple scales. We pre-train our model with two sizes (200M and 710M parameters) on a large time-series corpus comprising both real-world and synthetic datasets. Over a collection of unseen datasets across diverse domains and forecasting horizons, our empirical evaluations suggest that TimeFound can achieve superior or competitive zero-shot forecasting performance, compared to state-of-the-art time series foundation models.
Via

Mar 21, 2025
Abstract:Federated learning (FL) enables retailers to share model parameters for demand forecasting while maintaining privacy. However, heterogeneous data across diverse regions, driven by factors such as varying consumer behavior, poses challenges to the effectiveness of federated learning. To tackle this challenge, we propose Privacy-Adaptive Clustered Federated Learning (PA-CFL) tailored for demand forecasting on heterogeneous retail data. By leveraging differential privacy and feature importance distribution, PA-CFL groups retailers into distinct ``bubbles'', each forming its own federated learning system to effectively isolate data heterogeneity. Within each bubble, Transformer models are designed to predict local sales for each client. Our experiments demonstrate that PA-CFL significantly surpasses FedAvg and outperforms local learning in demand forecasting performance across all participating clients. Compared to local learning, PA-CFL achieves a 5.4% improvement in R^2, a 69% reduction in RMSE, and a 45% decrease in MAE. Our approach enables effective FL through adaptive adjustments to diverse noise levels and the range of clients participating in each bubble. By grouping participants and proactively filtering out high-risk clients, PA-CFL mitigates potential threats to the FL system. The findings demonstrate PA-CFL's ability to enhance federated learning in time series prediction tasks with heterogeneous data, achieving a balance between forecasting accuracy and privacy preservation in retail applications. Additionally, PA-CFL's capability to detect and neutralize poisoned data from clients enhances the system's robustness and reliability.
Via

Mar 06, 2025
Abstract:In this paper we propose a novel problem called the ForeClassing problem where the loss of a classification decision is only observed at a future time point after the classification decision has to be made. To solve this problem, we propose an approximately Bayesian deep neural network architecture called ForeClassNet for time series forecasting and classification. This network architecture forces the network to consider possible future realizations of the time series, by forecasting future time points and their likelihood of occurring, before making its final classification decision. To facilitate this, we introduce two novel neural network layers, Welford mean-variance layers and Boltzmann convolutional layers. Welford mean-variance layers allow networks to iteratively update their estimates of the mean and variance for the forecasted time points for each inputted time series to the network through successive forward passes, which the model can then consider in combination with a learned representation of the observed realizations of the time series for its classification decision. Boltzmann convolutional layers are linear combinations of approximately Bayesian convolutional layers with different filter lengths, allowing the model to learn multitemporal resolution representations of the input time series, and which resolutions to focus on within a given Boltzmann convolutional layer through a Boltzmann distribution. Through several simulation scenarios and two real world applications we demonstrate ForeClassNet achieves superior performance compared with current state of the art methods including a near 30% improvement in test set accuracy in our financial example compared to the second best performing model.
* 40 pages, 7 figures, 11 tables
Via

Mar 19, 2025
Abstract:Medical time series (MedTS) classification is crucial for improved diagnosis in healthcare, and yet it is challenging due to the varying granularity of patterns, intricate inter-channel correlation, information redundancy, and label scarcity. While existing transformer-based models have shown promise in time series analysis, they mainly focus on forecasting and fail to fully exploit the distinctive characteristics of MedTS data. In this paper, we introduce Sparseformer, a transformer specifically designed for MedTS classification. We propose a sparse token-based dual-attention mechanism that enables global modeling and token compression, allowing dynamic focus on the most informative tokens while distilling redundant features. This mechanism is then applied to the multi-granularity, cross-channel encoding of medical signals, capturing intra- and inter-granularity correlations and inter-channel connections. The sparsification design allows our model to handle heterogeneous inputs of varying lengths and channels directly. Further, we introduce an adaptive label encoder to address label space misalignment across datasets, equipping our model with cross-dataset transferability to alleviate the medical label scarcity issue. Our model outperforms 12 baselines across seven medical datasets under supervised learning. In the few-shot learning experiments, our model also achieves superior average results. In addition, the in-domain and cross-domain experiments among three diagnostic scenarios demonstrate our model's zero-shot learning capability. Collectively, these findings underscore the robustness and transferability of our model in various medical applications.
* 3 figures, 16 pages, 5 tables
Via

Mar 11, 2025
Abstract:Time series data captures properties that change over time. Such data occurs widely, ranging from the scientific and medical domains to the industrial and environmental domains. When the properties in time series exhibit spatial variations, we often call the data spatio-temporal. As part of the continued digitalization of processes throughout society, increasingly large volumes of time series and spatio-temporal data are available. In this tutorial, we focus on data-driven decision making with such data, e.g., enabling greener and more efficient transportation based on traffic time series forecasting. The tutorial adopts the holistic paradigm of "data-governance-analytics-decision." We first introduce the data foundation of time series and spatio-temporal data, which is often heterogeneous. Next, we discuss data governance methods that aim to improve data quality. We then cover data analytics, focusing on five desired characteristics: automation, robustness, generality, explainability, and resource efficiency. We finally cover data-driven decision making strategies and briefly discuss promising research directions. We hope that the tutorial will serve as a primary resource for researchers and practitioners who are interested in value creation from time series and spatio-temporal data.
* This paper is accepted by ICDE 2025
Via

Mar 05, 2025
Abstract:While LLMs have demonstrated remarkable potential in time series forecasting, their practical deployment remains constrained by excessive computational demands and memory footprints. Existing LLM-based approaches typically suffer from three critical limitations: Inefficient parameter utilization in handling numerical time series patterns; Modality misalignment between continuous temporal signals and discrete text embeddings; and Inflexibility for real-time expert knowledge integration. We present SMETimes, the first systematic investigation of sub-3B parameter SLMs for efficient and accurate time series forecasting. Our approach centers on three key innovations: A statistically-enhanced prompting mechanism that bridges numerical time series with textual semantics through descriptive statistical features; A adaptive fusion embedding architecture that aligns temporal patterns with language model token spaces through learnable parameters; And a dynamic mixture-of-experts framework enabled by SLMs' computational efficiency, adaptively combining base predictions with domain-specific models. Extensive evaluations across seven benchmark datasets demonstrate that our 3B-parameter SLM achieves state-of-the-art performance on five primary datasets while maintaining 3.8x faster training and 5.2x lower memory consumption compared to 7B-parameter LLM baselines. Notably, the proposed model exhibits better learning capabilities, achieving 12.3% lower MSE than conventional LLM. Ablation studies validate that our statistical prompting and cross-modal fusion modules respectively contribute 15.7% and 18.2% error reduction in long-horizon forecasting tasks. By redefining the efficiency-accuracy trade-off landscape, this work establishes SLMs as viable alternatives to resource-intensive LLMs for practical time series forecasting. Code and models are available at https://github.com/xiyan1234567/SMETimes.
* Work in progress
Via

Mar 03, 2025
Abstract:The rise of foundation models has revolutionized natural language processing and computer vision, yet their best practices to time series forecasting remains underexplored. Existing time series foundation models often adopt methodologies from these fields without addressing the unique characteristics of time series data. In this paper, we identify two key challenges in cross-domain time series forecasting: the complexity of temporal patterns and semantic misalignment. To tackle these issues, we propose the ``Unify and Anchor" transfer paradigm, which disentangles frequency components for a unified perspective and incorporates external context as domain anchors for guided adaptation. Based on this framework, we introduce ContexTST, a Transformer-based model that employs a time series coordinator for structured representation and the Transformer blocks with a context-informed mixture-of-experts mechanism for effective cross-domain generalization. Extensive experiments demonstrate that ContexTST advances state-of-the-art forecasting performance while achieving strong zero-shot transferability across diverse domains.
* 20 pages, 12 figures, 8 tables, conference under review
Via

Mar 04, 2025
Abstract:Unlike traditional time-series forecasting methods that require extensive in-task data for training, zero-shot forecasting can directly predict future values given a target time series without additional training data. Current zero-shot approaches primarily rely on pre-trained generalized models, with their performance often depending on the variety and relevance of the pre-training data, which can raise privacy concerns. Instead of collecting diverse pre-training data, we introduce SeqFusion in this work, a novel framework that collects and fuses diverse pre-trained models (PTMs) sequentially for zero-shot forecasting. Based on the specific temporal characteristics of the target time series, SeqFusion selects the most suitable PTMs from a batch of pre-collected PTMs, performs sequential predictions, and fuses all the predictions while using minimal data to protect privacy. Each of these PTMs specializes in different temporal patterns and forecasting tasks, allowing SeqFusion to select by measuring distances in a shared representation space of the target time series with each PTM. Experiments demonstrate that SeqFusion achieves competitive accuracy in zero-shot forecasting compared to state-of-the-art methods.
Via

Mar 08, 2025
Abstract:Distributed photovoltaic (DPV) systems are essential for advancing renewable energy applications and achieving energy independence. Accurate DPV power forecasting can optimize power system planning and scheduling while significantly reducing energy loss, thus enhancing overall system efficiency and reliability. However, solar energy's intermittent nature and DPV systems' spatial distribution create significant forecasting challenges. Traditional methods often rely on costly external data, such as numerical weather prediction (NWP) and satellite images, which are difficult to scale for smaller DPV systems. To tackle this issue, this study has introduced an advanced large language model (LLM)-based time series forecasting framework Time-LLM to improve the DPV power forecasting accuracy and generalization ability. By reprogramming, the framework aligns historical power data with natural language modalities, facilitating efficient modeling of time-series data. Then Qwen2.5-3B model is integrated as the backbone LLM to process input data by leveraging its pattern recognition and inference abilities, achieving a balance between efficiency and performance. Finally, by using a flatten and linear projection layer, the LLM's high-dimensional output is transformed into the final forecasts. Experimental results indicate that Time-LLM outperforms leading recent advanced time series forecasting models, such as Transformer-based methods and MLP-based models, achieving superior accuracy in both short-term and long-term forecasting. Time-LLM also demonstrates exceptional adaptability in few-shot and zero-shot learning scenarios. To the best of the authors' knowledge, this study is the first attempt to explore the application of LLMs to DPV power forecasting, which can offer a scalable solution that eliminates reliance on costly external data sources and improve real-world forecasting accuracy.
* 23 pages, 8 figures
Via

Mar 05, 2025
Abstract:Detecting anomalies in time series data is a critical task across many domains. The challenge intensifies when anomalies are sparse and the data are multivariate with relational dependencies across sensors or nodes. Traditional univariate anomaly detectors struggle to capture such cross-node dependencies, particularly in sparse anomaly settings. To address this, we propose a graph-augmented time series forecasting approach that explicitly integrates the graph of relationships among time series into an LSTM forecasting model. This enables the model to detect rare anomalies that might otherwise go unnoticed in purely univariate approaches. We evaluate the approach on two benchmark datasets - the Yahoo Webscope S5 anomaly dataset and the METR-LA traffic sensor network - and compare the performance of the Graph-Augmented LSTM against LSTM-only, ARIMA, and Prophet baselines. Results demonstrate that the graph-augmented model achieves significantly higher precision and recall, improving F1-score by up to 10% over the best baseline
* 12 pages
Via
