Danielle C. Maddix

Theoretical Guarantees of Learning Ensembling Strategies with Applications to Time Series Forecasting

May 26, 2023
Hilaf Hasson, Danielle C. Maddix, Yuyang Wang, Gaurav Gupta, Youngsuk Park

Ensembling is among the most popular tools in machine learning (ML) due to its effectiveness in minimizing variance and thus improving generalization. Most ensembling methods for black-box base learners fall under the umbrella of "stacked generalization," namely training an ML algorithm that takes the inferences from the base learners as input. While stacking has been widely applied in practice, its theoretical properties are poorly understood. In this paper, we prove a novel result, showing that choosing the best stacked generalization from a (finite or finite-dimensional) family of stacked generalizations based on cross-validated performance does not perform "much worse" than the oracle best. Our result strengthens and significantly extends the results in Van der Laan et al. (2007). Inspired by the theoretical analysis, we further propose a particular family of stacked generalizations in the context of probabilistic forecasting, each one with a different sensitivity for how much the ensemble weights are allowed to vary across items, timestamps in the forecast horizon, and quantiles. Experimental results demonstrate the performance gain of the proposed method.
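
A minimal sketch of the selection principle the abstract describes: given held-out predictions from several black-box base learners, define a small family of stacked generalizations (here, ridge-regression stackers indexed by their penalty) and keep the member with the best cross-validated loss. The toy data, the ridge family, and the squared-error loss are illustrative assumptions, not the paper's construction.

import numpy as np

rng = np.random.default_rng(0)

# Toy setup: n observations, m black-box base learners whose held-out
# predictions are noisy views of the target y.
n, m = 200, 3
y = rng.normal(size=n)
base_preds = y[:, None] + rng.normal(scale=[0.3, 0.5, 0.8], size=(n, m))

def fit_stack(X, y_train, lam):
    # Ridge-regression stacking weights: w = (X'X + lam*I)^{-1} X'y.
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y_train)

def cv_loss(lam, n_folds=5):
    # Cross-validated squared error of the stacked generalization indexed by lam.
    folds = np.array_split(np.arange(n), n_folds)
    losses = []
    for val_idx in folds:
        train_idx = np.setdiff1d(np.arange(n), val_idx)
        w = fit_stack(base_preds[train_idx], y[train_idx], lam)
        losses.append(np.mean((base_preds[val_idx] @ w - y[val_idx]) ** 2))
    return float(np.mean(losses))

# A finite family of stacked generalizations, indexed by the ridge penalty;
# select the member with the best cross-validated performance.
family = [0.0, 0.1, 1.0, 10.0]
best_lam = min(family, key=cv_loss)
print("selected family member (ridge penalty):", best_lam)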

* ICML 2023 

Learning Physical Models that Can Respect Conservation Laws

Feb 21, 2023
Derek Hansen, Danielle C. Maddix, Shima Alizadeh, Gaurav Gupta, Michael W. Mahoney

Recent work in scientific machine learning (SciML) has focused on incorporating partial differential equation (PDE) information into the learning process. Much of this work has focused on relatively ``easy'' PDE operators (e.g., elliptic and parabolic), with less emphasis on relatively ``hard'' PDE operators (e.g., hyperbolic). Within numerical PDEs, the latter problem class requires control of a type of volume element or conservation constraint, which is known to be challenging. Delivering on the promise of SciML requires seamlessly incorporating both types of problems into the learning process. To address this issue, we propose ProbConserv, a framework for incorporating conservation constraints into a generic SciML architecture. To do so, ProbConserv combines the integral form of a conservation law with a Bayesian update. We provide a detailed analysis of ProbConserv on learning with the Generalized Porous Medium Equation (GPME), a widely applicable parameterized family of PDEs that illustrates the qualitative properties of both easier and harder PDEs. ProbConserv is effective for easy GPME variants, performing well compared to state-of-the-art competitors; for harder GPME variants, it outperforms other approaches that do not guarantee volume conservation. ProbConserv seamlessly enforces physical conservation constraints, maintains probabilistic uncertainty quantification (UQ), and deals well with shocks and heteroscedasticity. In each case, it achieves superior predictive performance on downstream tasks.
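
The core mechanism the abstract names, a Bayesian update that imposes the integral form of a conservation law on a probabilistic prediction, can be sketched as Gaussian conditioning on a linear constraint. The grid, the assumed Gaussian prediction, and the constraint noise below are illustrative assumptions; this is a sketch of the idea, not the ProbConserv implementation.

import numpy as np

n = 50                                   # spatial grid points
x = np.linspace(0.0, 1.0, n)
dx = x[1] - x[0]

# Unconstrained prediction from some black-box SciML model, assumed Gaussian.
mu = np.sin(np.pi * x) + 0.05 * np.random.default_rng(0).normal(size=n)
Sigma = 0.01 * np.eye(n)

# Conservation constraint in integral form: the total mass of u must equal b.
G = dx * np.ones((1, n))                 # quadrature weights for the integral
b = np.array([2.0 / np.pi])              # known conserved quantity
noise = 1e-8                             # small variance allowed on the constraint

# Standard Gaussian conditioning on the linear observation G u = b.
S = G @ Sigma @ G.T + noise * np.eye(1)
K = Sigma @ G.T @ np.linalg.inv(S)
mu_post = mu + (K @ (b - G @ mu)).ravel()
Sigma_post = Sigma - K @ G @ Sigma

print("mass before update:", (G @ mu).item())
print("mass after update: ", (G @ mu_post).item())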

Cross-Frequency Time Series Meta-Forecasting

Feb 04, 2023
Mike Van Ness, Huibin Shen, Hao Wang, Xiaoyong Jin, Danielle C. Maddix, Karthick Gopalswamy

Meta-forecasting is a newly emerging field that combines meta-learning and time series forecasting. The goal of meta-forecasting is to train over a collection of source time series and to generalize to new time series one at a time. Previous approaches to meta-forecasting achieve competitive performance, but with the restriction of training a separate model for each sampling frequency. In this work, we investigate meta-forecasting over different sampling frequencies and introduce a new model, the Continuous Frequency Adapter (CFA), specifically designed to learn frequency-invariant representations. We find that CFA greatly improves performance when generalizing to unseen frequencies, providing a first step towards forecasting over larger multi-frequency datasets.

First De-Trend then Attend: Rethinking Attention for Time-Series Forecasting

Dec 15, 2022
Xiyuan Zhang, Xiaoyong Jin, Karthick Gopalswamy, Gaurav Gupta, Youngsuk Park, Xingjian Shi, Hao Wang, Danielle C. Maddix, Yuyang Wang

Transformer-based models have gained considerable popularity and demonstrated promising results in long-term time-series forecasting in recent years. In addition to learning attention in the time domain, recent work also explores learning attention in frequency domains (e.g., the Fourier domain and the wavelet domain), given that seasonal patterns can be better captured in these domains. In this work, we seek to understand the relationships between attention models in different time and frequency domains. Theoretically, we show that attention models in different domains are equivalent under linear conditions (i.e., a linear kernel for the attention scores). Empirically, we analyze how attention models in different domains behave differently through various synthetic experiments with seasonality, trend, and noise, with emphasis on the role of the softmax operation. Both the theoretical and empirical analyses motivate us to propose a new method, TDformer (Trend Decomposition Transformer), which first applies seasonal-trend decomposition and then additively combines an MLP that predicts the trend component with Fourier attention that predicts the seasonal component to obtain the final prediction. Extensive experiments on benchmark time-series forecasting datasets demonstrate that TDformer achieves state-of-the-art performance against existing attention-based models.
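
A rough sketch of the decompose-then-combine structure described above: a moving-average seasonal-trend decomposition, an MLP head for the trend, attention over Fourier coefficients for the seasonal part, and an additive combination. Layer sizes and the particular attention form are assumptions for illustration, not the released TDformer.

import torch
import torch.nn as nn

class TrendSeasonalForecaster(nn.Module):
    def __init__(self, context_len: int, horizon: int, kernel: int = 25):
        super().__init__()
        # Moving-average filter used for the seasonal-trend decomposition.
        self.avg = nn.AvgPool1d(kernel, stride=1, padding=kernel // 2,
                                count_include_pad=False)
        self.trend_mlp = nn.Sequential(nn.Linear(context_len, 64), nn.ReLU(),
                                       nn.Linear(64, horizon))
        n_freq = context_len // 2 + 1                        # rFFT length
        self.seasonal_attn = nn.MultiheadAttention(embed_dim=2, num_heads=1,
                                                   batch_first=True)
        self.seasonal_head = nn.Linear(2 * n_freq, horizon)

    def forward(self, x):                                    # x: (batch, context_len)
        trend = self.avg(x.unsqueeze(1)).squeeze(1)          # moving-average trend
        seasonal = x - trend
        trend_out = self.trend_mlp(trend)                    # MLP predicts the trend part
        spec = torch.fft.rfft(seasonal, dim=-1)              # seasonal part in Fourier domain
        feats = torch.stack([spec.real, spec.imag], dim=-1)  # (batch, n_freq, 2)
        attended, _ = self.seasonal_attn(feats, feats, feats)
        seasonal_out = self.seasonal_head(attended.flatten(1))
        return trend_out + seasonal_out                      # additive combination

model = TrendSeasonalForecaster(context_len=96, horizon=24)
print(model(torch.randn(8, 96)).shape)                       # torch.Size([8, 24])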

* NeurIPS 2022 All Things Attention Workshop 

Guiding continuous operator learning through Physics-based boundary constraints

Dec 14, 2022
Nadim Saad, Gaurav Gupta, Shima Alizadeh, Danielle C. Maddix

Boundary conditions (BCs) are an important class of physics-enforced constraints that solutions of partial differential equations (PDEs) must satisfy at specific spatial locations. These constraints carry important physical meaning and guarantee the existence and uniqueness of the PDE solution. Current neural-network-based approaches that aim to solve PDEs rely only on training data to help the model learn BCs implicitly, and there is no guarantee that these models satisfy the BCs during evaluation. In this work, we propose the Boundary enforcing Operator Network (BOON), which enables BC satisfaction for neural operators by making structural changes to the operator kernel. We provide our refinement procedure and demonstrate that the solutions obtained by BOON satisfy physics-based BCs, e.g., Dirichlet, Neumann, and periodic. Numerical experiments based on multiple PDEs with a wide variety of applications indicate that the proposed approach ensures satisfaction of BCs and leads to more accurate solutions over the entire domain. The proposed correction method exhibits a 2x-20x improvement over a given operator model in relative $L^2$ error (0.000084 relative $L^2$ error for Burgers' equation).
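
For contrast with models that satisfy BCs only approximately, here is a minimal output-side sketch of hard Dirichlet enforcement: blend any predicted solution with a linear correction so that the prescribed boundary values hold exactly. This is not BOON's kernel-level refinement; the grid, the stand-in prediction, and the BC values are illustrative assumptions.

import numpy as np

def enforce_dirichlet(u, x, g_left, g_right):
    # Blend the prediction with a linear correction so that
    # u(0) = g_left and u(1) = g_right hold exactly.
    return u + (g_left - u[0]) * (1.0 - x) + (g_right - u[-1]) * x

x = np.linspace(0.0, 1.0, 101)
u_pred = np.sin(2 * np.pi * x) + 0.1          # stand-in for a neural-operator output
u_bc = enforce_dirichlet(u_pred, x, g_left=0.0, g_right=0.0)
print(u_bc[0], u_bc[-1])                      # exactly the prescribed boundary values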

* Nadim and Gaurav contributed equally to this work. 31 pages, 7 figures, 16 tables 

Attention-based Domain Adaptation for Time Series Forecasting

Feb 17, 2021
Xiaoyong Jin, Youngsuk Park, Danielle C. Maddix, Yuyang Wang, Xifeng Yan

Recent years have witnessed deep neural networks gaining increasing popularity in the field of time series forecasting. A primary reason for their success is their ability to effectively capture complex temporal dynamics across multiple related time series. However, the advantages of these deep forecasters only start to emerge in the presence of a sufficient amount of data. This poses a challenge for typical forecasting problems in practice, where one has a small number of time series, limited observations per time series, or both. To cope with data scarcity, we propose a novel domain adaptation framework, the Domain Adaptation Forecaster (DAF), that leverages the statistical strengths of a relevant domain with abundant data samples (source) to improve performance on the domain of interest with limited data (target). In particular, we propose an attention-based shared module with a domain discriminator across domains, as well as private modules for individual domains. This allows us to jointly train the source and target domains by generating domain-invariant latent features while retaining domain-specific features. Extensive experiments on various domains demonstrate that our proposed method outperforms state-of-the-art baselines on synthetic and real-world datasets.
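
A compact sketch of the shared/private layout with a domain discriminator, using a gradient-reversal layer so that features produced by the shared module become domain-invariant while per-domain heads retain domain-specific behavior. The encoder, heads, and loss weighting are illustrative assumptions rather than the DAF architecture.

import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Reverse gradients so the shared features become domain-invariant.
        return -grad_output

class SharedPrivateForecaster(nn.Module):
    def __init__(self, context_len=48, horizon=12, hidden=32):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(context_len, hidden), nn.ReLU())
        self.private = nn.ModuleDict({                       # one private head per domain
            "source": nn.Linear(hidden, horizon),
            "target": nn.Linear(hidden, horizon),
        })
        self.discriminator = nn.Linear(hidden, 2)             # predicts which domain produced h

    def forward(self, x, domain):
        h = self.shared(x)
        forecast = self.private[domain](h)
        domain_logits = self.discriminator(GradReverse.apply(h))
        return forecast, domain_logits

model = SharedPrivateForecaster()
x_src, y_src = torch.randn(16, 48), torch.randn(16, 12)
f_src, d_src = model(x_src, "source")
# Forecast loss plus adversarial domain loss (label 0 = source).
loss = nn.functional.mse_loss(f_src, y_src) + nn.functional.cross_entropy(
    d_src, torch.zeros(16, dtype=torch.long))
loss.backward()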

* 15 pages, 9 figures 

GluonTS: Probabilistic Time Series Models in Python

Jun 14, 2019
Alexander Alexandrov, Konstantinos Benidis, Michael Bohlke-Schneider, Valentin Flunkert, Jan Gasthaus, Tim Januschowski, Danielle C. Maddix, Syama Rangapuram, David Salinas, Jasper Schulz, Lorenzo Stella, Ali Caner Türkmen, Yuyang Wang

We introduce Gluon Time Series (GluonTS, available at https://gluon-ts.mxnet.io), a library for deep-learning-based time series modeling. GluonTS simplifies the development of and experimentation with time series models for common tasks such as forecasting or anomaly detection. It provides all necessary components and tools that scientists need for quickly building new models, for efficiently running and analyzing experiments, and for evaluating model accuracy.
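
A short usage sketch in the spirit of the library's quick-start: build a dataset, fit an estimator, and draw a probabilistic forecast. Exact module paths and estimator arguments have shifted across GluonTS releases, so treat the imports and names below as indicative rather than exact.

from gluonts.dataset.common import ListDataset
from gluonts.model.deepar import DeepAREstimator
from gluonts.mx.trainer import Trainer

# One hourly series; GluonTS datasets are iterables of {"start", "target"} records.
train = ListDataset(
    [{"start": "2019-01-01 00:00", "target": [float(i % 24) for i in range(500)]}],
    freq="H",
)

estimator = DeepAREstimator(freq="H", prediction_length=24, trainer=Trainer(epochs=5))
predictor = estimator.train(train)

forecast = next(predictor.predict(train))   # probabilistic forecast object
print(forecast.mean[:5])                    # mean; quantiles and samples are also available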

* ICML Time Series Workshop 2019 

Deep Factors for Forecasting

May 28, 2019
Yuyang Wang, Alex Smola, Danielle C. Maddix, Jan Gasthaus, Dean Foster, Tim Januschowski

Producing probabilistic forecasts for large collections of similar and/or dependent time series is a practically relevant and challenging task. Classical time series models fail to capture complex patterns in the data, and multivariate techniques struggle to scale to large problem sizes; however, their reliance on strong structural assumptions makes them data-efficient and allows them to provide uncertainty estimates. The converse is true for models based on deep neural networks, which can learn complex patterns and dependencies given enough data. In this paper, we propose a hybrid model that incorporates the benefits of both approaches. Our new method is data-driven and scalable via a latent, global, deep component. It also handles uncertainty through a local classical model. We provide both theoretical and empirical evidence for the soundness of our approach through a necessary and sufficient decomposition of exchangeable time series into a global and a local part. Our experiments demonstrate the advantages of our model in terms of data efficiency, accuracy, and computational complexity.
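
A minimal sketch of the global/local split described above: a recurrent network produces a few global factors shared by all series, each series mixes them through its own loading vector, and a simple local Gaussian noise term carries the uncertainty. The sizes and the choice of local model are illustrative assumptions, not the Deep Factors implementation.

import torch
import torch.nn as nn

class GlobalLocalModel(nn.Module):
    def __init__(self, num_series, num_factors=4, hidden=16):
        super().__init__()
        self.rnn = nn.GRU(input_size=1, hidden_size=hidden, batch_first=True)
        self.to_factors = nn.Linear(hidden, num_factors)      # K global factors g_k(t)
        self.loadings = nn.Parameter(torch.randn(num_series, num_factors))  # per-series weights
        self.log_sigma = nn.Parameter(torch.zeros(num_series))  # local Gaussian noise scale

    def forward(self, t_feat):
        # t_feat: (1, T, 1) time features shared by all series.
        h, _ = self.rnn(t_feat)
        g = self.to_factors(h).squeeze(0)                     # (T, K) global factors
        mean = self.loadings @ g.T                            # (num_series, T) global part
        sigma = self.log_sigma.exp().unsqueeze(-1).expand_as(mean)   # local part
        return torch.distributions.Normal(mean, sigma)

model = GlobalLocalModel(num_series=10)
t = torch.linspace(0, 1, 50).view(1, 50, 1)
dist = model(t)
y = torch.randn(10, 50)
nll = -dist.log_prob(y).mean()    # train by maximizing the Gaussian likelihood
nll.backward()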

* Proceedings of Machine Learning Research, Volume 97: International Conference on Machine Learning, 2019  
* http://proceedings.mlr.press/v97/wang19k/wang19k.pdf. arXiv admin note: substantial text overlap with arXiv:1812.00098 

Deep Factors with Gaussian Processes for Forecasting

Nov 30, 2018
Danielle C. Maddix, Yuyang Wang, Alex Smola

A large collection of time series poses significant challenges for classical and neural forecasting approaches. Classical time series models fail to fit data well and to scale to large problems, but succeed at providing uncertainty estimates. The converse is true for deep neural networks. In this paper, we propose a hybrid model that incorporates the benefits of both approaches. Our new method is data-driven and scalable via a latent, global, deep component. It also handles uncertainty through a local classical Gaussian Process model. Our experiments demonstrate that our method obtains higher accuracy than state-of-the-art methods.
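
To illustrate the local ingredient specific to this variant, here is a small sketch of exact Gaussian Process regression with an RBF kernel on one series' residuals (what remains after subtracting a global component). The kernel and its hyperparameters are illustrative assumptions, not the paper's configuration.

import numpy as np

def rbf(a, b, length=0.1, scale=1.0):
    # Radial-basis-function (squared-exponential) kernel.
    return scale * np.exp(-0.5 * ((a[:, None] - b[None, :]) / length) ** 2)

rng = np.random.default_rng(1)
t_train = np.linspace(0, 1, 40)
resid = 0.3 * np.sin(6 * np.pi * t_train) + 0.05 * rng.normal(size=40)  # local residual signal
t_test = np.linspace(0, 1.2, 60)

K = rbf(t_train, t_train) + 1e-4 * np.eye(40)         # training covariance plus noise
K_s = rbf(t_test, t_train)                            # test-train covariance
alpha = np.linalg.solve(K, resid)
mean = K_s @ alpha                                     # GP posterior mean
cov = rbf(t_test, t_test) - K_s @ np.linalg.solve(K, K_s.T)
std = np.sqrt(np.clip(np.diag(cov), 0.0, None))       # per-point predictive uncertainty
print(mean[:3], std[:3])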

* Third workshop on Bayesian Deep Learning (NeurIPS 2018), Montreal, Canada 