Abstract:Traditional time series forecasting methods optimize for accuracy alone. This objective neglects temporal consistency, in other words, how consistently a model predicts the same future event as the forecast origin changes. We introduce the forecast accuracy and coherence score (forecast AC score for short) for measuring the quality of probabilistic multi-horizon forecasts in a way that accounts for both multi-horizon accuracy and stability. Our score additionally provides for user-specified weights to balance accuracy and consistency requirements. As an example application, we implement the score as a differentiable objective function for training seasonal ARIMA models and evaluate it on the M4 Hourly benchmark dataset. Results demonstrate substantial improvements over traditional maximum likelihood estimation. Our AC-optimized models achieve a 75\% reduction in forecast volatility for the same target timestamps while maintaining comparable or improved point forecast accuracy.
Abstract:Energy demand prediction is critical for grid operators, industrial energy consumers, and service providers. Energy demand is influenced by multiple factors, including weather conditions (e.g. temperature, humidity, wind speed, solar radiation), and calendar information (e.g. hour of day and month of year), which further affect daily work and life schedules. These factors are causally interdependent, making the problem more complex than simple correlation-based learning techniques satisfactorily allow for. We propose a structural causal model that explains the causal relationship between these variables. A full analysis is performed to validate our causal beliefs, also revealing important insights consistent with prior studies. For example, our causal model reveals that energy demand responds to temperature fluctuations with season-dependent sensitivity. Additionally, we find that energy demand exhibits lower variance in winter due to the decoupling effect between temperature changes and daily activity patterns. We then build a Bayesian model, which takes advantage of the causal insights we learned as prior knowledge. The model is trained and tested on unseen data and yields state-of-the-art performance in the form of a 3.84 percent MAPE on the test set. The model also demonstrates strong robustness, as the cross-validation across two years of data yields an average MAPE of 3.88 percent.