Abstract:The development of accurate forecasts of solar eruptive activity has become increasingly important for preventing potential impacts on space technologies and exploration. Therefore, it is crucial to detect Active Regions (ARs) before they start forming on the solar surface. This will enable the development of early-warning capabilities for upcoming space weather disturbances. For this reason, we prepared the Solar Active Region Emergence Dataset (SolARED). The dataset is derived from full-disk maps of the Doppler velocity, magnetic field, and continuum intensity, obtained by the Helioseismic and Magnetic Imager (HMI) onboard the Solar Dynamics Observatory (SDO). SolARED includes time series of remapped, tracked, and binned data that characterize the evolution of acoustic power of solar oscillations, unsigned magnetic flux, and continuum intensity for 50 large ARs before, during, and after their emergence on the solar surface, as well as surrounding areas observed on the solar disc between 2010 and 2023. The resulting ML-ready SolARED dataset is designed to support enhancements of predictive capabilities, enabling the development of operational forecasts for the emergence of active regions. The SolARED dataset is available at https://sun.njit.edu/sarportal/, through an interactive visualization web application.
Abstract:Early and accurate prediction of solar active region (AR) emergence is crucial for space weather forecasting. Building on established Long Short-Term Memory (LSTM) based approaches for forecasting the continuum intensity decrease associated with AR emergence, this work expands the modeling with new architectures and targets. We investigate a sliding-window Transformer architecture to forecast continuum intensity evolution up to 12 hours ahead using data from 46 ARs observed by SDO/HMI. We conduct a systematic ablation study to evaluate two key components: (1) the inclusion of a temporal 1D convolutional (Conv1D) front-end and (2) a novel `Early Detection' architecture featuring attention biases and a timing-aware loss function. Our best-performing model, combining the Early Detection architecture without the Conv1D layer, achieved a Root Mean Square Error (RMSE) of 0.1189 (representing a 10.6% improvement over the LSTM baseline) and an average advance warning time of 4.73 hours (timing difference of -4.73h), even under a stricter emergence criterion than previous studies. While the Transformer demonstrates superior aggregate timing and accuracy, we note that this high-sensitivity detection comes with increased variance compared to smoother baseline models. However, this volatility is a necessary trade-off for operational warning systems: the model's ability to detect micro-changes in precursor signals enables significantly earlier detection, outweighing the cost of increased noise. Our results demonstrate that Transformer architectures modified with early detection biases, when used without temporal smoothing layers, provide a high-sensitivity alternative for forecasting AR emergence that prioritizes advance warning over statistical smoothness.