In Low Earth Orbit (LEO) satellite networks, Beam Hopping (BH) technology enables the efficient utilization of limited radio resources by adapting to varying user demands and link conditions. Effective BH planning requires prior knowledge of upcoming traffic at the time of scheduling, making forecasting an important sub-task. Forecasting becomes particularly critical under heavy load conditions where an unexpected demand burst combined with link degradation may cause buffer overflows and packet loss. To address this challenge, we propose a burst aware forecasting solution. This challenge may arise in a wide range of wireless networks; therefore, the proposed solution is broadly applicable to settings characterized by bursty traffic patterns where accurate demand forecasting is essential. Our approach introduces three key enhancements to a transformer architecture: (i) a distance from the last burst embedding to capture burst proximity, (ii) two additional linear layers in the decoder to forecast both upcoming bursts and their relative impact, and (iii) use of an asymmetric cost function during model training to better capture burst dynamics. Empirical evaluations in an Earth-fixed cell under high-traffic demand scenario demonstrate that the proposed model reduces prediction error by up to 94% at a one-step horizon and maintains the ability to accurately capture bursts even near the end of longer prediction horizons following Mean Square Error (MSE) metric.