To accommodate the unprecedented growth of commercial aviation expected over the next ten years, the United States has implemented the Next Generation Air Transportation System (NextGen), which records large-scale Air Traffic Management (ATM) data to make air travel safer, more efficient, and more economical. Accurate prediction of flight delay plays a key role in collaborative decision making for air traffic scheduling and airspace resource management. There have been many attempts to apply data-driven methods such as machine learning to forecast flight delays using air traffic data on departures and arrivals. However, most of them omit the en-route spatial information of flights and the temporal correlation between serial flights, which results in inaccurate predictions. In this paper, we present a novel aviation delay prediction system for commercial flights based on stacked Long Short-Term Memory (LSTM) networks. The system learns from historical trajectories derived from automatic dependent surveillance-broadcast (ADS-B) messages and uses the corresponding geolocations to collect indispensable features such as climatic elements, air traffic, airspace, and human factors data along posterior routes. These features are integrated and fed into our proposed regression model, and the LSTM architecture abstracts and learns the latent spatio-temporal patterns in the data. Compared with previous schemes, our approach is shown to be more robust and accurate for large hub airports.