Abstract: Distributed fiber-optic acoustic sensing (DAS) has emerged as a transformative approach for distributed vibration measurement, offering high spatial resolution and long measurement range while remaining cost-efficient. However, the two-dimensional spatial-temporal signals that DAS produces are difficult to analyze. Their abstract morphology lacks an intuitive physical correspondence, which complicates human interpretation, and their unique spatial-temporal coupling renders conventional image processing methods suboptimal. This study investigates the spatial-temporal characteristics of DAS signals and proposes a self-supervised pre-training framework, the DAS Masked AutoEncoder (DAS-MAE), that learns signal representations through a mask-reconstruction task. DAS-MAE learns high-level representations (e.g., event class) without using labels. In few-shot classification tasks, it achieves an error as low as 1% and a relative improvement (RI) of up to 64.5% over the semi-supervised baseline. In a practical external damage prevention application, DAS-MAE attains a 5.0% recognition error, a 75.7% RI over supervised training from scratch. These results demonstrate that the representations learned by DAS-MAE are both high-performing and universal, highlighting the framework's potential as a foundation model for analyzing massive volumes of unlabeled DAS signals.
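
To make the mask-reconstruction task concrete, the following is a minimal NumPy sketch of the general idea, not the DAS-MAE implementation. It splits a two-dimensional (space, time) array into patches, hides a random subset, and scores a reconstruction only on the hidden patches. The helper names (`patchify`, `random_mask`, `masked_mse`), the 8 x 16 patch size, the 0.75 mask ratio, the synthetic input, and the all-zero placeholder prediction are all illustrative assumptions that do not come from the text above.

```python
import numpy as np


def patchify(x, ph, pw):
    """Split a (space, time) array into non-overlapping (ph, pw) patches.

    Returns an array of shape (num_patches, ph * pw), patches in row-major order.
    """
    h, w = x.shape
    assert h % ph == 0 and w % pw == 0, "array must tile evenly into patches"
    blocks = x.reshape(h // ph, ph, w // pw, pw).transpose(0, 2, 1, 3)
    return blocks.reshape(-1, ph * pw)


def random_mask(num_patches, mask_ratio, rng):
    """Return a boolean vector where True marks a patch hidden from the encoder."""
    num_masked = int(round(num_patches * mask_ratio))
    mask = np.zeros(num_patches, dtype=bool)
    mask[rng.permutation(num_patches)[:num_masked]] = True
    return mask


def masked_mse(target, pred, mask):
    """Mean squared reconstruction error computed over masked patches only."""
    return float(np.mean((pred[mask] - target[mask]) ** 2))


rng = np.random.default_rng(0)

# Synthetic stand-in for a DAS record: 64 spatial channels x 256 time samples.
signal = rng.standard_normal((64, 256))

patches = patchify(signal, ph=8, pw=16)          # 128 patches of 128 values each
mask = random_mask(len(patches), mask_ratio=0.75, rng=rng)

visible = patches[~mask]                         # what an encoder would receive
pred = np.zeros_like(patches)                    # placeholder for a decoder's output
loss = masked_mse(patches, pred, mask)
```

In a real pre-training loop, an encoder-decoder network would replace the placeholder `pred`, and `loss` would be minimized over many unlabeled records. Scoring only the masked patches forces the model to infer hidden content from the visible context rather than copy its input.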