Abstract:Foundation models have revolutionized fields such as natural language processing and computer vision by enabling general-purpose learning across diverse tasks and datasets. However, building analogous models for human mobility remains challenging due to the privacy-sensitive nature of mobility data and the resulting data silos across institutions. To bridge this gap, we propose MoveGCL, a scalable and privacy-preserving framework for training mobility foundation models via generative continual learning. Without sharing raw data, MoveGCL enables decentralized and progressive model evolution by replaying synthetic trajectories generated from a frozen teacher model, and reinforces knowledge retention through a tailored distillation strategy that mitigates catastrophic forgetting. To address the heterogeneity of mobility patterns, MoveGCL incorporates a Mixture-of-Experts Transformer with a mobility-aware expert routing mechanism, and employs a layer-wise progressive adaptation strategy to stabilize continual updates. Experiments on six real-world urban datasets demonstrate that MoveGCL achieves performance comparable to joint training and significantly outperforms federated learning baselines, while offering strong privacy protection. MoveGCL marks a crucial step toward unlocking foundation models for mobility, offering a practical blueprint for open, scalable, and privacy-preserving model development in the era of foundation models.
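The replay-plus-distillation idea in the MoveGCL abstract above can be illustrated with a generic knowledge-distillation loss. This is a minimal sketch under stated assumptions, not the paper's tailored strategy: it assumes the frozen teacher and the student each emit next-location logits for the same synthetic trajectory, and it uses a plain temperature-scaled KL divergence. The names `distillation_loss` and `replay_loss` and the temperature value are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with a temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) over one next-location distribution."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    eps = 1e-12  # guards against log(0) if a student probability underflows
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def replay_loss(pairs, temperature=2.0):
    """Average distillation loss over (teacher_logits, student_logits) pairs,
    assumed to be computed on synthetic trajectories sampled from the teacher."""
    return sum(distillation_loss(t, s, temperature) for t, s in pairs) / len(pairs)
```

In such a setup the loss is zero when the student reproduces the teacher's distribution on replayed samples and grows as the student drifts, which is the sense in which distillation counteracts forgetting.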
Abstract:Modeling human mobility across diverse cities is essential for applications such as urban planning, transportation optimization, and personalized services. However, generalization remains challenging due to heterogeneous spatial representations and mobility patterns across cities. Existing methods typically rely on numerical coordinates or require training city-specific models, limiting their scalability and transferability. We propose TrajMoE, a unified and scalable model for cross-city human mobility modeling. TrajMoE addresses two key challenges: (1) inconsistent spatial semantics across cities, and (2) diverse urban mobility patterns. To tackle these, we begin by designing a spatial semantic encoder that learns transferable location representations from POI-based functional semantics and visit patterns. Furthermore, we design a Spatially-Aware Mixture-of-Experts (SAMoE) Transformer that injects structured priors into experts specialized in distinct mobility semantics, along with a shared expert to capture city-invariant patterns and enable adaptive cross-city generalization. Extensive experiments demonstrate that TrajMoE achieves up to 27% relative improvement over competitive mobility foundation models after only one epoch of fine-tuning, and consistently outperforms full-data baselines using merely 5% of target city data. These results establish TrajMoE as a significant step toward realizing a truly generalizable, transferable, and pretrainable foundation model for human mobility.
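The combination of routed experts and a shared expert described in the TrajMoE abstract above can be sketched as follows. This is a simplified illustration and not the SAMoE layer itself: it assumes gate scores are already computed, uses ordinary top-k softmax routing, and omits the structured spatial priors. The function `moe_forward` and its parameters are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x, gate_scores, experts, shared_expert, top_k=2):
    """Add a weighted mix of the top_k routed experts to the output of a
    shared expert that is applied to every input."""
    if not 1 <= top_k <= len(experts):
        raise ValueError("top_k must be between 1 and the number of experts")
    # Indices of the top_k highest-scoring experts.
    top = sorted(range(len(experts)), key=lambda i: gate_scores[i], reverse=True)[:top_k]
    # Renormalize the gate over the selected experts only.
    weights = softmax([gate_scores[i] for i in top])
    out = list(shared_expert(x))  # always-active path
    for w, i in zip(weights, top):
        y = experts[i](x)
        out = [o + w * v for o, v in zip(out, y)]
    return out


# Toy experts that just scale the input vector.
experts = [
    lambda v: [2.0 * a for a in v],
    lambda v: [3.0 * a for a in v],
    lambda v: [-1.0 * a for a in v],
]
shared = lambda v: list(v)
result = moe_forward([1.0, 2.0], [0.0, 0.0, -100.0], experts, shared, top_k=2)
```

The shared expert sees every input regardless of routing, which is one common way to let a model keep patterns that do not depend on the city while the routed experts specialize.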
Abstract:Urban spatio-temporal flow prediction, encompassing traffic flows and crowd flows, is crucial for optimizing city infrastructure and managing traffic and emergency responses. Traditional approaches have relied on separate models tailored to either grid-based data, representing cities as uniform cells, or graph-based data, modeling cities as networks of nodes and edges. In this paper, we build UniFlow, a foundational model for general urban flow prediction that unifies both grid-based and graph-based data. We first design a multi-view spatio-temporal patching mechanism to standardize different data into a consistent sequential format and then introduce a spatio-temporal transformer architecture to capture complex correlations and dynamics. To leverage shared spatio-temporal patterns across different data types and facilitate effective cross-learning, we propose Spatio-Temporal Memory Retrieval Augmentation (ST-MRA). By creating structured memory modules to store shared spatio-temporal patterns, ST-MRA enhances predictions through adaptive memory retrieval. Extensive experiments demonstrate that UniFlow outperforms existing models in both grid-based and graph-based flow prediction, excelling particularly in scenarios with limited data availability, showcasing its superior performance and broad applicability. The datasets and code implementation have been released at https://github.com/YuanYuan98/UniFlow.
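The idea of turning grid data into a sequential format, mentioned in the UniFlow abstract above, can be sketched with a plain non-overlapping patching routine. This covers only the grid case and is an assumption about the general technique, not UniFlow's multi-view mechanism; the function `patchify` and its patch-size parameters are hypothetical.

```python
def patchify(grid, pt, ph, pw):
    """Split a T x H x W grid (nested lists) into non-overlapping patches of
    size pt x ph x pw and flatten each patch into one token."""
    T, H, W = len(grid), len(grid[0]), len(grid[0][0])
    if T % pt or H % ph or W % pw:
        raise ValueError("grid dimensions must be divisible by the patch size")
    tokens = []
    for t0 in range(0, T, pt):
        for h0 in range(0, H, ph):
            for w0 in range(0, W, pw):
                # Flatten the patch in time-major, then row, then column order.
                tokens.append([
                    grid[t][h][w]
                    for t in range(t0, t0 + pt)
                    for h in range(h0, h0 + ph)
                    for w in range(w0, w0 + pw)
                ])
    return tokens


# Toy flow grid with 2 time steps, 2 rows, and 4 columns; each value encodes
# its own position as t*100 + h*10 + w.
grid = [[[t * 100 + h * 10 + w for w in range(4)] for h in range(2)] for t in range(2)]
tokens = patchify(grid, pt=2, ph=2, pw=2)
```

Here `tokens` holds two tokens of eight values each, one per 2 x 2 x 2 block, so a transformer can consume the grid as an ordinary sequence.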
Abstract:The urban environment is characterized by complex spatio-temporal dynamics arising from diverse human activities and interactions. Effectively modeling these dynamics is essential for understanding and optimizing urban systems. In this work, we introduce UrbanDiT, a foundation model for open-world urban spatio-temporal learning that successfully scales up diffusion transformers in this field. UrbanDiT pioneers a unified model that integrates diverse spatio-temporal data sources and types while learning universal spatio-temporal patterns across different cities and scenarios. This allows the model to unify both multi-data and multi-task learning, and effectively support a wide range of spatio-temporal applications. Its key innovation lies in an elaborate prompt learning framework, which adaptively generates both data-driven and task-specific prompts, guiding the model to deliver superior performance across various urban applications. UrbanDiT offers three primary advantages: 1) It unifies diverse data types, such as grid-based and graph-based data, into a sequential format, allowing it to capture spatio-temporal dynamics across diverse scenarios in different cities; 2) With masking strategies and task-specific prompts, it supports a wide range of tasks, including bi-directional spatio-temporal prediction, temporal interpolation, spatial extrapolation, and spatio-temporal imputation; and 3) It generalizes effectively to open-world scenarios, with its powerful zero-shot capabilities outperforming nearly all baselines that use training data. These features allow UrbanDiT to achieve state-of-the-art performance in different domains such as transportation traffic, crowd flows, taxi demand, bike usage, and cellular traffic, across multiple cities and tasks. UrbanDiT sets a new benchmark for foundation models in the urban spatio-temporal domain.