Machine learning techniques are now integral to the advancement of intelligent urban services, playing a crucial role in elevating the efficiency, sustainability, and livability of urban environments. The recent emergence of foundation models such as ChatGPT marks a revolutionary shift in the fields of machine learning and artificial intelligence. Their unparalleled capabilities in contextual understanding, problem solving, and adaptability across a wide range of tasks suggest that integrating these models into urban domains could have a transformative impact on the development of smart cities. Despite growing interest in Urban Foundation Models~(UFMs), this burgeoning field faces challenges such as a lack of clear definitions, systematic reviews, and universalizable solutions. To this end, this paper first introduces the concept of UFM and discusses the unique challenges involved in building them. We then propose a data-centric taxonomy that categorizes current UFM-related works, based on urban data modalities and types. Furthermore, to foster advancement in this field, we present a promising framework aimed at the prospective realization of UFMs, designed to overcome the identified challenges. Additionally, we explore the application landscape of UFMs, detailing their potential impact in various urban contexts. Relevant papers and open-source resources have been collated and are continuously updated at https://github.com/usail-hkust/Awesome-Urban-Foundation-Models.
The increasing air pollution poses an urgent global concern with far-reaching consequences, such as premature mortality and reduced crop yield, which significantly impact various aspects of our daily lives. Accurate and timely analysis of air pollution is crucial for understanding its underlying mechanisms and implementing necessary precautions to mitigate potential socio-economic losses. Traditional analytical methodologies, such as atmospheric modeling, heavily rely on domain expertise and often make simplified assumptions that may not be applicable to complex air pollution problems. In contrast, Machine Learning (ML) models are able to capture the intrinsic physical and chemical rules by automatically learning from a large amount of historical observational data, showing great promise in various air quality analytical tasks. In this article, we present a comprehensive survey of ML-based air quality analytics, following a roadmap spanning from data acquisition to pre-processing, and encompassing various analytical tasks such as pollution pattern mining, air quality inference, and forecasting. Moreover, we offer a systematic categorization and summary of existing methodologies and applications, while also providing a list of publicly available air quality datasets to ease the research in this direction. Finally, we identify several promising future research directions. This survey can serve as a valuable resource for professionals seeking suitable solutions for their specific challenges and advancing their research at the cutting edge.
Accurate traffic forecasting at intersections governed by intelligent traffic signals is critical for the advancement of an effective intelligent traffic signal control system. However, due to the irregular traffic time series produced by intelligent intersections, the traffic forecasting task becomes much more intractable and imposes three major new challenges: 1) asynchronous spatial dependency, 2) irregular temporal dependency among traffic data, and 3) variable-length sequence to be predicted, which severely impede the performance of current traffic forecasting methods. To this end, we propose an Asynchronous Spatio-tEmporal graph convolutional nEtwoRk (ASeer) to predict the traffic states of the lanes entering intelligent intersections in a future time window. Specifically, by linking lanes via a traffic diffusion graph, we first propose an Asynchronous Graph Diffusion Network to model the asynchronous spatial dependency between the time-misaligned traffic state measurements of lanes. After that, to capture the temporal dependency within irregular traffic state sequence, a learnable personalized time encoding is devised to embed the continuous time for each lane. Then we propose a Transformable Time-aware Convolution Network that learns meta-filters to derive time-aware convolution filters with transformable filter sizes for efficient temporal convolution on the irregular sequence. Furthermore, a Semi-Autoregressive Prediction Network consisting of a state evolution unit and a semiautoregressive predictor is designed to effectively and efficiently predict variable-length traffic state sequences. Extensive experiments on two real-world datasets demonstrate the effectiveness of ASeer in six metrics.
Accurate traffic prediction benefits urban management and improves transportation efficiency. Recently, data-driven methods have been widely applied in traffic prediction and outperformed traditional methods. However, data-driven methods normally require massive data for training, while data scarcity is ubiquitous in low-developmental or newly constructed regions. To tackle this problem, we can extract meta knowledge from data-rich cities to data-scarce cities via transfer learning. Besides, relations among urban regions can be organized into various semantic graphs, e.g. proximity and POI similarity, which is barely considered in previous studies. In this paper, we propose Semantic-Fused Hierarchical Graph Transfer Learning (SF-HGTL) model to achieve knowledge transfer across cities with fused semantics. In detail, we employ hierarchical graph transformation followed by meta-knowledge retrieval to achieve knowledge transfer in various granularity. In addition, we introduce meta semantic nodes to reduce the number of parameters as well as share information across semantics. Afterwards, the parameters of the base model are generated by fused semantic embeddings to predict traffic status in terms of task heterogeneity. We implement experiments on five real-world datasets and verify the effectiveness of our SF-HGTL model by comparing it with other baselines.
Accurate and timely air quality and weather predictions are of great importance to urban governance and human livelihood. Though many efforts have been made for air quality or weather prediction, most of them simply employ one another as feature input, which ignores the inner-connection between two predictive tasks. On the one hand, the accurate prediction of one task can help improve another task's performance. On the other hand, geospatially distributed air quality and weather monitoring stations provide additional hints for city-wide spatiotemporal dependency modeling. Inspired by the above two insights, in this paper, we propose the Multi-adversarial spatiotemporal recurrent Graph Neural Networks (MasterGNN) for joint air quality and weather predictions. Specifically, we first propose a heterogeneous recurrent graph neural network to model the spatiotemporal autocorrelation among air quality and weather monitoring stations. Then, we develop a multi-adversarial graph learning framework to against observation noise propagation introduced by spatiotemporal modeling. Moreover, we present an adaptive training strategy by formulating multi-adversarial learning as a multi-task learning problem. Finally, extensive experiments on two real-world datasets show that MasterGNN achieves the best performance compared with seven baselines on both air quality and weather prediction tasks.
Wearable computing and context awareness are the focuses of study in the field of artificial intelligence recently. One of the most appealing as well as challenging applications is the Human Activity Recognition (HAR) utilizing smart phones. Conventional HAR based on Support Vector Machine relies on subjective manually extracted features. This approach is time and energy consuming as well as immature in prediction due to the partial view toward which features to be extracted by human. With the rise of deep learning, artificial intelligence has been making progress toward being a mature technology. This paper proposes a new approach based on deep learning and traditional feature engineering called HAR-Net to address the issue related to HAR. The study used the data collected by gyroscopes and acceleration sensors in android smart phones. The raw sensor data was put into the HAR-Net proposed. The HAR-Net fusing the hand-crafted features and high-level features extracted from convolutional network to make prediction. The performance of the proposed method was proved to be 0.9% higher than the original MC-SVM approach. The experimental results on the UCI dataset demonstrate that fusing the two kinds of features can make up for the shortage of traditional feature engineering and deep learning techniques.