Urban Resilience.AI Lab, Zachry Department of Civil and Environmental Engineering, Texas A&M University, College Station, United States
Abstract: Existing Text-to-SQL benchmarks primarily focus on single-table queries or limited joins in general-purpose domains, and thus fail to reflect the complexity of domain-specific, multi-table, and geospatial reasoning. To address this limitation, we introduce FLOODSQL-BENCH, a geospatially grounded benchmark for the flood management domain that integrates heterogeneous datasets through key-based, spatial, and hybrid joins. The benchmark captures realistic flood-related information needs by combining social, infrastructural, and hazard data layers. We systematically evaluate recent large language models under the same retrieval-augmented generation settings and measure their performance across difficulty tiers. By providing a unified, open benchmark grounded in real-world disaster management data, FLOODSQL-BENCH establishes a practical testbed for advancing Text-to-SQL research in high-stakes application domains.
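To make the three join types concrete, the following is a minimal sketch using Python's built-in `sqlite3` module. It is not the benchmark's schema or queries: the tables (`tracts`, `hospitals`, `flood_zones`), the sample rows, and the `in_bbox` predicate are all hypothetical, and an axis-aligned bounding box stands in for the real flood-zone geometries a spatial database would use.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with three hypothetical toy tables."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical spatial predicate: is point (x, y) inside a bounding box?
    conn.create_function(
        "in_bbox", 6,
        lambda x, y, xmin, ymin, xmax, ymax: int(xmin <= x <= xmax and ymin <= y <= ymax),
    )
    conn.executescript("""
        CREATE TABLE tracts (tract_id TEXT PRIMARY KEY, population INTEGER);
        CREATE TABLE hospitals (name TEXT, tract_id TEXT, x REAL, y REAL);
        CREATE TABLE flood_zones (zone TEXT, xmin REAL, ymin REAL, xmax REAL, ymax REAL);
        INSERT INTO tracts VALUES ('T1', 1200), ('T2', 800);
        INSERT INTO hospitals VALUES ('North', 'T1', 1.0, 1.0), ('South', 'T2', 5.0, 5.0);
        INSERT INTO flood_zones VALUES ('AE', 0.0, 0.0, 2.0, 2.0);
    """)
    return conn


# Key-based join: rows are matched on a shared identifier.
KEY_JOIN = """
    SELECT h.name, t.population
    FROM hospitals h JOIN tracts t ON h.tract_id = t.tract_id
    ORDER BY h.name
"""

# Spatial join: rows are matched by location rather than by a shared key.
SPATIAL_JOIN = """
    SELECT h.name, z.zone
    FROM hospitals h
    JOIN flood_zones z ON in_bbox(h.x, h.y, z.xmin, z.ymin, z.xmax, z.ymax)
"""

# Hybrid join: a spatial match followed by a key-based match.
HYBRID_JOIN = """
    SELECT t.tract_id, t.population
    FROM hospitals h
    JOIN flood_zones z ON in_bbox(h.x, h.y, z.xmin, z.ymin, z.xmax, z.ymax)
    JOIN tracts t ON h.tract_id = t.tract_id
"""

if __name__ == "__main__":
    conn = build_demo_db()
    for label, sql in [("key", KEY_JOIN), ("spatial", SPATIAL_JOIN), ("hybrid", HYBRID_JOIN)]:
        print(label, conn.execute(sql).fetchall())
```

The hybrid query answers a question such as "what is the population of tracts whose hospital lies in a flood zone", which needs both the spatial match and the key lookup.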
Abstract: This study investigates the interplay among social demographics, built environment characteristics, and environmental hazard exposure features in determining community-level cancer prevalence. Using data from five Metropolitan Statistical Areas in the United States (Chicago, Dallas, Houston, Los Angeles, and New York), the study implemented an XGBoost machine learning model to predict the extent of cancer prevalence and evaluate the importance of different features. Our model demonstrates reliable performance, with results indicating that age, minority status, and population density are among the most influential factors in cancer prevalence. We further explore urban development and design strategies that could mitigate cancer prevalence, focusing on green space, developed areas, and total emissions. Through a series of experimental evaluations based on causal inference, the results show that increasing green space and reducing developed areas and total emissions could alleviate cancer prevalence. The study and findings contribute to a better understanding of the interplay among urban features and community health and also show the value of interpretable machine learning models for integrated urban design to promote public health. The findings also provide actionable insights for urban planning and design, emphasizing the need for a multifaceted approach to addressing urban health disparities.




Abstract: Understanding the determinants underlying variations in urban health status is important for informing urban design and planning, as well as public health policies. Multiple heterogeneous urban features could modulate the prevalence of diseases across different neighborhoods within a city and across different cities. This study examines heterogeneous features related to socio-demographics, population activity, mobility, and the built environment, together with their non-linear interactions, to assess intra- and inter-city disparity in the prevalence of four disease types: obesity, diabetes, cancer, and heart disease. Features related to population activity, mobility, and facility density are obtained from large-scale anonymized mobility data. These features are used in training and testing graph attention network (GAT) models to capture non-linear feature interactions as well as spatial interdependence among neighborhoods. We tested the models in five U.S. cities across the four disease types. The results show that the GAT model can predict the health status of people in neighborhoods based on the top five determinant features. The findings show that population activity and built-environment features, along with socio-demographic features, differentiate the health status of neighborhoods to such an extent that a GAT model could predict the health status from these features with high accuracy. The results also show that a model trained on one city can predict health status in another city with high accuracy, allowing us to quantify the inter-city similarity and discrepancy in health status. The model and findings provide novel approaches and insights for urban designers, planners, and public health officials to better understand and reduce health disparities in cities by considering the significant determinant features and their interactions.
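As a rough illustration of how a graph attention layer weighs neighboring neighborhoods, the sketch below computes single-head attention coefficients in plain Python, following the standard GAT formulation (a shared linear projection, a LeakyReLU-scored concatenation, and a softmax over neighbors). The abstract does not describe the study's architecture, so the function name `attention_weights`, the feature vectors, the projection matrix `W`, and the attention vector `a` are all hypothetical.

```python
import math


def leaky_relu(x, slope=0.2):
    """LeakyReLU non-linearity applied to a raw attention score."""
    return x if x > 0 else slope * x


def attention_weights(h, neighbors, W, a, i):
    """Return softmax-normalized attention of node i over its neighbors.

    h: dict mapping node id -> feature vector (list of floats)
    neighbors: dict mapping node id -> list of neighbor ids
    W: projection matrix as a list of rows
    a: attention vector whose length is twice the projected dimension
    """
    def project(vec):
        return [sum(w * v for w, v in zip(row, vec)) for row in W]

    zi = project(h[i])
    scores = {}
    for j in neighbors[i]:
        zj = project(h[j])
        # Score the concatenation [W h_i || W h_j] with the attention vector.
        scores[j] = leaky_relu(sum(ak * zk for ak, zk in zip(a, zi + zj)))

    # Numerically stable softmax over the neighborhood of node i.
    top = max(scores.values())
    exps = {j: math.exp(s - top) for j, s in scores.items()}
    total = sum(exps.values())
    return {j: e / total for j, e in exps.items()}


if __name__ == "__main__":
    # Toy graph: neighborhood 0 is adjacent to itself, 1, and 2.
    h = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [0.0, 1.0]}
    neighbors = {0: [0, 1, 2]}
    W = [[1.0, 0.0], [0.0, 1.0]]
    a = [0.5, 0.5, 1.0, -1.0]
    print(attention_weights(h, neighbors, W, a, 0))
```

Neighbors with identical features receive identical weights, and the weights over a neighborhood always sum to one, which is what lets the layer express spatial interdependence as a learned weighted average.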