Abstract:Conventional urban indicators derived from censuses, surveys, and administrative records are often costly, spatially inconsistent, and slow to update. Recent geospatial foundation models enable Earth embeddings, compact satellite image representations transferable across downstream tasks, but their utility for neighborhood-scale urban monitoring remains unclear. Here, we benchmark three Earth embedding families, AlphaEarth, Prithvi, and Clay, for urban signal prediction across six U.S. metropolitan areas from 2020 to 2023. Using a unified supervised-learning framework, we predict 14 neighborhood-level indicators spanning crime, income, health, and travel behavior, and evaluate performance under four settings: global, city-wise, year-wise, and city-year. Results show that Earth embeddings capture substantial urban variation, with the highest predictive skill for outcomes more directly tied to built-environment structure, including chronic health burdens and dominant commuting modes. By contrast, indicators shaped more strongly by fine-scale behavior and local policy, such as cycling, remain difficult to infer. Predictive performance varies markedly across cities but remains comparatively stable across years, indicating strong spatial heterogeneity alongside temporal robustness. Exploratory analysis suggests that cross-city variation in predictive performance is associated with urban form in task-specific ways. Controlled dimensionality experiments show that representation efficiency is critical: compact 64-dimensional AlphaEarth embeddings remain more informative than 64-dimensional reductions of Prithvi and Clay. This study establishes a benchmark for evaluating Earth embeddings in urban remote sensing and demonstrates their potential as scalable, low-cost features for SDG-aligned neighborhood-scale urban monitoring.