Abstract:During the 20th Century, aerial surveys captured hundreds of millions of high-resolution photographs of the earth's surface. These images, the precursors to modern satellite imagery, represent an extraordinary visual record of the environmental and social upheavals of the 20th Century. However, most of these images currently languish in physical archives where retrieval is difficult and costly. Digitization could revolutionize access, but manual scanning is slow and expensive. Here, we describe and validate a novel robot-assisted pipeline that increases worker productivity in scanning 30-fold, applied at scale to digitize an archive of 1.7 million historical aerial photographs from 65 countries.
Abstract:Combining satellite imagery with machine learning (SIML) has the potential to address global challenges by remotely estimating socioeconomic and environmental conditions in data-poor regions, yet the resource requirements of SIML limit its accessibility and use. We show that a single encoding of satellite imagery can generalize across diverse prediction tasks (e.g. forest cover, house price, road length). Our method achieves accuracy competitive with deep neural networks at orders of magnitude lower computational cost, scales globally, delivers label super-resolution predictions, and facilitates characterizations of uncertainty. Since image encodings are shared across tasks, they can be centrally computed and distributed to unlimited researchers, who need only fit a linear regression to their own ground truth data in order to achieve state-of-the-art SIML performance.
Abstract:We propose a simple cross-sectional research design to identify causal effects that is robust to unobservable heterogeneity. When many observational units are adjacent, it may be sufficient to regress the "spatial first differences" (SFD) of the outcome on the treatment and omit all covariates. This approach is conceptually similar to first differencing approaches in time-series or panel models, except the index for time is replaced with an index for locations in space. The SFD approach identifies plausibly causal effects so long as local changes in the treatment and unobservable confounders are not systematically correlated between immediately adjacent neighbors. We illustrate how this approach can mitigate omitted variables bias through simulation and by estimating returns to schooling along 10th Avenue in New York and I-90 in Chicago. We then more fully explore the benefits of this approach by estimating effects of climate and soil on maize yields across US counties. In each case, we demonstrate the performance of the research design by withholding important covariates during estimation. SFD has multiple appealing features, such as internal robustness checks that exploit rotation of the coordinate system or double-differencing across space, it is immediately applicable to spatially-gridded data sets, and it can be easily implemented in statical packages by replacing a single index in pre-existing time-series functions.