Abstract:Reliable wall-to-wall biomass mapping from NASA's GEDI mission requires interpolating sparse LiDAR observations across heterogeneous landscapes. While machine learning approaches like Random Forest and XGBoost are standard for this task, they treat spatial predictions of GEDI observations from multispectral or SAR remote sensing data as independent without adapting to the varying difficulty of heterogeneous landscapes. We demonstrate these approaches generally fail to produce calibrated prediction intervals. We identify that this stems from conflating ensemble variance with aleatoric uncertainty and ignoring local spatial context. To resolve this, we introduce Attentive Neural Processes (ANPs), a probabilistic meta-learning framework that explicitly conditions predictions on local observation sets and geospatial foundation model embeddings. Unlike static ensembles, ANPs learn a flexible spatial covariance function, allowing uncertainty estimates to expand in complex landscapes and contract in homogeneous areas. We validate this approach across five distinct biomes ranging from Tropical Amazonian forests to Boreal and Alpine ecosystems, demonstrating that ANPs achieve competitive accuracy while maintaining near-ideal uncertainty calibration. We demonstrate the operational utility of the method through few-shot adaptation, where the model recovers most of the performance gap in cross-region transfer using minimal local data. This work provides a scalable, theoretically rigorous alternative to ensemble variance for continental scale earth observation.
Abstract:"First, do no harm" faces a fundamental challenge in artificial intelligence: how can we specify what constitutes harm? While prior work treats harm specification as a technical hurdle to be overcome through better algorithms or more data, we argue this assumption is unsound. Drawing on information theory, we demonstrate that complete harm specification is fundamentally impossible for any system where harm is defined external to its specifications. This impossibility arises from an inescapable information-theoretic gap: the entropy of harm H(O) always exceeds the mutual information I(O;I) between ground truth harm O and a system's specifications I. We introduce two novel metrics: semantic entropy H(S) and the safety-capability ratio I(O;I)/H(O), to quantify these limitations. Through a progression of increasingly sophisticated specification attempts, we show why each approach must fail and why the resulting gaps are not mere engineering challenges but fundamental constraints akin to the halting problem. These results suggest a paradigm shift: rather than pursuing complete specifications, AI alignment research should focus on developing systems that can operate safely despite irreducible specification uncertainty.
Abstract:Modern language models paradoxically combine unprecedented capability with persistent vulnerability in that they can draft poetry yet cannot reliably refuse harmful requests. We reveal this fragility stems not from inadequate training, but from a fundamental architectural limitation: transformers process all tokens as equals. Transformers operate as computational democracies, granting equal voice to all tokens. This is a design tragically unsuited for AGI, where we cannot risk adversarial "candidates" hijacking the system. Through formal analysis, we demonstrate that safety instructions fundamentally lack privileged status in transformer architectures, that they compete with adversarial inputs in the same computational arena, making robust alignment through prompting or fine-tuning inherently limited. This "token democracy" explains why jailbreaks bypass even extensively safety-trained models and why positional shifts erode prompt effectiveness. Our work systematizes practitioners' tacit knowledge into an architectural critique, showing current alignment approaches create mere preferences, not constraints.
Abstract:Who controls the development of Artificial General Intelligence (AGI) might matter less than how we handle the fight for control itself. We formalize this "steering wheel problem" as humanity's greatest near-term existential risk may stem not from misaligned AGI, but from the dynamics of competing to develop it. Just as a car crash can occur from passengers fighting over the wheel before reaching any destination, catastrophic outcomes could arise from development competition long before AGI exists. While technical alignment research focuses on ensuring safe arrival, we show how coordination failures during development could drive us off the cliff first. We present a game theoretic framework modeling AGI development dynamics and prove conditions for sustainable cooperative equilibria. Drawing from nuclear control while accounting for AGI's unique characteristics, we propose concrete mechanisms including pre-registration, shared technical infrastructure, and automated deterrence to stabilize cooperation. Our key insight is that AGI creates network effects in safety: shared investments become more valuable as participation grows, enabling mechanism designs where cooperation dominates defection. This work bridges formal methodology and policy frameworks, providing foundations for practical governance of AGI competition risks.