Many ensemble methods encourage their constituent models to be diverse, because ensembling provides no benefit when the models are identical. Most methods define diversity in terms of differences in training set predictions. In this paper, however, we demonstrate that diversity in training set predictions does not necessarily imply diversity in predictions even slightly outside the training set, which can affect generalization. To address this issue, we introduce a new diversity metric and an associated method for training ensembles whose models extrapolate differently on local patches of the data manifold. Across a variety of synthetic and real-world tasks, we find that our method improves generalization and diversity in qualitatively novel ways, especially when training data is limited and under covariate shift.