Abstract:Existing benchmarks for Earth science multimodal learning exhibit critical limitations in systematic coverage of geosystem components and cross-sphere interactions, often constrained to isolated subsystems (only in Human-activities sphere or atmosphere) with limited evaluation dimensions (less than 16 tasks). To address these gaps, we introduce OmniEarth-Bench, the first comprehensive multimodal benchmark spanning all six Earth science spheres (atmosphere, lithosphere, Oceansphere, cryosphere, biosphere and Human-activities sphere) and cross-spheres with one hundred expert-curated evaluation dimensions. Leveraging observational data from satellite sensors and in-situ measurements, OmniEarth-Bench integrates 29,779 annotations across four tiers: perception, general reasoning, scientific knowledge reasoning and chain-of-thought (CoT) reasoning. This involves the efforts of 2-5 experts per sphere to establish authoritative evaluation dimensions and curate relevant observational datasets, 40 crowd-sourcing annotators to assist experts for annotations, and finally, OmniEarth-Bench is validated via hybrid expert-crowd workflows to reduce label ambiguity. Experiments on 9 state-of-the-art MLLMs reveal that even the most advanced models struggle with our benchmarks, where none of them reach 35\% accuracy. Especially, in some cross-spheres tasks, the performance of leading models like GPT-4o drops to 0.0\%. OmniEarth-Bench sets a new standard for geosystem-aware AI, advancing both scientific discovery and practical applications in environmental monitoring and disaster prediction. The dataset, source code, and trained models were released.
Abstract:Ocean dynamics plays a crucial role in driving global weather and climate patterns. Accurate and efficient modeling of ocean dynamics is essential for improved understanding of complex ocean circulation and processes, for predicting climate variations and their associated teleconnections, and for addressing the challenges of climate change. While great efforts have been made to improve numerical Ocean General Circulation Models (OGCMs), accurate forecasting of global oceanic variations for multi-year remains to be a long-standing challenge. Here, we introduce ORCA (Oceanic Reliable foreCAst), the first data-driven model predicting global ocean circulation from multi-year to decadal time scales. ORCA accurately simulates the three-dimensional circulations and dynamics of the global ocean with high physical consistency. Hindcasts of key oceanic variables demonstrate ORCA's remarkable prediction skills in predicting ocean variations compared with state-of-the-art numerical OGCMs and abilities in capturing occurrences of extreme events at the subsurface ocean and ENSO vertical patterns. These results demonstrate the potential of data-driven ocean models for providing cheap, efficient, and accurate global ocean modeling and prediction. Moreover, ORCA stably and faithfully emulates ocean dynamics at decadal timescales, demonstrating its potential even for climate projections. The model will be available at https://github.com/OpenEarthLab/ORCA.
Abstract:Existing heterogeneous graph neural networks (HGNNs) have achieved great success in utilizing the rich semantic information in heterogeneous information networks (HINs). However, few works have delved into the utilization of long-range dependencies in HINs, which is extremely valuable as many real-world HINs are sparse, and each node has only a few directly connected neighbors. Although some HGNNs can utilize distant neighbors by stacking multiple layers or leveraging long meta-paths, the exponentially increased number of nodes in the receptive field or the number of meta-paths incurs high computation and memory costs. To address these issues, we investigate the importance of different meta-paths and propose Long-range Dependency based Multi-Layer Perceptron (LDMLP). Specifically, to solve the high-cost problem of leveraging long-range dependencies, LDMLP adopts a search stage to discover effective meta-paths automatically, reducing the exponentially increased number of meta-paths to a constant. To avoid the influence of specific modules on search results, LDMLP utilizes a simple architecture with only multi-layer perceptions in the search stage, improving the generalization of searched meta-paths. As a result, the searched meta-paths not only perform well in LDMLP but also enable other HGNNs like HAN and SeHGNN to perform better. Extensive experiments on eight heterogeneous datasets demonstrate that LDMLP achieves state-of-the-art performance while enjoying high efficiency and generalization, especially on sparse HINs.