Abstract:Earth Observation (EO) is essential for perceiving dynamic land surface changes, yet deploying autonomous EO in open environments is hindered by the immense diversity of multi-source data and heterogeneous tasks. While remote sensing agents have emerged to streamline EO workflows, existing tool-calling agents are confined to closed environments. They rely on pre-defined tools and are restricted to narrow scope, limiting their generalization to the diverse data and tasks. To overcome these limitations, we introduce OpenEarth-Agent, the first tool-creation agent framework tailored for open-environment EO. Rather than calling predefined tools, OpenEarth-Agent employs adaptive workflow planning and tool creation to generalize to unseen data and tasks. This adaptability is bolstered by an open-ended integration of multi-stage tools and cross-domain knowledge bases, enabling robust execution in the entire EO pipeline across multiple application domains. To comprehensively evaluate EO agents in open environments, we propose OpenEarth-Bench, a novel benchmark comprising 596 real-world, full-pipeline cases across seven application domains, explicitly designed to assess agents' adaptive planning and tool creation capabilities. Only essential pre-trained model tools are provided in this benchmark, devoid of any other predefined task-specific tools. Extensive experiments demonstrate that OpenEarth-Agent successfully masters full-pipeline EO across multiple domains in the open environment. Notably, on the cross-benchmark Earth-Bench, our tool-creating agent equipped with 6 essential pre-trained models achieves performance comparable to tool-calling agents relying on 104 specialized tools, and significantly outperforms them when provided with the complete toolset. In several cases, the created tools exhibit superior robustness to data anomalies compared to human-engineered counterparts.
Abstract:Inferring the physical mechanisms that govern earthquake sequences from indirect geophysical observations remains difficult, particularly across tectonically distinct environments where similar seismic patterns can reflect different underlying processes. Current interpretations rely heavily on the expert synthesis of catalogs, spatiotemporal statistics, and candidate physical models, limiting reproducibility and the systematic transfer of insight across settings. Here we present TRACE (Trans-perspective Reasoning and Automated Comprehensive Evaluator), a multi-agent system that combines large language model planning with formal seismological constraints to derive auditable, physically grounded mechanistic inference from raw observations. Applied to the 2019 Ridgecrest sequence, TRACE autonomously identifies stress-perturbation-induced delayed triggering, resolving the cascading interaction between the Mw 6.4 and Mw 7.1 mainshocks; in the Santorini-Kolumbo case, the system identifies a structurally guided intrusion model, distinguishing fault-channeled episodic migration from the continuous propagation expected in homogeneous crustal failure. By providing a generalizable logical infrastructure for interpreting heterogeneous seismic phenomena, TRACE advances the field from expert-dependent analysis toward knowledge-guided autonomous discovery in Earth sciences.
Abstract:Despite advances in scientific AI, a coherent framework for Scientific General Intelligence (SGI)-the ability to autonomously conceive, investigate, and reason across scientific domains-remains lacking. We present an operational SGI definition grounded in the Practical Inquiry Model (PIM: Deliberation, Conception, Action, Perception) and operationalize it via four scientist-aligned tasks: deep research, idea generation, dry/wet experiments, and experimental reasoning. SGI-Bench comprises over 1,000 expert-curated, cross-disciplinary samples inspired by Science's 125 Big Questions, enabling systematic evaluation of state-of-the-art LLMs. Results reveal gaps: low exact match (10--20%) in deep research despite step-level alignment; ideas lacking feasibility and detail; high code executability but low execution result accuracy in dry experiments; low sequence fidelity in wet protocols; and persistent multimodal comparative-reasoning challenges. We further introduce Test-Time Reinforcement Learning (TTRL), which optimizes retrieval-augmented novelty rewards at inference, enhancing hypothesis novelty without reference answer. Together, our PIM-grounded definition, workflow-centric benchmark, and empirical insights establish a foundation for AI systems that genuinely participate in scientific discovery.
Abstract:As an effective approach to quantify how training samples influence test sample, data attribution is crucial for understanding data and model and further enhance the transparency of machine learning models. We find that prevailing data attribution methods based on leave-one-out (LOO) strategy suffer from the local-based explanation, as these LOO-based methods only perturb a single training sample, and overlook the collective influence in the training set. On the other hand, the lack of baseline in many data attribution methods reduces the flexibility of the explanation, e.g., failing to provide counterfactual explanations. In this paper, we propose Integrated Influence, a novel data attribution method that incorporates a baseline approach. Our method defines a baseline dataset, follows a data degeneration process to transition the current dataset to the baseline, and accumulates the influence of each sample throughout this process. We provide a solid theoretical framework for our method, and further demonstrate that popular methods, such as influence functions, can be viewed as special cases of our approach. Experimental results show that Integrated Influence generates more reliable data attributions compared to existing methods in both data attribution task and mislablled example identification task.



Abstract:Modern Earth science is at an inflection point. The vast, fragmented, and complex nature of Earth system data, coupled with increasingly sophisticated analytical demands, creates a significant bottleneck for rapid scientific discovery. Here we introduce EarthLink, the first AI agent designed as an interactive copilot for Earth scientists. It automates the end-to-end research workflow, from planning and code generation to multi-scenario analysis. Unlike static diagnostic tools, EarthLink can learn from user interaction, continuously refining its capabilities through a dynamic feedback loop. We validated its performance on a number of core scientific tasks of climate change, ranging from model-observation comparisons to the diagnosis of complex phenomena. In a multi-expert evaluation, EarthLink produced scientifically sound analyses and demonstrated an analytical competency that was rated as comparable to specific aspects of a human junior researcher's workflow. Additionally, its transparent, auditable workflows and natural language interface empower scientists to shift from laborious manual execution to strategic oversight and hypothesis generation. EarthLink marks a pivotal step towards an efficient, trustworthy, and collaborative paradigm for Earth system research in an era of accelerating global change. The system is accessible at our website https://earthlink.intern-ai.org.cn.




Abstract:Accurate and reliable link quality prediction (LQP) is crucial for optimizing network performance, ensuring communication stability, and enhancing user experience in wireless communications. However, LQP faces significant challenges due to the dynamic and lossy nature of wireless links, which are influenced by interference, multipath effects, fading, and blockage. In this paper, we propose GAT-LLM, a novel multivariate wireless link quality prediction model that combines Large Language Models (LLMs) with Graph Attention Networks (GAT) to enable accurate and reliable multivariate LQP of wireless communications. By framing LQP as a time series prediction task and appropriately preprocessing the input data, we leverage LLMs to improve the accuracy of link quality prediction. To address the limitations of LLMs in multivariate prediction due to typically handling one-dimensional data, we integrate GAT to model interdependencies among multiple variables across different protocol layers, enhancing the model's ability to handle complex dependencies. Experimental results demonstrate that GAT-LLM significantly improves the accuracy and robustness of link quality prediction, particularly in multi-step prediction scenarios.
Abstract:The application of deep learning (DL)-based channel state information (CSI) feedback frameworks in massive multiple-input multiple-output (MIMO) systems has significantly improved reconstruction accuracy. However, the limited generalization of widely adopted autoencoder-based networks for CSI feedback challenges consistent performance under dynamic wireless channel conditions and varying communication overhead constraints. To enhance the robustness of DL-based CSI feedback across diverse channel scenarios, we propose a novel framework, ITUG, where the user equipment (UE) transmits only a selected portion of critical values in the CSI matrix, while a generative model deployed at the BS reconstructs the remaining values. Specifically, we introduce a scoring algorithm to identify important values based on amplitude and contrast, an encoding algorithm to convert these values into a bit stream for transmission using adaptive bit length and a modified Huffman codebook, and a Transformer-based generative network named TPMVNet to recover the untransmitted values based on the received important values. Experimental results demonstrate that the ITUG framework, equipped with a single TPMVNet, achieves superior reconstruction performance compared to several high-performance autoencoder models across various channel conditions.




Abstract:Enhancing the safety of autonomous vehicles is crucial, especially given recent accidents involving automated systems. As passengers in these vehicles, humans' sensory perception and decision-making can be integrated with autonomous systems to improve safety. This study explores neural mechanisms in passenger-vehicle interactions, leading to the development of a Passenger Cognitive Model (PCM) and the Passenger EEG Decoding Strategy (PEDS). Central to PEDS is a novel Convolutional Recurrent Neural Network (CRNN) that captures spatial and temporal EEG data patterns. The CRNN, combined with stacking algorithms, achieves an accuracy of $85.0\% \pm 3.18\%$. Our findings highlight the predictive power of pre-event EEG data, enhancing the detection of hazardous scenarios and offering a network-driven framework for safer autonomous vehicles.




Abstract:The advent of deep learning (DL)-based models has significantly advanced Channel State Information (CSI) feedback mechanisms in wireless communication systems. However, traditional approaches often suffer from high communication overhead and potential privacy risks due to the centralized nature of CSI data processing. To address these challenges, we design a CSI feedback training framework called Dig-CSI, in which the dataset for training the CSI feedback model is produced by the distributed generators uploaded by each user equipment (UE), but not through local data upload. Each UE trains an autoencoder, where the decoder is considered as the distributed generator, with local data to gain reconstruction accuracy and the ability to generate. Experimental results show that Dig-CSI can train a global CSI feedback model with comparable performance to the model trained with classical centralized learning with a much lighter communication overhead.




Abstract:Deep learning has been widely applied for the channel state information (CSI) feedback in frequency division duplexing (FDD) massive multiple-input multiple-output (MIMO) system. For the typical supervised training of the feedback model, the requirements of large amounts of task-specific labeled data can hardly be satisfied, and the huge training costs and storage usage of the model in multiple scenarios are hindrance for model application. In this letter, a multi-task learning-based approach is proposed to improve the feasibility of the feedback network. An encoder-shared feedback architecture and the corresponding training scheme are further proposed to facilitate the implementation of the multi-task learning approach. The experimental results indicate that the proposed multi-task learning approach can achieve comprehensive feedback performance with considerable reduction of training cost and storage usage of the feedback model.