Abstract:In this paper, we investigate code-integrated reasoning, where models generate code when necessary and integrate feedback by executing it through a code interpreter. To acquire this capability, models must learn when and how to use external code tools effectively, which is supported by tool-augmented reinforcement learning (RL) through interactive learning. Despite its benefits, tool-augmented RL can still suffer from potential instability in the learning dynamics. In light of this challenge, we present a systematic approach to improving the training effectiveness and stability of tool-augmented RL for code-integrated reasoning. Specifically, we develop enhanced training strategies that balance exploration and stability, progressively building tool-use capabilities while improving reasoning performance. Through extensive experiments on five mainstream mathematical reasoning benchmarks, our model demonstrates significant performance improvements over multiple competitive baselines. Furthermore, we conduct an in-depth analysis of the mechanism and effect of code-integrated reasoning, revealing several key insights, such as the extension of model's capability boundaries and the simultaneous improvement of reasoning efficiency through code integration. All data and code for reproducing this work are available at: https://github.com/RUCAIBox/CIR.
Abstract:Retrieval-augmented generation (RAG) systems have advanced large language models (LLMs) in complex deep search scenarios requiring multi-step reasoning and iterative information retrieval. However, existing approaches face critical limitations that lack high-quality training trajectories or suffer from the distributional mismatches in simulated environments and prohibitive computational costs for real-world deployment. This paper introduces SimpleDeepSearcher, a lightweight yet effective framework that bridges this gap through strategic data engineering rather than complex training paradigms. Our approach synthesizes high-quality training data by simulating realistic user interactions in live web search environments, coupled with a multi-criteria curation strategy that optimizes the diversity and quality of input and output side. Experiments on five benchmarks across diverse domains demonstrate that SFT on only 871 curated samples yields significant improvements over RL-based baselines. Our work establishes SFT as a viable pathway by systematically addressing the data-scarce bottleneck, offering practical insights for efficient deep search systems. Our code is available at https://github.com/RUCAIBox/SimpleDeepSearcher.
Abstract:Based on the three-dimensional propagation characteristic under built-up scenarios, a height-dependent line-of-sight (LoS) probability model for air-to-ground (A2G) millimeter wave (mmWave) communications is proposed in this paper. With comprehensive considerations of scenario factors, i.e., building height distribution, building width, building space, and the heights of transceivers, this paper upgrades the prediction method of International Telecommunication Union-Radio (ITU-R) standard to both low altitude and high altitude cases. In order to speed up the LoS probability prediction, an approximate parametric model is also developed based on the theoretical expression. The simulation results based on ray-tracing (RT) method show that the proposed model has good consistency with existing models at the low altitude. However, it has better performance at the high altitude. The new model can be used for the A2G channel modeling and performance analysis such as cell coverage, outage probability, and bit error rate of A2G communication systems.
Abstract:Line-of-sight (LoS) path is essential for the reliability of air-to-ground (A2G) communications, but the existence of LoS path is difficult to predict due to random obstacles on the ground. Based on the statistical geographic information and Fresnel clearance zone, a general stochastic LoS probability model for three-dimensional (3D) A2G channels under urban scenarios is developed. By considering the factors, i.e., building height distribution, building width, building space, carrier frequency, and transceiver's heights, the proposed model is suitable for different frequencies and altitudes. Moreover, in order to get a closed-form expression and reduce the computational complexity, an approximate parametric model is also built with the machine-learning (ML) method to estimate model parameters. The simulation results show that the proposed model has good consistency with existing models at the low altitude. When the altitude increases, it has better performance by comparing with that of the ray-tracing Monte-Carlo simulation data. The analytical results of proposed model are helpful for the channel modeling and performance analysis such as cell coverage, outage probability, and bit error rate in A2G communications.