Minrui Xu

Sherman

EnvFactory: Scaling Tool-Use Agents via Executable Environments Synthesis and Robust RL

May 18, 2026

Minrui Xu, Zilin Wang, Mengyi DENG, Zhiwei Li, Zhicheng Yang, Xiao Zhu, Yinhong Liu, Boyu Zhu, Baiyu Huang, Chao Chen(+5 more)

Abstract: Equipping LLMs with tool-use capabilities via Agentic Reinforcement Learning (Agentic RL) is bottlenecked by two challenges: the lack of scalable, robust execution environments and the scarcity of realistic training data that captures implicit human reasoning. Existing approaches depend on costly real-world APIs, hallucination-prone LLM simulators, or synthetic environments that are often single-turn or depend on pre-collected documents. Moreover, synthetic trajectories are frequently over-specified, resembling instruction sequences rather than natural human intents, reducing their effectiveness for RL training. We introduce EnvFactory, a fully automated framework that addresses both challenges. EnvFactory autonomously explores and verifies stateful, executable tool environments from authentic resources, and synthesizes natural multi-turn trajectories through topology-aware sampling and calibrated refinement, producing grounded queries with implicit intents. Using only 85 verified environments across 7 domains, EnvFactory generates 2,575 SFT and RL trajectories. Despite using significantly fewer environments than prior work, which often uses 5 times as many, EnvFactory achieves superior training efficiency and downstream performance, improving Qwen3-series models by up to +15% on BFCLv3, +8.6% on MCP-Atlas, and +6% on conversational benchmarks including $τ^2$-Bench and VitaBench. By fully automating both environment construction and trajectory synthesis, EnvFactory provides a scalable, extensible, and robust foundation for Agentic RL.

* 11 pages

R&D-Agent-Quant: A Multi-Agent Framework for Data-Centric Factors and Model Joint Optimization

May 21, 2025

Yuante Li, Xu Yang, Xiao Yang, Minrui Xu, Xisen Wang, Weiqing Liu, Jiang Bian

Abstract: Financial markets pose fundamental challenges for asset return prediction due to their high dimensionality, non-stationarity, and persistent volatility. Despite advances in large language models and multi-agent systems, current quantitative research pipelines suffer from limited automation, weak interpretability, and fragmented coordination across key components such as factor mining and model innovation. In this paper, we propose R&D-Agent for Quantitative Finance, in short RD-Agent(Q), the first data-centric multi-agent framework designed to automate the full-stack research and development of quantitative strategies via coordinated factor-model co-optimization. RD-Agent(Q) decomposes the quant process into two iterative stages: a Research stage that dynamically sets goal-aligned prompts, formulates hypotheses based on domain priors, and maps them to concrete tasks, and a Development stage that employs a code-generation agent, Co-STEER, to implement task-specific code, which is then executed in real-market backtests. The two stages are connected through a feedback stage that thoroughly evaluates experimental outcomes and informs subsequent iterations, with a multi-armed bandit scheduler for adaptive direction selection. Empirically, RD-Agent(Q) achieves up to 2X higher annualized returns than classical factor libraries using 70% fewer factors, and outperforms state-of-the-art deep time-series models on real markets. Its joint factor-model optimization delivers a strong balance between predictive accuracy and strategy robustness. Our code is available at: https://github.com/microsoft/RD-Agent.
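
The abstract above mentions a multi-armed bandit scheduler for choosing the next optimization direction but does not say which bandit algorithm is used. As a rough illustration only, the sketch below uses the standard UCB1 rule as a stand-in; the class name `UCB1Scheduler`, the arm names `"factor"` and `"model"`, and the reward values are all hypothetical and are not taken from the paper or its repository.

```python
import math


class UCB1Scheduler:
    """Illustrative UCB1 bandit; NOT the scheduler from RD-Agent(Q)."""

    def __init__(self, arms):
        self.arms = list(arms)
        self.counts = {arm: 0 for arm in self.arms}    # pulls per arm
        self.totals = {arm: 0.0 for arm in self.arms}  # summed reward per arm

    def select(self):
        # Try every arm once before applying the confidence-bound rule.
        for arm in self.arms:
            if self.counts[arm] == 0:
                return arm
        total_pulls = sum(self.counts.values())

        def score(arm):
            mean = self.totals[arm] / self.counts[arm]
            bonus = math.sqrt(2.0 * math.log(total_pulls) / self.counts[arm])
            return mean + bonus

        return max(self.arms, key=score)

    def update(self, arm, reward):
        # Record the observed reward (e.g. a backtest improvement) for the arm.
        self.counts[arm] += 1
        self.totals[arm] += reward


# Hypothetical usage: alternate between two research directions.
scheduler = UCB1Scheduler(["factor", "model"])
first = scheduler.select()
scheduler.update(first, 1.0)
second = scheduler.select()
scheduler.update(second, 0.0)
third = scheduler.select()
```

With equal pull counts the exploration bonus is the same for both arms, so the arm with the higher average reward is selected next.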

R&D-Agent: Automating Data-Driven AI Solution Building Through LLM-Powered Automated Research, Development, and Evolution

May 20, 2025

Xu Yang, Xiao Yang, Shikai Fang, Bowen Xian, Yuante Li, Jian Wang, Minrui Xu, Haoran Pan, Xinpeng Hong, Weiqing Liu(+3 more)

Abstract: Recent advances in AI and ML have transformed data science, yet increasing complexity and expertise requirements continue to hinder progress. While crowdsourcing platforms alleviate some challenges, high-level data science tasks remain labor-intensive and iterative. To overcome these limitations, we introduce R&D-Agent, a dual-agent framework for iterative exploration. The Researcher agent uses performance feedback to generate ideas, while the Developer agent refines code based on error feedback. By enabling multiple parallel exploration traces that merge and enhance one another, R&D-Agent narrows the gap between automated solutions and expert-level performance. Evaluated on MLE-Bench, R&D-Agent emerges as the top-performing machine learning engineering agent, demonstrating its potential to accelerate innovation and improve precision across diverse data science applications. We have open-sourced R&D-Agent on GitHub: https://github.com/microsoft/RD-Agent.

* 7 pages, 1 figure, 1 table

Shadow Wireless Intelligence: Large Language Model-Driven Reasoning in Covert Communications

May 07, 2025

Yuanai Xie, Zhaozhi Liu, Xiao Zhang, Shihua Zhang, Rui Hou, Minrui Xu, Ruichen Zhang, Dusit Niyato

Abstract: Covert Communications (CC) can secure sensitive transmissions in industrial, military, and mission-critical applications within 6G wireless networks. However, traditional optimization methods based on Artificial Noise (AN), power control, and channel manipulation might not adapt to dynamic and adversarial environments due to the high dimensionality, nonlinearity, and stringent real-time covertness requirements. To bridge this gap, we introduce Shadow Wireless Intelligence (SWI), which integrates the reasoning capabilities of Large Language Models (LLMs) with retrieval-augmented generation to enable intelligent decision-making in covert wireless systems. Specifically, we utilize DeepSeek-R1, a mixture-of-experts-based LLM with RL-enhanced reasoning, combined with real-time retrieval of domain-specific knowledge to improve context accuracy and mitigate hallucinations. Our approach develops a structured CC knowledge base, supports context-aware retrieval, and performs semantic optimization, allowing LLMs to generate and adapt CC strategies in real time. In a case study on optimizing AN power in a full-duplex CC scenario, DeepSeek-R1 achieves 85% symbolic derivation accuracy and 94% correctness in the generation of simulation code, outperforming baseline models. These results validate SWI as a robust, interpretable, and adaptive foundation for LLM-driven intelligent covert wireless systems in 6G networks.

Movable Antenna-Aided Federated Learning with Over-the-Air Aggregation: Joint Optimization of Positioning, Beamforming, and User Selection

Nov 11, 2024

Yang Zhao, Yue Xiu, Minrui Xu, Ping Wang, Ning Wei

Abstract: Federated learning (FL) in wireless computing effectively utilizes communication bandwidth, yet it is vulnerable to errors during the analog aggregation process. While removing users with unfavorable channel conditions can mitigate these errors, it also reduces the available local training data for FL, which in turn hinders the convergence rate of the training process. To tackle this issue, we propose the use of movable antenna (MA) techniques to enhance the degrees of freedom within the channel space, ultimately boosting the convergence speed of FL training. Moreover, we develop a coordinated approach for uplink receiver beamforming, user selection, and MA positioning to optimize the convergence rate of wireless FL training in dynamic wireless environments. This stochastic optimization challenge is reformulated into a mixed-integer programming problem by utilizing the training loss upper bound. We then introduce a penalty dual decomposition (PDD) method to solve the mixed-integer programming problem. Experimental results indicate that incorporating MA techniques significantly accelerates the training convergence of FL and greatly surpasses conventional methods.

Diffusion-based Auction Mechanism for Efficient Resource Management in 6G-enabled Vehicular Metaverses

Nov 01, 2024

Jiawen Kang, Yongju Tong, Yue Zhong, Junlong Chen, Minrui Xu, Dusit Niyato, Runrong Deng, Shiwen Mao

Abstract: The rise of 6G-enabled Vehicular Metaverses is transforming the automotive industry by integrating immersive, real-time vehicular services through ultra-low latency and high bandwidth connectivity. In 6G-enabled Vehicular Metaverses, vehicles are represented by Vehicle Twins (VTs), which serve as digital replicas of physical vehicles to support real-time vehicular applications such as large Artificial Intelligence (AI) model-based Augmented Reality (AR) navigation, called VT tasks. VT tasks are resource-intensive and need to be offloaded to ground Base Stations (BSs) for fast processing. However, the high demand for VT tasks and the limited resources of ground BSs pose significant resource allocation challenges, particularly in densely populated urban areas like intersections. As a promising solution, Unmanned Aerial Vehicles (UAVs) act as aerial edge servers to dynamically assist ground BSs in handling VT tasks, relieving resource pressure on ground BSs. However, due to the high mobility of UAVs, there exists information asymmetry regarding VT task demands between UAVs and ground BSs, resulting in inefficient resource allocation of UAVs. To address these challenges, we propose a learning-based Modified Second-Bid (MSB) auction mechanism to optimize resource allocation between ground BSs and UAVs by accounting for VT task latency and accuracy. Moreover, we design a diffusion-based reinforcement learning algorithm to optimize the price scaling factor, maximizing the total surplus of resource providers and minimizing VT task latency. Finally, simulation results demonstrate that the proposed diffusion-based MSB auction outperforms traditional baselines, providing better resource distribution and enhanced service quality for vehicular users.

Delay Minimization for Movable Antennas-Enabled Anti-Jamming Communications With Mobile Edge Computing

Sep 22, 2024

Yue Xiu, Yang Zhao, Songjie Yang, Minrui Xu, Dusit Niyato, Yueyang Li, Ning Wei

Abstract: In future 6G networks, anti-jamming will become a critical challenge, particularly with the development of intelligent jammers that can initiate malicious interference, posing a significant security threat to communication transmission. Additionally, 6G networks have introduced mobile edge computing (MEC) technology to reduce system delay for edge user equipment (UEs). Thus, one of the key challenges in wireless communications is minimizing the system delay while mitigating interference and improving the communication rate. However, the current fixed-position antenna (FPA) techniques have limited degrees of freedom (DoF) and high power consumption, making them inadequate for communication in highly interfering environments. To address these challenges, this paper proposes a novel MEC anti-jamming communication architecture supported by movable antenna (MA) technology. The core of the MA technique lies in optimizing the position of the antennas to increase DoF. The increase in DoF enhances the system's anti-jamming capabilities and reduces system delay. In this study, our goal is to reduce system delay while ensuring communication security and computational requirements. We design the position of MAs for UEs and the base station (BS), optimize the transmit beamforming at the UEs and the receive beamforming at the BS, and adjust the offloading rates and resource allocation for computation tasks at the MEC server. Since the optimization problem is a non-convex multi-variable coupled problem, we propose an algorithm based on penalty dual decomposition (PDD) combined with successive convex approximation (SCA). The simulation results demonstrate that the proposed MA architecture and the corresponding schemes offer superior anti-jamming capabilities and reduce the system delay compared to FPA.

Hyperdimensional Computing Empowered Federated Foundation Model over Wireless Networks for Metaverse

Aug 26, 2024

Yahao Ding, Wen Shang, Minrui Xu, Zhaohui Yang, Ye Hu, Dusit Niyato, Mohammad Shikh-Bahaei

Abstract: The Metaverse, a burgeoning collective virtual space merging augmented reality and persistent virtual worlds, necessitates advanced artificial intelligence (AI) and communication technologies to support immersive and interactive experiences. Federated learning (FL) has emerged as a promising technique for collaboratively training AI models while preserving data privacy. However, FL faces challenges such as high communication overhead and substantial computational demands, particularly for neural network (NN) models. To address these issues, we propose an integrated federated split learning and hyperdimensional computing (FSL-HDC) framework for emerging foundation models. This novel approach reduces communication costs, computation load, and privacy risks, making it particularly suitable for resource-constrained edge devices in the Metaverse, ensuring real-time responsive interactions. Additionally, we introduce an optimization algorithm that concurrently optimizes transmission power and bandwidth to minimize the maximum transmission time among all users to the server. The simulation results based on the MNIST dataset indicate that FSL-HDC achieves an accuracy rate of approximately 87.5%, which is slightly lower than that of FL-HDC. However, FSL-HDC exhibits a significantly faster convergence speed, approximately 3.733x that of FSL-NN, and demonstrates robustness to non-IID data distributions. Moreover, our proposed optimization algorithm can reduce the maximum transmission time by up to 64% compared with the baseline.

FIRST: Teach A Reliable Large Language Model Through Efficient Trustworthy Distillation

Aug 22, 2024

KaShun Shum, Minrui Xu, Jianshu Zhang, Zixin Chen, Shizhe Diao, Hanze Dong, Jipeng Zhang, Muhammad Omer Raza

Abstract: Large language models (LLMs) have become increasingly prevalent in our daily lives, leading to an expectation for LLMs to be trustworthy, that is, both accurate and well-calibrated (the prediction confidence should align with its ground truth correctness likelihood). Nowadays, fine-tuning has become the most popular method for adapting a model to practical usage by significantly increasing accuracy on downstream tasks. Despite the great accuracy it achieves, we found fine-tuning is still far from satisfactory trustworthiness due to "tuning-induced mis-calibration". In this paper, we delve deeply into why and how mis-calibration exists in fine-tuned models, and how distillation can alleviate the issue. Then we further propose a brand new method named Efficient Trustworthy Distillation (FIRST), which utilizes a small portion of teacher's knowledge to obtain a reliable language model in a cost-efficient way. Specifically, we identify the "concentrated knowledge" phenomenon during distillation, which can significantly reduce the computational burden. Then we apply a "trustworthy maximization" process to optimize the utilization of this small portion of concentrated knowledge before transferring it to the student. Experimental results demonstrate the effectiveness of our method, where better accuracy (+2.3%) and less mis-calibration (-10%) are achieved on average across both in-domain and out-of-domain scenarios, indicating better trustworthiness.
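
The abstract above describes a well-calibrated model as one whose prediction confidence matches its likelihood of being correct, without naming the metric behind the reported mis-calibration figure. One common way to quantify that gap is the expected calibration error (ECE); the sketch below is a minimal illustration under that assumption, and the function name, bin count, and sample values are hypothetical rather than taken from the paper.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average gap between mean confidence and accuracy per bin.

    Illustrative only: the metric used in the FIRST paper is an assumption.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    total = len(confidences)
    if total == 0:
        return 0.0

    # Group predictions into equal-width confidence bins over [0, 1].
    bins = [[] for _ in range(n_bins)]
    for conf, is_correct in zip(confidences, correct):
        index = min(int(conf * n_bins), n_bins - 1)
        bins[index].append((conf, 1.0 if is_correct else 0.0))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(o for _, o in members) / len(members)
        ece += (len(members) / total) * abs(mean_conf - accuracy)
    return ece


# Hypothetical example: a model that is 90% confident but right 3 times out of 4.
overconfident = expected_calibration_error(
    [0.9, 0.9, 0.9, 0.9], [True, True, True, False]
)
calibrated = expected_calibration_error([1.0, 1.0], [True, True])
```

A lower value means confidence tracks correctness more closely; a value of zero means the two agree in every bin.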

DiscipLink: Unfolding Interdisciplinary Information Seeking Process via Human-AI Co-Exploration

Aug 01, 2024

Chengbo Zheng, Yuanhao Zhang, Zeyu Huang, Chuhan Shi, Minrui Xu, Xiaojuan Ma

Abstract: Interdisciplinary studies often require researchers to explore literature in diverse branches of knowledge. Yet, navigating through the highly scattered knowledge from unfamiliar disciplines poses a significant challenge. In this paper, we introduce DiscipLink, a novel interactive system that facilitates collaboration between researchers and large language models (LLMs) in interdisciplinary information seeking (IIS). Based on users' topics of interest, DiscipLink initiates exploratory questions from the perspectives of possible relevant fields of study, and users can further tailor these questions. DiscipLink then supports users in searching and screening papers under selected questions by automatically expanding queries with discipline-specific terminology, extracting themes from retrieved papers, and highlighting the connections between papers and questions. Our evaluation, comprising a within-subject comparative experiment and an open-ended exploratory study, reveals that DiscipLink can effectively support researchers in breaking down disciplinary boundaries and integrating scattered knowledge in diverse fields. The findings underscore the potential of LLM-powered tools in fostering information-seeking practices and bolstering interdisciplinary research.
