Xing Yu

Tackling Length Inflation Without Trade-offs: Group Relative Reward Rescaling for Reinforcement Learning

Mar 11, 2026
Via arXiv

Bootstrapping Exploration with Group-Level Natural Language Feedback in Reinforcement Learning

Mar 04, 2026
Via arXiv

REDSearcher: A Scalable and Cost-Efficient Framework for Long-Horizon Search Agents

Feb 15, 2026
Via arXiv

VESPO: Variational Sequence-Level Soft Policy Optimization for Stable Off-Policy LLM Training

Feb 11, 2026
Via arXiv

DeepEyesV2: Toward Agentic Multimodal Model

Nov 10, 2025
Via arXiv

Towards Agentic Self-Learning LLMs in Search Environment

Oct 16, 2025
Via arXiv

DeepEyes: Incentivizing "Thinking with Images" via Reinforcement Learning

May 20, 2025
Via arXiv

Less is More: Enhancing Structured Multi-Agent Reasoning via Quality-Guided Distillation

Apr 23, 2025
Via arXiv

Think When You Need: Self-Adaptive Chain-of-Thought Learning

Apr 04, 2025
Via arXiv

Probabilistic Uncertain Reward Model: A Natural Generalization of Bradley-Terry Reward Model

Mar 28, 2025
Via arXiv