reinforcement learning


Towards Stable and Effective Reinforcement Learning for Mixture-of-Experts

Oct 27, 2025
Via arXiv

Never Too Rigid to Reach: Adaptive Virtual Model Control with LLM- and Lyapunov-Based Reinforcement Learning

Oct 27, 2025
Via arXiv

Incentivizing Agentic Reasoning in LLM Judges via Tool-Integrated Reinforcement Learning

Oct 27, 2025
Via arXiv

AirFed: Federated Graph-Enhanced Multi-Agent Reinforcement Learning for Multi-UAV Cooperative Mobile Edge Computing

Oct 27, 2025
Via arXiv

Softmax is $1/2$-Lipschitz: A tight bound across all $\ell_p$ norms

Oct 27, 2025
Via arXiv

RL-AUX: Reinforcement Learning for Auxiliary Task Generation

Oct 27, 2025
Via arXiv

Advantage Shaping as Surrogate Reward Maximization: Unifying Pass@K Policy Gradients

Oct 27, 2025
Via arXiv

Multi-Agent Conditional Diffusion Model with Mean Field Communication as Wireless Resource Allocation Planner

Oct 27, 2025
Via arXiv

Offline Preference Optimization via Maximum Marginal Likelihood Estimation

Oct 27, 2025
Via arXiv

PASS-Enhanced MEC: Joint Optimization of Task Offloading and Uplink PASS Beamforming

Oct 27, 2025
Via arXiv