RL


On Computation and Reinforcement Learning

Feb 05, 2026
Via arXiv

Approximation of Log-Partition Function in Policy Mirror Descent Induces Implicit Regularization for LLM Post-Training

Feb 05, 2026
Via arXiv

Quantum Reinforcement Learning with Transformers for the Capacitated Vehicle Routing Problem

Feb 05, 2026
Via arXiv

DFPO: Scaling Value Modeling via Distributional Flow towards Robust and Generalizable LLM Post-Training

Feb 05, 2026
Via arXiv

Dr. Kernel: Reinforcement Learning Done Right for Triton Kernel Generations

Feb 05, 2026
Via arXiv

Reinforcement World Model Learning for LLM-based Agents

Feb 05, 2026
Via arXiv

UI-Mem: Self-Evolving Experience Memory for Online Reinforcement Learning in Mobile GUI Agents

Feb 05, 2026
Via arXiv

TKG-Thinker: Towards Dynamic Reasoning over Temporal Knowledge Graphs via Agentic Reinforcement Learning

Feb 05, 2026
Via arXiv

Cross-Domain Offline Policy Adaptation via Selective Transition Correction

Feb 05, 2026
Via arXiv

RL-VLA$^3$: Reinforcement Learning VLA Accelerating via Full Asynchronism

Feb 05, 2026
Via arXiv