Reinforcement Learning


$f$-GRPO and Beyond: Divergence-Based Reinforcement Learning Algorithms for General LLM Alignment

Feb 05, 2026
Via arXiv

Learning to Inject: Automated Prompt Injection via Reinforcement Learning

Feb 05, 2026
Via arXiv

Stop Rewarding Hallucinated Steps: Faithfulness-Aware Step-Level Reinforcement Learning for Small Reasoning Models

Feb 05, 2026
Via arXiv

Quantum Reinforcement Learning with Transformers for the Capacitated Vehicle Routing Problem

Feb 05, 2026
Via arXiv

On Computation and Reinforcement Learning

Feb 05, 2026
Via arXiv

HoRD: Robust Humanoid Control via History-Conditioned Reinforcement Learning and Online Distillation

Feb 05, 2026
Via arXiv

LongR: Unleashing Long-Context Reasoning via Reinforcement Learning with Dense Utility Rewards

Feb 05, 2026
Via arXiv

Data-Centric Interpretability for LLM-based Multi-Agent Reinforcement Learning

Feb 05, 2026
Via arXiv

UI-Mem: Self-Evolving Experience Memory for Online Reinforcement Learning in Mobile GUI Agents

Feb 05, 2026
Via arXiv

Residual Reinforcement Learning for Waste-Container Lifting Using Large-Scale Cranes with Underactuated Tools

Feb 05, 2026
Via arXiv