Yingru Li

A Note on Hybrid Online Reinforcement and Imitation Learning for LLMs: Formulations and Algorithms

Dec 28, 2025
Via arXiv

Trust Region Masking for Long-Horizon LLM Reinforcement Learning

Dec 28, 2025
Via arXiv

Taming the Tail: Stable LLM Reinforcement Learning via Dynamic Vocabulary Pruning

Dec 28, 2025
Via arXiv

Harnessing Uncertainty: Entropy-Modulated Policy Gradients for Long-Horizon LLM Agents

Sep 11, 2025
Via arXiv

GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models

Aug 08, 2025
Via arXiv

Logit Dynamics in Softmax Policy Gradient Methods

Jun 15, 2025
Via arXiv

OWL: Optimized Workforce Learning for General Multi-Agent Assistance in Real-World Task Automation

May 29, 2025
Via arXiv

Divergence-Augmented Policy Optimization

Jan 25, 2025
Via arXiv

Adaptive Foundation Models for Online Decisions: HyperAgent with Fast Incremental Uncertainty Estimation

Jul 18, 2024
Via arXiv

Prior-dependent analysis of posterior sampling reinforcement learning with function approximation

Mar 17, 2024
Via arXiv