Haoran Dang

Temperature as a Meta-Policy: Adaptive Temperature in LLM Reinforcement Learning

Feb 12, 2026
Via arXiv