Huizhong Song

Constrained Language Model Policy Optimization via Risk-aware Stepwise Alignment

Dec 30, 2025
Via arXiv

Goal-Guided Efficient Exploration via Large Language Model in Reinforcement Learning

Sep 26, 2025
Via arXiv

Risk-aware Direct Preference Optimization under Nested Risk Measure

May 29, 2025
Via arXiv