Yuanting Zha

ShanghaiTech University

The Two-Stage Decision-Sampling Hypothesis: Understanding the Emergence of Self-Reflection in RL-Trained LLMs

Jan 04, 2026
Via arXiv