Zepeng Lin

Dynamic Sampling that Adapts: Iterative DPO for Self-Aware Mathematical Reasoning

May 22, 2025
Via arXiv