Victor Ma

CPO: Addressing Reward Ambiguity in Role-playing Dialogue via Comparative Policy Optimization

Aug 12, 2025
Via arXiv