Xiaorong Chen

HARVE: Hacking-Aware Reward-Head Vector Editing for Robust Reward Models

Jun 02, 2026
Via arXiv