Ruben Härle

LLMs Gaming Verifiers: RLVR can Lead to Reward Hacking

Apr 16, 2026
Via arXiv

Measuring and Guiding Monosemanticity

Jun 24, 2025
Via arXiv

SCAR: Sparse Conditioned Autoencoders for Concept Detection and Steering in LLMs

Nov 11, 2024
Via arXiv