Daniil Ognev

Value-Gradient Hypothesis of RL for LLMs

May 20, 2026
Via arXiv

Robust Safety Monitoring of Language Models via Activation Watermarking

Mar 24, 2026
Via arXiv