Daniil Ognev

Robust Safety Monitoring of Language Models via Activation Watermarking

Mar 24, 2026
Via arXiv