Riccardo Conte

Refusal Before Decoding: Detecting and Exploiting Refusal Signals in Intermediate LLM Activations

May 27, 2026
Via arXiv