Sunghwan Park

Beyond Attack Success Rate: Temporal Logit Observability for LLM Safety Failures

May 28, 2026
Via arXiv