Dawn Song

University of California, Berkeley

VERINA: Benchmarking Verifiable Code Generation

May 29, 2025
Via arXiv

OVERT: A Benchmark for Over-Refusal Evaluation on Text-to-Image Models

May 28, 2025
Via arXiv

Learning to Reason without External Rewards

May 26, 2025
Via arXiv

A Critical Evaluation of Defenses against Prompt Injection Attacks

May 23, 2025
Via arXiv

In-Context Watermarks for Large Language Models

May 22, 2025
Via arXiv

SafeKey: Amplifying Aha-Moment Insights for Safety Reasoning

May 22, 2025
Via arXiv

BountyBench: Dollar Impact of AI Agent Attackers and Defenders on Real-World Cybersecurity Systems

May 21, 2025
Via arXiv

Probing the Vulnerability of Large Language Models to Polysemantic Interventions

May 16, 2025
Via arXiv

AgentXploit: End-to-End Redteaming of Black-Box AI Agents

May 09, 2025
Via arXiv

Why and How LLMs Hallucinate: Connecting the Dots with Subsequence Associations

Apr 17, 2025
Via arXiv