Dawn Song

University of California, Berkeley

VERINA: Benchmarking Verifiable Code Generation

May 29, 2025

OVERT: A Benchmark for Over-Refusal Evaluation on Text-to-Image Models

May 28, 2025

Learning to Reason without External Rewards

May 26, 2025

A Critical Evaluation of Defenses against Prompt Injection Attacks

May 23, 2025

SafeKey: Amplifying Aha-Moment Insights for Safety Reasoning

May 22, 2025

In-Context Watermarks for Large Language Models

May 22, 2025

BountyBench: Dollar Impact of AI Agent Attackers and Defenders on Real-World Cybersecurity Systems

May 21, 2025

Probing the Vulnerability of Large Language Models to Polysemantic Interventions

May 16, 2025

AgentXploit: End-to-End Redteaming of Black-Box AI Agents

May 09, 2025

Why and How LLMs Hallucinate: Connecting the Dots with Subsequence Associations

Apr 17, 2025