Dawn Song

University of California, Berkeley

MIRAGE-Bench: LLM Agent is Hallucinating and Where to Find Them

Jul 28, 2025
Via arXiv

Machine Bullshit: Characterizing the Emergent Disregard for Truth in Large Language Models

Jul 10, 2025
Via arXiv

The Singapore Consensus on Global AI Safety Research Priorities

Jun 25, 2025
Via arXiv

OMEGA: Can LLMs Reason Outside the Box in Math? Evaluating Exploratory, Compositional, and Transformative Generalization

Jun 23, 2025
Via arXiv

AgentSynth: Scalable Task Generation for Generalist Computer-Use Agents

Jun 17, 2025
Via arXiv

Pushing the Limits of Safety: A Technical Report on the ATLAS Challenge 2025

Jun 14, 2025
Via arXiv

VERINA: Benchmarking Verifiable Code Generation

May 29, 2025
Via arXiv

OVERT: A Benchmark for Over-Refusal Evaluation on Text-to-Image Models

May 28, 2025
Via arXiv

Learning to Reason without External Rewards

May 26, 2025
Via arXiv

A Critical Evaluation of Defenses against Prompt Injection Attacks

May 23, 2025
Via arXiv