Dawn Song

University of California, Berkeley

Scalable Best-of-N Selection for Large Language Models via Self-Certainty

Feb 25, 2025

The Hidden Risks of Large Reasoning Models: A Safety Assessment of R1

Feb 18, 2025

International AI Safety Report

Jan 29, 2025

Can LLMs Design Good Questions Based on Context?

Jan 07, 2025

Formal Mathematical Reasoning: A New Frontier in AI

Dec 20, 2024

Capturing the Temporal Dependence of Training Data Influence

Dec 12, 2024

Boosting Alignment for Post-Unlearning Text-to-Image Generative Models

Dec 09, 2024

Data Free Backdoor Attacks

Dec 09, 2024

PrivAgent: Agentic-based Red-teaming for LLM Privacy Leakage

Dec 07, 2024

SoK: Watermarking for AI-Generated Content

Nov 27, 2024