
Dawn Song

University of California, Berkeley

AgentPoison: Red-teaming LLM Agents via Poisoning Memory or Knowledge Bases

Jul 17, 2024

Re-Tuning: Overcoming the Compositionality Limits of Large Language Models with Recursive Tuning

Jul 05, 2024

AI Risk Categorization Decoded (AIR 2024): From Government Regulations to Corporate Policies

Jun 25, 2024

BEEAR: Embedding-based Adversarial Removal of Safety Backdoors in Instruction-tuned Language Models

Jun 24, 2024

Data Shapley in One Training Run

Jun 16, 2024

GuardAgent: Safeguard LLM Agents by a Guard Agent via Knowledge-Enabled Reasoning

Jun 13, 2024

AI Risk Management Should Incorporate Both Safety and Security

May 29, 2024

KnowHalu: Hallucination Detection via Multi-Form Knowledge Based Factual Checking

Apr 03, 2024

RigorLLM: Resilient Guardrails for Large Language Models against Undesired Content

Mar 19, 2024

Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression

Mar 18, 2024