
Prateek Mittal

Instructional Segment Embedding: Improving LLM Safety with Instruction Hierarchy

Oct 09, 2024

Lottery Ticket Adaptation: Mitigating Destructive Interference in LLMs

Jun 25, 2024

SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal Behaviors

Jun 20, 2024

Data Shapley in One Training Run

Jun 16, 2024

Safety Alignment Should Be Made More Than Just a Few Tokens Deep

Jun 10, 2024

AI Risk Management Should Incorporate Both Safety and Security

May 29, 2024

Certifiably Robust RAG against Retrieval Corruption

May 24, 2024

Position Paper: Beyond Robustness Against Single Attack Types

May 02, 2024

Teach LLMs to Phish: Stealing Private Information from Language Models

Mar 01, 2024

Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications

Feb 07, 2024