Prateek Mittal

Lottery Ticket Adaptation: Mitigating Destructive Interference in LLMs

Jun 25, 2024
Via arXiv

SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal Behaviors

Jun 20, 2024
Via arXiv

Data Shapley in One Training Run

Jun 16, 2024
Via arXiv

Safety Alignment Should Be Made More Than Just a Few Tokens Deep

Jun 10, 2024
Via arXiv

AI Risk Management Should Incorporate Both Safety and Security

May 29, 2024
Via arXiv

Certifiably Robust RAG against Retrieval Corruption

May 24, 2024
Via arXiv

Position Paper: Beyond Robustness Against Single Attack Types

May 02, 2024
Via arXiv

Teach LLMs to Phish: Stealing Private Information from Language Models

Mar 01, 2024
Via arXiv

Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications

Feb 07, 2024
Via arXiv

Efficient Data Shapley for Weighted Nearest Neighbor Algorithms

Jan 20, 2024
Via arXiv