Pankayaraj Pathmanathan

RAGPart & RAGMask: Retrieval-Stage Defenses Against Corpus Poisoning in Retrieval-Augmented Generation

Dec 30, 2025

Reward Models Can Improve Themselves: Reward-Guided Adversarial Failure Mode Discovery for Robust Reward Modeling

Jul 08, 2025

PoisonedParrot: Subtle Data Poisoning Attacks to Elicit Copyright-Infringing Content from Large Language Models

Mar 10, 2025

AdvBDGen: Adversarially Fortified Prompt-Specific Fuzzy Backdoor Generator Against LLM Alignment

Oct 15, 2024

Can Watermarking Large Language Models Prevent Copyrighted Text Generation and Hide Training Data?

Jul 24, 2024

Is poisoning a real threat to LLM alignment? Maybe more so than you think

Jun 17, 2024

Using Curiosity for an Even Representation of Tasks in Continual Offline Reinforcement Learning

Dec 05, 2023