Hangoo Kang

TRAP: Targeted Redirecting of Agentic Preferences

May 29, 2025
Via arXiv

Learning a Pessimistic Reward Model in RLHF

May 26, 2025
Via arXiv

Stochastic Monkeys at Play: Random Augmentations Cheaply Break LLM Safety Alignment

Nov 05, 2024
Via arXiv

Improving LLM Code Generation with Grammar Augmentation

Mar 03, 2024
Via arXiv