Haoran Gu

One Trigger Token Is Enough: A Defense Strategy for Balancing Safety and Usability in Large Language Models

May 12, 2025
Via arXiv

ParetoHqD: Fast Offline Multiobjective Alignment of Large Language Models using Pareto High-quality Data

Apr 23, 2025
Via arXiv