Haoran Gu

Demystifying Design Choices of Reinforcement Fine-tuning: A Batched Contextual Bandit Learning Perspective

Jan 30, 2026

Overlooked Safety Vulnerability in LLMs: Malicious Intelligent Optimization Algorithm Request and its Jailbreak

Jan 01, 2026

One Trigger Token Is Enough: A Defense Strategy for Balancing Safety and Usability in Large Language Models

May 12, 2025

ParetoHqD: Fast Offline Multiobjective Alignment of Large Language Models using Pareto High-quality Data

Apr 23, 2025