Zeyu Qin

Quality-Diversity Red-Teaming: Automated Generation of High-Quality and Diverse Attackers for Large Language Models

Jun 08, 2025

Lifelong Safety Alignment for Language Models

May 26, 2025

Does Representation Intervention Really Identify Desired Concepts and Elicit Alignment?

May 24, 2025

Scaling Laws of Synthetic Data for Language Models

Mar 26, 2025

Leveraging Reasoning with Guidelines to Elicit and Utilize Knowledge for Enhancing Safety Alignment

Feb 06, 2025

Uncovering, Explaining, and Mitigating the Superficial Safety of Backdoor Defense

Oct 13, 2024

Entropic Distribution Matching in Supervised Fine-tuning of LLMs: Less Overfitting and Better Diversity

Aug 29, 2024

MoFO: Momentum-Filtered Optimizer for Mitigating Forgetting in LLM Fine-Tuning

Jul 31, 2024

Step-On-Feet Tuning: Scaling Self-Alignment of LLMs via Bootstrapping

Feb 22, 2024

Beyond Factuality: A Comprehensive Evaluation of Large Language Models as Knowledge Generators

Oct 11, 2023