Weixiang Zhao

RSafe: Incentivizing proactive reasoning to build robust and adaptive LLM safeguards

Jun 09, 2025

AlphaSteer: Learning Refusal Steering with Principled Null-Space Constraint

Jun 08, 2025

L-MTP: Leap Multi-Token Prediction Beyond Adjacent Context for Large Language Models

May 23, 2025

MPO: Multilingual Safety Alignment via Reward Gap Optimization

May 22, 2025

Teaching Language Models to Evolve with Users: Dynamic Profile Modeling for Personalized Alignment

May 21, 2025

When Less Language is More: Language-Reasoning Disentanglement Makes LLMs Better Multilingual Reasoners

May 21, 2025

AdaSteer: Your Aligned LLM is Inherently an Adaptive Jailbreak Defender

Apr 13, 2025

Chain of Strategy Optimization Makes Large Language Models Better Emotional Supporter

Mar 07, 2025

Beware of Your Po! Measuring and Mitigating AI Safety Risks in Role-Play Fine-Tuning of LLMs

Feb 28, 2025

Lens: Rethinking Multilingual Enhancement for Large Language Models

Oct 06, 2024