Junfeng Fang

Principled Steering via Null-space Projection for Jailbreak Defense in Vision-Language Models

Mar 23, 2026

GuardAlign: Test-time Safety Alignment in Multimodal Large Language Models

Feb 27, 2026

Enhancing Multi-Modal LLMs Reasoning via Difficulty-Aware Group Normalization

Feb 26, 2026

R-Diverse: Mitigating Diversity Illusion in Self-Play LLM Training

Feb 16, 2026

SafeNeuron: Neuron-Level Safety Alignment for Large Language Models

Feb 12, 2026

Active Zero: Self-Evolving Vision-Language Models through Active Environment Exploration

Feb 11, 2026

AgentNoiseBench: Benchmarking Robustness of Tool-Using LLM Agents Under Noisy Condition

Feb 11, 2026

Revisiting Robustness for LLM Safety Alignment via Selective Geometry Control

Feb 07, 2026

The Missing Half: Unveiling Training-time Implicit Safety Risks Beyond Deployment

Feb 04, 2026

From Data to Behavior: Predicting Unintended Model Behaviors Before Training

Feb 04, 2026