Zhexin Zhang

From Risk Classification to Action Plan Remediation: A Guardrail Feedback Driven Framework for LLM Agents

Jun 04, 2026

RUBAS: Rubric-Based Reinforcement Learning for Agent Safety

Jun 02, 2026

LASA: Language-Agnostic Semantic Alignment at the Semantic Bottleneck for LLM Safety

Apr 13, 2026

The Missing Half: Unveiling Training-time Implicit Safety Risks Beyond Deployment

Feb 04, 2026

MotionAdapter: Video Motion Transfer via Content-Aware Attention Customization

Jan 05, 2026

JPS: Jailbreak Multimodal Large Language Models with Collaborative Visual Perturbation and Textual Steering

Aug 07, 2025

Be Careful When Fine-tuning On Open-Source LLMs: Your Fine-tuning Data Could Be Secretly Stolen!

May 21, 2025

How Should We Enhance the Safety of Large Reasoning Models: An Empirical Study

May 21, 2025

ShieldVLM: Safeguarding the Multimodal Implicit Toxicity via Deliberative Reasoning with LVLMs

May 20, 2025

LongSafety: Evaluating Long-Context Safety of Large Language Models

Feb 24, 2025