Weixiang Zhao

ConflictBench: Evaluating Human-AI Conflict via Interactive and Visually Grounded Environments

Mar 09, 2026
Via arXiv

SafeNeuron: Neuron-Level Safety Alignment for Large Language Models

Feb 12, 2026
Via arXiv

Who Transfers Safety? Identifying and Targeting Cross-Lingual Shared Safety Neurons

Feb 01, 2026
Via arXiv

Large Language Model Agents Are Not Always Faithful Self-Evolvers

Jan 30, 2026
Via arXiv

TEA-Bench: A Systematic Benchmarking of Tool-enhanced Emotional Support Dialogue Agent

Jan 26, 2026
Via arXiv

When Personalization Legitimizes Risks: Uncovering Safety Vulnerabilities in Personalized Dialogue Agents

Jan 25, 2026
Via arXiv

Understanding Multilingualism in Mixture-of-Experts LLMs: Routing Mechanism, Expert Specialization, and Layerwise Steering

Jan 20, 2026
Via arXiv

OP-Bench: Benchmarking Over-Personalization for Memory-Augmented Personalized Conversational Agents

Jan 20, 2026
Via arXiv

Exploring and Exploiting the Inherent Efficiency within Large Reasoning Models for Self-Guided Efficiency Enhancement

Jun 18, 2025
Via arXiv

On Reasoning Strength Planning in Large Reasoning Models

Jun 10, 2025
Via arXiv