Xuandong Zhao

OVERT: A Benchmark for Over-Refusal Evaluation on Text-to-Image Models

May 28, 2025
Via arXiv

Learning to Reason without External Rewards

May 26, 2025
Via arXiv

Invisible Tokens, Visible Bills: The Urgent Need to Audit Hidden Operations in Opaque LLM Services

May 24, 2025
Via arXiv

In-Context Watermarks for Large Language Models

May 22, 2025
Via arXiv

SafeKey: Amplifying Aha-Moment Insights for Safety Reasoning

May 22, 2025
Via arXiv

AgentXploit: End-to-End Redteaming of Black-Box AI Agents

May 09, 2025
Via arXiv

Assessing Judging Bias in Large Reasoning Models: An Empirical Study

Apr 14, 2025
Via arXiv

Are You Getting What You Pay For? Auditing Model Substitution in LLM APIs

Apr 07, 2025
Via arXiv

MMDT: Decoding the Trustworthiness and Safety of Multimodal Foundation Models

Mar 19, 2025
Via arXiv

Improving LLM Safety Alignment with Dual-Objective Optimization

Mar 05, 2025
Via arXiv