Zhenhong Zhou

SafeSeek: Universal Attribution of Safety Circuits in Language Models

Mar 24, 2026

Resource Consumption Threats in Large Language Models

Mar 17, 2026

MCPShield: A Security Cognition Layer for Adaptive Trust Calibration in Model Context Protocol Agents

Feb 15, 2026

Omni-Safety under Cross-Modality Conflict: Vulnerabilities, Dynamics Mechanisms and Efficient Alignment

Feb 10, 2026

RECUR: Resource Exhaustion Attack via Recursive-Entropy Guided Counterfactual Utilization and Reflection

Feb 09, 2026

From Helpfulness to Toxic Proactivity: Diagnosing Behavioral Misalignment in LLM Agents

Feb 04, 2026

SEE: Signal Embedding Energy for Quantifying Noise Interference in Large Audio Language Models

Jan 12, 2026

HearSay Benchmark: Do Audio LLMs Leak What They Hear?

Jan 07, 2026

CSSBench: Evaluating the Safety of Lightweight LLMs against Chinese-Specific Adversarial Patterns

Jan 05, 2026

MemEvolve: Meta-Evolution of Agent Memory Systems

Dec 21, 2025