Picture for Geng Hong

Geng Hong

ReasoningShield: Content Safety Detection over Reasoning Traces of Large Reasoning Models

Add code
May 22, 2025
Viaarxiv icon

OpenDeception: Benchmarking and Investigating AI Deceptive Behaviors via Open-ended Interaction Simulation

Add code
Apr 18, 2025
Viaarxiv icon