Ziqian Zhong

Hardening Agent Benchmarks with Adversarial Hacker-Fixer Loops

Jun 08, 2026
Via arXiv

Base Models Look Human To AI Detectors

May 19, 2026
Via arXiv

Early Data Exposure Improves Robustness to Subsequent Fine-Tuning

May 12, 2026
Via arXiv

Terminal Wrench: A Dataset of 331 Reward-Hackable Environments and 3,632 Exploit Trajectories

Apr 19, 2026
Via arXiv

Pando: Do Interpretability Methods Work When Models Won't Explain Themselves?

Apr 13, 2026
Via arXiv

Hodoscope: Unsupervised Monitoring for AI Misbehaviors

Apr 13, 2026
Via arXiv

From Horizontal Layering to Vertical Integration: A Comparative Study of the AI-Driven Software Development Paradigm

Jan 30, 2026
Via arXiv

Watch the Weights: Unsupervised monitoring and control of fine-tuned LLMs

Jul 31, 2025
Via arXiv

Algorithmic Capabilities of Random Transformers

Oct 06, 2024
Via arXiv

Grokking as Compression: A Nonlinear Complexity Perspective

Oct 09, 2023
Via arXiv