Terry Yue Zhuo

SWE-Chain: Benchmarking Coding Agents on Chained Release-Level Package Upgrades

May 14, 2026
Via arXiv

ShredBench: Evaluating the Semantic Reasoning Capabilities of Multimodal LLMs in Document Reconstruction

Apr 26, 2026
Via arXiv

TRAJEVAL: Decomposing Code Agent Trajectories for Fine-Grained Diagnosis

Mar 25, 2026
Via arXiv

SecCodeBench-V2 Technical Report

Feb 17, 2026
Via arXiv

Secure Code Generation via Online Reinforcement Learning with Vulnerability Reward Model

Feb 07, 2026
Via arXiv

To Defend Against Cyber Attacks, We Must Teach AI Agents to Hack

Feb 01, 2026
Via arXiv

Terminal-Bench: Benchmarking Agents on Hard, Realistic Tasks in Command Line Interfaces

Jan 17, 2026
Via arXiv

An Empirical Study of Vulnerabilities in Python Packages and Their Detection

Sep 04, 2025
Via arXiv

EffiBench-X: A Multi-Language Benchmark for Measuring Efficiency of LLM-Generated Code

May 19, 2025
Via arXiv

Less is More: Towards Green Code Large Language Models via Unified Structural Pruning

Dec 20, 2024
Via arXiv