Chaozheng Wang

SWE-Chain: Benchmarking Coding Agents on Chained Release-Level Package Upgrades

May 14, 2026

SAGE: A Service Agent Graph-guided Evaluation Benchmark

Apr 10, 2026

WARBENCH: A Comprehensive Benchmark for Evaluating LLMs in Military Decision-Making

Mar 22, 2026

Reinforcing Real-world Service Agents: Balancing Utility and Cost in Task-oriented Dialogue

Feb 26, 2026

SEAD: Self-Evolving Agent for Multi-Turn Service Dialogue

Feb 03, 2026

REPAIR: Robust Editing via Progressive Adaptive Intervention and Reintegration

Oct 02, 2025

UMoE: Unifying Attention and FFN with Shared Experts

May 12, 2025

CODECRASH: Stress Testing LLM Reasoning under Structural and Semantic Perturbations

Apr 19, 2025

IDInit: A Universal and Stable Initialization Method for Neural Network Training

Mar 06, 2025

How Should I Build A Benchmark?

Jan 18, 2025