Picture for Yichang Zhang

Yichang Zhang

additional authors not shown

ARES: Automated Rubric Synthesis for Scalable LLM Reinforcement Learning

Add code
May 25, 2026
Viaarxiv icon

SAGE: Scalable Automated Robustness Augmentation for LLM Knowledge Evaluation

Add code
May 12, 2026
Viaarxiv icon

Qwen-Image-2.0 Technical Report

Add code
May 11, 2026
Viaarxiv icon

Outcome Accuracy is Not Enough: Aligning the Reasoning Process of Reward Models

Add code
Feb 04, 2026
Viaarxiv icon

Qwen3Guard Technical Report

Add code
Oct 16, 2025
Viaarxiv icon

RMTBench: Benchmarking LLMs Through Multi-Turn User-Centric Role-Playing

Add code
Jul 27, 2025
Viaarxiv icon

MTR-Bench: A Comprehensive Benchmark for Multi-Turn Reasoning Evaluation

Add code
May 26, 2025
Viaarxiv icon

Qwen3 Technical Report

Add code
May 14, 2025
Figure 1 for Qwen3 Technical Report
Figure 2 for Qwen3 Technical Report
Figure 3 for Qwen3 Technical Report
Figure 4 for Qwen3 Technical Report
Viaarxiv icon

PolyMath: Evaluating Mathematical Reasoning in Multilingual Contexts

Add code
Apr 30, 2025
Viaarxiv icon

HellaSwag-Pro: A Large-Scale Bilingual Benchmark for Evaluating the Robustness of LLMs in Commonsense Reasoning

Add code
Feb 17, 2025
Viaarxiv icon