Cunxiang Wang

HoWToBench: Holistic Evaluation for LLM's Capability in Human-level Writing using Tree of Writing

Apr 21, 2026
Via arXiv

IndustryCode: A Benchmark for Industry Code Generation

Apr 03, 2026
Via arXiv

IF-RewardBench: Benchmarking Judge Models for Instruction-Following Evaluation

Mar 05, 2026
Via arXiv

RAVEL: Reasoning Agents for Validating and Evaluating LLM Text Synthesis

Feb 28, 2026
Via arXiv

RLAR: An Agentic Reward System for Multi-task Reinforcement Learning on Large Language Models

Feb 28, 2026
Via arXiv

TraceSIR: A Multi-Agent Framework for Structured Analysis and Reporting of Agentic Execution Traces

Feb 28, 2026
Via arXiv

GLM-5: from Vibe Coding to Agentic Engineering

Feb 17, 2026
Via arXiv

MVSS: A Unified Framework for Multi-View Structured Survey Generation

Jan 14, 2026
Via arXiv

Beyond Literal Mapping: Benchmarking and Improving Non-Literal Translation Evaluation

Jan 12, 2026
Via arXiv

DVD: A Robust Method for Detecting Variant Contamination in Large Language Model Evaluation

Jan 08, 2026
Via arXiv