Xuezhi Cao

OPERA: Aligning Open-Ended Reasoning via Objective Perplexity-based Reinforcement Learning

Jun 24, 2026 (via arXiv)

DailyReport: An Open-ended Benchmark for Evaluating Search Agents on Daily Search Tasks

Jun 11, 2026 (via arXiv)

Asuka-Bench: Benchmarking Code Agents on Underspecified User Intent and Multi-Round Refinement

Jun 04, 2026 (via arXiv)

SAGE: A Quantitative Evaluation of Socialized Evolution in Agent Ecosystems

Jun 02, 2026 (via arXiv)

ATLAS: All-round Testing of Long-context Abilities across Scales

May 27, 2026 (via arXiv)

WBench: A Comprehensive Multi-turn Benchmark for Interactive Video World Model Evaluation

May 25, 2026 (via arXiv)

LARY: A Latent Action Representation Yielding Benchmark for Generalizable Vision-to-Action Alignment

Apr 13, 2026 (via arXiv)

General365: Benchmarking General Reasoning in Large Language Models Across Diverse and Challenging Tasks

Apr 13, 2026 (via arXiv)

TR-ICRL: Test-Time Rethinking for In-Context Reinforcement Learning

Apr 01, 2026 (via arXiv)

LongCat-Next: Lexicalizing Modalities as Discrete Tokens

Mar 29, 2026 (via arXiv)