Wangchunshu Zhou

PersonaFeedback: A Large-scale Human-annotated Benchmark For Personalization

Jun 15, 2025
Via arXiv

Scaling Test-time Compute for LLM Agents

Jun 15, 2025
Via arXiv

TaskCraft: Automated Generation of Agentic Tasks

Jun 11, 2025
Via arXiv

ScaleLong: A Multi-Timescale Benchmark for Long Video Understanding

May 29, 2025
Via arXiv

KORGym: A Dynamic Game Platform for LLM Reasoning Evaluation

May 21, 2025
Via arXiv

COIG-P: A High-Quality and Large-Scale Chinese Preference Dataset for Alignment with Human Values

Apr 07, 2025
Via arXiv

CodeCriticBench: A Holistic Code Critique Benchmark for Large Language Models

Feb 23, 2025
Via arXiv

SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines

Feb 20, 2025
Via arXiv

ChemAgent: Self-updating Library in Large Language Models Improves Chemical Reasoning

Jan 11, 2025
Via arXiv

AI PERSONA: Towards Life-long Personalization of LLMs

Dec 17, 2024
Via arXiv