
Xunliang Cai

Alphabetical order by last name

AMO-Bench: Large Language Models Still Struggle in High School Math Competitions

Oct 30, 2025

Counteracting Matthew Effect in Self-Improvement of LVLMs through Head-Tail Re-balancing

Oct 30, 2025

CATArena: Evaluation of LLM Agents through Iterative Tournament Competitions

Oct 30, 2025

A Survey on LLM Mid-training

Oct 27, 2025

Autoformalizer with Tool Feedback

Oct 08, 2025

Making Mathematical Reasoning Adaptive

Oct 06, 2025

VitaBench: Benchmarking LLM Agents with Versatile Interactive Tasks in Real-world Applications

Sep 30, 2025

MUSE: MCTS-Driven Red Teaming Framework for Enhanced Multi-Turn Dialogue Safety in Large Language Models

Sep 18, 2025

Instance-level Randomization: Toward More Stable LLM Evaluations

Sep 16, 2025

OneCAT: Decoder-Only Auto-Regressive Model for Unified Understanding and Generation

Sep 03, 2025