Zhilin Wang

$π$-Bench: Evaluating Proactive Personal Assistant Agents in Long-Horizon Workflows

May 14, 2026
Via arXiv

Achieving Gold-Medal-Level Olympiad Reasoning via Simple and Unified Scaling

May 13, 2026
Via arXiv

Teaching Thinking Models to Reason with Tools: A Full-Pipeline Recipe for Tool-Integrated Reasoning

May 07, 2026
Via arXiv

SymphonyGen: 3D Hierarchical Orchestral Generation with Controllable Harmony Skeleton

Apr 28, 2026
Via arXiv

Mira-Embeddings-V1: Domain-Adapted Semantic Reranking for Recruitment via LLM-Synthesized Data

Apr 20, 2026
Via arXiv

Nemotron 3 Super: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning

Apr 14, 2026
Via arXiv

New Skills or Sharper Primitives? A Probabilistic Perspective on the Emergence of Reasoning in RLVR

Feb 09, 2026
Via arXiv

Characterizing, Evaluating, and Optimizing Complex Reasoning

Feb 09, 2026
Via arXiv

Evaluating Parameter Efficient Methods for RLVR

Dec 30, 2025
Via arXiv

ProfBench: Multi-Domain Rubrics requiring Professional Knowledge to Answer and Judge

Oct 21, 2025
Via arXiv