Zhilin Wang

New Skills or Sharper Primitives? A Probabilistic Perspective on the Emergence of Reasoning in RLVR

Feb 09, 2026

Characterizing, Evaluating, and Optimizing Complex Reasoning

Feb 09, 2026

Evaluating Parameter Efficient Methods for RLVR

Dec 30, 2025

ProfBench: Multi-Domain Rubrics requiring Professional Knowledge to Answer and Judge

Oct 21, 2025

Reasoning over Boundaries: Enhancing Specification Alignment via Test-time Deliberation

Sep 18, 2025

Synthesizing Sheet Music Problems for Evaluation and Reinforcement Learning

Sep 04, 2025

Will AI Tell Lies to Save Sick Children? Litmus-Testing AI Values Prioritization with AIRiskDilemmas

May 20, 2025

HelpSteer3-Preference: Open Human-Annotated Preference Data across Diverse Tasks and Languages

May 16, 2025

Llama-Nemotron: Efficient Reasoning Models

May 02, 2025

SEE: Continual Fine-tuning with Sequential Ensemble of Experts

Apr 09, 2025