Picture for Zhilin Wang

Zhilin Wang

Evaluating Parameter Efficient Methods for RLVR

Add code
Dec 30, 2025
Viaarxiv icon

ProfBench: Multi-Domain Rubrics requiring Professional Knowledge to Answer and Judge

Add code
Oct 21, 2025
Figure 1 for ProfBench: Multi-Domain Rubrics requiring Professional Knowledge to Answer and Judge
Figure 2 for ProfBench: Multi-Domain Rubrics requiring Professional Knowledge to Answer and Judge
Figure 3 for ProfBench: Multi-Domain Rubrics requiring Professional Knowledge to Answer and Judge
Figure 4 for ProfBench: Multi-Domain Rubrics requiring Professional Knowledge to Answer and Judge
Viaarxiv icon

Reasoning over Boundaries: Enhancing Specification Alignment via Test-time Delibration

Add code
Sep 18, 2025
Viaarxiv icon

Synthesizing Sheet Music Problems for Evaluation and Reinforcement Learning

Add code
Sep 04, 2025
Figure 1 for Synthesizing Sheet Music Problems for Evaluation and Reinforcement Learning
Figure 2 for Synthesizing Sheet Music Problems for Evaluation and Reinforcement Learning
Figure 3 for Synthesizing Sheet Music Problems for Evaluation and Reinforcement Learning
Figure 4 for Synthesizing Sheet Music Problems for Evaluation and Reinforcement Learning
Viaarxiv icon

Will AI Tell Lies to Save Sick Children? Litmus-Testing AI Values Prioritization with AIRiskDilemmas

Add code
May 20, 2025
Viaarxiv icon

HelpSteer3-Preference: Open Human-Annotated Preference Data across Diverse Tasks and Languages

Add code
May 16, 2025
Viaarxiv icon

Llama-Nemotron: Efficient Reasoning Models

Add code
May 02, 2025
Figure 1 for Llama-Nemotron: Efficient Reasoning Models
Figure 2 for Llama-Nemotron: Efficient Reasoning Models
Figure 3 for Llama-Nemotron: Efficient Reasoning Models
Figure 4 for Llama-Nemotron: Efficient Reasoning Models
Viaarxiv icon

SEE: Continual Fine-tuning with Sequential Ensemble of Experts

Add code
Apr 09, 2025
Viaarxiv icon

Adversarial Training of Reward Models

Add code
Apr 08, 2025
Figure 1 for Adversarial Training of Reward Models
Figure 2 for Adversarial Training of Reward Models
Figure 3 for Adversarial Training of Reward Models
Figure 4 for Adversarial Training of Reward Models
Viaarxiv icon

Dedicated Feedback and Edit Models Empower Inference-Time Scaling for Open-Ended General-Domain Tasks

Add code
Mar 06, 2025
Figure 1 for Dedicated Feedback and Edit Models Empower Inference-Time Scaling for Open-Ended General-Domain Tasks
Figure 2 for Dedicated Feedback and Edit Models Empower Inference-Time Scaling for Open-Ended General-Domain Tasks
Figure 3 for Dedicated Feedback and Edit Models Empower Inference-Time Scaling for Open-Ended General-Domain Tasks
Figure 4 for Dedicated Feedback and Edit Models Empower Inference-Time Scaling for Open-Ended General-Domain Tasks
Viaarxiv icon