Zhilin Wang

ProfBench: Multi-Domain Rubrics requiring Professional Knowledge to Answer and Judge

Oct 21, 2025

Reasoning over Boundaries: Enhancing Specification Alignment via Test-time Deliberation

Sep 18, 2025

Synthesizing Sheet Music Problems for Evaluation and Reinforcement Learning

Sep 04, 2025

Will AI Tell Lies to Save Sick Children? Litmus-Testing AI Values Prioritization with AIRiskDilemmas

May 20, 2025

HelpSteer3-Preference: Open Human-Annotated Preference Data across Diverse Tasks and Languages

May 16, 2025

Llama-Nemotron: Efficient Reasoning Models

May 02, 2025

SEE: Continual Fine-tuning with Sequential Ensemble of Experts

Apr 09, 2025

Adversarial Training of Reward Models

Apr 08, 2025

Dedicated Feedback and Edit Models Empower Inference-Time Scaling for Open-Ended General-Domain Tasks

Mar 06, 2025

Lost in Literalism: How Supervised Training Shapes Translationese in LLMs

Mar 06, 2025