Jinzhu Wu

Evaluation is All You Need: Strategic Overclaiming of LLM Reasoning Capabilities Through Evaluation Design

Jun 05, 2025
Via arXiv

TinyR1-32B-Preview: Boosting Accuracy with Branch-Merge Distillation

Mar 06, 2025
Via arXiv

Stress Testing Generalization: How Minor Modifications Undermine Large Language Model Performance

Feb 18, 2025
Via arXiv