Tuo Zhao

Ask a Strong LLM Judge when Your Reward Model is Uncertain

Oct 23, 2025

OpenRubrics: Towards Scalable Synthetic Rubric Generation for Reward Modeling and LLM Alignment

Oct 09, 2025

Think-RM: Enabling Long-Horizon Reasoning in Generative Reward Models

May 22, 2025

NoWag: A Unified Framework for Shape Preserving Compression of Large Language Models

Apr 20, 2025

Adversarial Training of Reward Models

Apr 08, 2025

IDEA Prune: An Integrated Enlarge-and-Prune Pipeline in Generative Language Model Pretraining

Mar 07, 2025

LLMs Can Generate a Better Answer by Aggregating Their Own Responses

Mar 06, 2025

A Minimalist Example of Edge-of-Stability and Progressive Sharpening

Mar 04, 2025

COSMOS: A Hybrid Adaptive Optimizer for Memory-Efficient Training of LLMs

Feb 26, 2025

Discriminative Finetuning of Generative Large Language Models without Reward Models and Preference Data

Feb 25, 2025