Qi Zhang

NVIDIA

Distill Not Only Data but Also Rewards: Can Smaller Language Models Surpass Larger Ones?

Feb 26, 2025

VEM: Environment-Free Exploration for Training GUI Agent with Value Environment Model

Feb 26, 2025

Measuring Data Diversity for Instruction Tuning: A Systematic Analysis and A Reliable Metric

Feb 25, 2025

Lean and Mean: Decoupled Value Policy Optimization with Global Value Guidance

Feb 24, 2025

Towards Economical Inference: Enabling DeepSeek's Multi-Head Latent Attention in Any Transformer-based LLMs

Feb 20, 2025

Unsupervised CP-UNet Framework for Denoising DAS Data with Decay Noise

Feb 19, 2025

D.Va: Validate Your Demonstration First Before You Use It

Feb 19, 2025

Self-Consistency of the Internal Reward Models Improves Self-Rewarding Language Models

Feb 13, 2025

Predicting Large Language Model Capabilities on Closed-Book QA Tasks Using Only Information Available Prior to Training

Feb 06, 2025

MEETING DELEGATE: Benchmarking LLMs on Attending Meetings on Our Behalf

Feb 05, 2025