Mingyuan Zhou

Duke University

REAL: Regression-Aware Reinforcement Learning for LLM-as-a-Judge

Mar 17, 2026

Mitigating Reward Hacking in RLHF via Bayesian Non-negative Reward Modeling

Feb 11, 2026

Self Forcing: Bridging the Train-Test Gap in Autoregressive Video Diffusion

Jun 09, 2025

Improving Data Efficiency for LLM Reinforcement Fine-tuning Through Difficulty-targeted Online Data Selection and Rollout Replay

Jun 05, 2025

Enhancing Uncertainty Estimation and Interpretability via Bayesian Non-negative Decision Layer

May 28, 2025

Mechanical in-sensor computing: a programmable meta-sensor for structural damage classification without external electronic power

May 24, 2025

InfLVG: Reinforce Inference-Time Consistent Long Video Generation with GRPO

May 23, 2025

Few-Step Diffusion via Score identity Distillation

May 19, 2025

Restoration Score Distillation: From Corrupted Diffusion Pretraining to One-Step High-Quality Generation

May 19, 2025

A Generative Framework for Causal Estimation via Importance-Weighted Diffusion Distillation

May 16, 2025