Min Lin

Rethinking the Trust Region in LLM Reinforcement Learning

Feb 04, 2026
Via arXiv

Revisiting Parameter Server in LLM Post-Training

Jan 27, 2026
Via arXiv

Defeating the Training-Inference Mismatch via FP16

Oct 30, 2025
Via arXiv

Nonparametric Data Attribution for Diffusion Models

Oct 16, 2025
Via arXiv

DEPTHOR++: Robust Depth Enhancement from a Real-World Lightweight dToF and RGB Guidance

Sep 30, 2025
Via arXiv

Language Models Can Learn from Verbal Feedback Without Scalar Rewards

Sep 26, 2025
Via arXiv

Variational Reasoning for Language Models

Sep 26, 2025
Via arXiv

PhyBlock: A Progressive Benchmark for Physical Understanding and Planning via 3D Block Assembly

Jun 10, 2025
Via arXiv

Reinforcing General Reasoning without Verifiers

May 27, 2025
Via arXiv

Lifelong Safety Alignment for Language Models

May 26, 2025
Via arXiv