Ximing Lu

Zone of Proximal Policy Optimization: Teacher in Prompts, Not Gradients

Jun 16, 2026

ProCUA-SFT Technical Report

Jun 15, 2026

Introspective X Training: Feedback Conditioning Improves Scaling Across all LLM Training Stages

May 19, 2026

Nemotron 3 Nano Omni: Efficient and Open Multimodal Intelligence

Apr 27, 2026

ProRL Agent: Rollout-as-a-Service for RL Training of Multi-Turn LLM Agents

Mar 19, 2026

iGRPO: Self-Feedback-Driven LLM Reasoning

Feb 09, 2026

Golden Goose: A Simple Trick to Synthesize Unlimited RLVR Tasks from Unverifiable Internet Text

Jan 30, 2026

Long Grounded Thoughts: Distilling Compositional Visual Reasoning Chains at Scale

Nov 07, 2025

ProfBench: Multi-Domain Rubrics requiring Professional Knowledge to Answer and Judge

Oct 21, 2025

Bias in Gender Bias Benchmarks: How Spurious Features Distort Evaluation

Sep 09, 2025