
Guojun Yin

Joint Training of Multi-Token Prediction in Reinforcement Learning via Optimal Coefficient Calibration

May 27, 2026
Via arXiv

ZipRL: Adaptive Multi-Turn Context Compression with Hindsight Response Replay

May 27, 2026
Via arXiv

When Self-Belief Misleads: Active Label Acquisition for Reinforcement Learning with Verifiable Rewards

May 25, 2026
Via arXiv

AMR-SD: Asymmetric Meta-Reflective Self-Distillation for Token-Level Credit Assignment

May 18, 2026
Via arXiv

Implicit Hierarchical GRPO: Decoupling Tool Invocation from Execution for Tool-Integrated Mathematical Reasoning

May 18, 2026
Via arXiv

RecRM-Bench: Benchmarking Multidimensional Reward Modeling for Agentic Recommender Systems

May 12, 2026
Via arXiv

π-Play: Multi-Agent Self-Play via Privileged Self-Distillation without External Data

Apr 15, 2026
Via arXiv

DiningBench: A Hierarchical Multi-view Benchmark for Perception and Reasoning in the Dietary Domain

Apr 12, 2026
Via arXiv

CDRRM: Contrast-Driven Rubric Generation for Reliable and Interpretable Reward Modeling

Mar 09, 2026
Via arXiv

SAE as a Crystal Ball: Interpretable Features Predict Cross-domain Transferability of LLMs without Training

Mar 03, 2026
Via arXiv