Qingyu Yin

On the Geometry of On-Policy Distillation

Jun 05, 2026

QUBRIC: Co-Designing Queries and Rubrics for RL Beyond Verifiable Rewards

Jun 02, 2026

Controllable and Verifiable Tool-Use Data Synthesis for Agentic Reinforcement Learning

Apr 10, 2026

JoyAI-LLM Flash: Advancing Mid-Scale LLMs with Token Efficiency

Apr 03, 2026

Training LLMs for Multi-Step Tool Orchestration with Constrained Data Synthesis and Graduated Rewards

Mar 25, 2026

HeaPA: Difficulty-Aware Heap Sampling and On-Policy Query Augmentation for LLM Reinforcement Learning

Jan 30, 2026

Vision-DeepResearch: Incentivizing DeepResearch Capability in Multimodal Large Language Models

Jan 29, 2026

Finding RELIEF: Shaping Reasoning Behavior without Reasoning Supervision via Belief Engineering

Jan 20, 2026

Evaluating Parameter Efficient Methods for RLVR

Dec 30, 2025

Interleaving Reasoning for Better Text-to-Image Generation

Sep 09, 2025