
Ke Zeng

Asymmetric On-Policy Distillation: Bridging Exploitation and Imitation at the Token Level

May 07, 2026
Via arXiv

Global Context or Local Detail? Adaptive Visual Grounding for Hallucination Mitigation

Apr 27, 2026
Via arXiv

V-tableR1: Process-Supervised Multimodal Table Reasoning with Critic-Guided Policy Optimization

Apr 22, 2026
Via arXiv

Meituan Merchant Business Diagnosis via Policy-Guided Dual-Process User Simulation

Apr 16, 2026
Via arXiv

SCOPE: Signal-Calibrated On-Policy Distillation Enhancement with Dual-Path Adaptive Weighting

Apr 12, 2026
Via arXiv

TRUST-SQL: Tool-Integrated Multi-Turn Reinforcement Learning for Text-to-SQL over Unknown Schemas

Mar 17, 2026
Via arXiv

From $\boldsymbol{\log\pi}$ to $\boldsymbol{\pi}$: Taming Divergence in Soft Clipping via Bilateral Decoupled Decay of Probability Gradient Weight

Mar 15, 2026
Via arXiv

Harmonizing Dense and Sparse Signals in Multi-turn RL: Dual-Horizon Credit Assignment for Industrial Sales Agents

Mar 02, 2026
Via arXiv

Silo-Bench: A Scalable Environment for Evaluating Distributed Coordination in Multi-Agent LLM Systems

Mar 01, 2026
Via arXiv

How to Allocate, How to Learn? Dynamic Rollout Allocation and Advantage Modulation for Policy Optimization

Feb 22, 2026
Via arXiv