Zhijiang Guo

From Trainee to Trainer: LLM-Designed Training Environment for RL with Multi-Agent Reasoning

Jun 16, 2026
Via arXiv

Demystifying Hidden-State Recurrence: Switchable Latent Reasoning with On-Policy Reinforcement Learning

Jun 11, 2026
Via arXiv

Attention Amnesia in Hybrid LLMs: When CoT Fine-Tuning Breaks Long-Range Recall, and How to Fix It

Jun 09, 2026
Via arXiv

Uncertainty-Aware Clarification in LLM Agents with Information Gain

Jun 02, 2026
Via arXiv

EnvFactory: Scaling Tool-Use Agents via Executable Environments Synthesis and Robust RL

May 18, 2026
Via arXiv

From Table to Cell: Attention for Better Reasoning with TABALIGN

May 14, 2026
Via arXiv

DGPO: Beyond Pairwise Preferences with Directional Consistent Groupwise Optimization

May 11, 2026
Via arXiv

CodeSpecBench: Benchmarking LLMs for Executable Behavioral Specification Generation

Apr 14, 2026
Via arXiv

Spotlight and Shadow: Attention-Guided Dual-Anchor Introspective Decoding for MLLM Hallucination Mitigation

Apr 11, 2026
Via arXiv

Skip-Connected Policy Optimization for Implicit Advantage

Apr 09, 2026
Via arXiv