Zhiyuan Hu

Collaborative Multi-Agent Test-Time Reinforcement Learning for Reasoning

Jan 15, 2026

Rewarding the Rare: Uniqueness-Aware RL for Creative Problem Solving in LLMs

Jan 13, 2026

Image Aesthetic Reasoning via HCM-GRPO: Empowering Compact Model for Superior Performance

Nov 13, 2025

Undersampled Phase Retrieval with Image Priors

Sep 18, 2025

Not All Tokens and Heads Are Equally Important: Dual-Level Attention Intervention for Hallucination Mitigation

Jun 14, 2025

BehaviorSFT: Behavioral Token Conditioning for Clinical Agents Across the Proactivity Spectrum

May 27, 2025

DeepInverse: A Python package for solving imaging inverse problems with deep learning

May 26, 2025

Temporal-Oriented Recipe for Transferring Large Vision-Language Model to Video Understanding

May 19, 2025

Beyond 'Aha!': Toward Systematic Meta-Abilities Alignment in Large Reasoning Models

May 15, 2025

Guiding VLM Agents with Process Rewards at Inference Time for GUI Navigation

Apr 22, 2025