Wenhao Yu

China University of Geosciences

Can LLMs Guide Their Own Exploration? Gradient-Guided Reinforcement Learning for LLM Reasoning

Dec 17, 2025
Via arXiv

MotionEdit: Benchmarking and Learning Motion-Centric Image Editing

Dec 14, 2025
Via arXiv

Evaluating Gemini Robotics Policies in a Veo World Simulator

Dec 11, 2025
Via arXiv

Modified-Emergency Index (MEI): A Criticality Metric for Autonomous Driving in Lateral Conflict

Oct 31, 2025
Via arXiv

Understanding and Enhancing Mamba-Transformer Hybrids for Memory Recall and Language Modeling

Oct 30, 2025
Via arXiv

VOGUE: Guiding Exploration with Visual Uncertainty Improves Multimodal Reasoning

Oct 01, 2025
Via arXiv

Evolving Language Models without Labels: Majority Drives Selection, Novelty Promotes Variation

Sep 18, 2025
Via arXiv

Parallel-R1: Towards Parallel Thinking via Reinforcement Learning

Sep 09, 2025
Via arXiv

Self-Rewarding Vision-Language Model via Reasoning Decomposition

Aug 27, 2025
Via arXiv

R-Zero: Self-Evolving Reasoning LLM from Zero Data

Aug 07, 2025
Via arXiv