Shizhu He

From $P(y|x)$ to $P(y)$: Investigating Reinforcement Learning in Pre-train Space

Apr 15, 2026

ResAdapt: Adaptive Resolution for Efficient Multimodal Reasoning

Mar 31, 2026

WideSeek: Advancing Wide Research via Multi-Agent Scaling

Feb 02, 2026

Bottom-up Policy Optimization: Your Language Model Policy Secretly Contains Internal Policies

Dec 22, 2025

The Zero-Step Thinking: An Empirical Study of Mode Selection as Harder Early Exit in Reasoning Models

Oct 22, 2025

Towards Agentic Self-Learning LLMs in Search Environment

Oct 16, 2025

SparK: Query-Aware Unstructured Sparsity with Recoverable KV Cache Channel Pruning

Aug 21, 2025

Socratic-PRMBench: Benchmarking Process Reward Models with Systematic Reasoning Patterns

May 29, 2025

Amplify Adjacent Token Differences: Enhancing Long Chain-of-Thought Reasoning with Shift-FFN

May 22, 2025

Beyond Hard and Soft: Hybrid Context Compression for Balancing Local and Global Information Retention

May 21, 2025