Songyang Gao

EVPO: Explained Variance Policy Optimization for Adaptive Critic Utilization in LLM Post-Training

Apr 21, 2026
Via arXiv

Intern-S1-Pro: Scientific Multimodal Foundation Model at Trillion Scale

Mar 26, 2026
Via arXiv

Achieving Olympia-Level Geometry Large Language Model Agent via Complexity Boosting Reinforcement Learning

Dec 12, 2025
Via arXiv

Long-horizon Reasoning Agent for Olympiad-Level Mathematical Problem Solving

Dec 12, 2025
Via arXiv

OPV: Outcome-based Process Verifier for Efficient Long Chain-of-Thought Verification

Dec 11, 2025
Via arXiv

Semi-off-Policy Reinforcement Learning for Vision-Language Slow-thinking Reasoning

Jul 22, 2025
Via arXiv

The Imitation Game: Turing Machine Imitator is Length Generalizable Reasoner

Jul 17, 2025
Via arXiv

Capability Salience Vector: Fine-grained Alignment of Loss and Capabilities for Downstream Task Scaling Law

Jun 16, 2025
Via arXiv

A Comprehensive Survey of Reward Models: Taxonomy, Applications, Challenges, and Future

Apr 12, 2025
Via arXiv

Unicorn: Text-Only Data Synthesis for Vision Language Model Training

Mar 28, 2025
Via arXiv