
Songyang Gao

Semi-off-Policy Reinforcement Learning for Vision-Language Slow-thinking Reasoning

Jul 22, 2025

The Imitation Game: Turing Machine Imitator is Length Generalizable Reasoner

Jul 17, 2025

Capability Salience Vector: Fine-grained Alignment of Loss and Capabilities for Downstream Task Scaling Law

Jun 16, 2025

A Comprehensive Survey of Reward Models: Taxonomy, Applications, Challenges, and Future

Apr 12, 2025

Unicorn: Text-Only Data Synthesis for Vision Language Model Training

Mar 28, 2025

Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning

Feb 10, 2025

Are Your LLMs Capable of Stable Reasoning?

Dec 17, 2024

Inverse-Q*: Token Level Reinforcement Learning for Aligning Large Language Models Without Preference Data

Aug 27, 2024

AgentGym: Evolving Large Language Model-based Agents across Diverse Environments

Jun 06, 2024

Chinese Tiny LLM: Pretraining a Chinese-Centric Large Language Model

Apr 09, 2024