Picture for Bole Ma

Bole Ma

Virtual Width Networks

Add code
Nov 17, 2025
Viaarxiv icon

Truncated Proximal Policy Optimization

Add code
Jun 18, 2025
Figure 1 for Truncated Proximal Policy Optimization
Figure 2 for Truncated Proximal Policy Optimization
Figure 3 for Truncated Proximal Policy Optimization
Figure 4 for Truncated Proximal Policy Optimization
Viaarxiv icon

Model Merging in Pre-training of Large Language Models

Add code
May 17, 2025
Viaarxiv icon

VAPO: Efficient and Reliable Reinforcement Learning for Advanced Reasoning Tasks

Add code
Apr 08, 2025
Figure 1 for VAPO: Efficient and Reliable Reinforcement Learning for Advanced Reasoning Tasks
Figure 2 for VAPO: Efficient and Reliable Reinforcement Learning for Advanced Reasoning Tasks
Figure 3 for VAPO: Efficient and Reliable Reinforcement Learning for Advanced Reasoning Tasks
Viaarxiv icon

DAPO: An Open-Source LLM Reinforcement Learning System at Scale

Add code
Mar 18, 2025
Figure 1 for DAPO: An Open-Source LLM Reinforcement Learning System at Scale
Figure 2 for DAPO: An Open-Source LLM Reinforcement Learning System at Scale
Figure 3 for DAPO: An Open-Source LLM Reinforcement Learning System at Scale
Figure 4 for DAPO: An Open-Source LLM Reinforcement Learning System at Scale
Viaarxiv icon

Personalized Generative Low-light Image Denoising and Enhancement

Add code
Dec 18, 2024
Viaarxiv icon