Ru Peng

TRACE: A Unified Rollout Budget Allocation Framework for Efficient Agentic Reinforcement Learning

Jun 09, 2026

Momentum for Reasoning: Dense Intrinsic Signals in Policy Optimization

Jun 07, 2026

From "Weak" Signals to Strong Models: Preference Delta Aggregation with LoRA Merging

May 29, 2026

Optimsyn: Influence-Guided Rubrics Optimization for Synthetic Data Generation

Apr 01, 2026

DataMan: Data Manager for Pre-training Large Language Models

Feb 26, 2025

Predicting Rewards Alongside Tokens: Non-disruptive Parameter Insertion for Efficient Inference Intervention in Large Language Model

Aug 20, 2024

Qwen2 Technical Report

Jul 16, 2024

DotaMath: Decomposition of Thought with Code Assistance and Self-correction for Mathematical Reasoning

Jul 04, 2024

Inference-Time Decontamination: Reusing Leaked Benchmarks for Large Language Model Evaluation

Jun 20, 2024

DORY: Deliberative Prompt Recovery for LLM

May 31, 2024