Yanwei Ren

Recycling Failures: Salvaging Exploration in RLVR via Fine-Grained Off-Policy Guidance

Feb 27, 2026
Via arXiv

LARGO: Low-Rank Regulated Gradient Projection for Robust Parameter Efficient Fine-Tuning

Jun 14, 2025
Via arXiv

SIGMA: Refining Large Language Model Reasoning via Sibling-Guided Monte Carlo Augmentation

Jun 06, 2025
Via arXiv