The Art of Scaling Reinforcement Learning Compute for LLMs


WebGen-R1: Incentivizing Large Language Models to Generate Functional and Aesthetic Websites with Reinforcement Learning

Apr 22, 2026

InfoDensity: Rewarding Information-Dense Traces for Efficient Reasoning

Mar 18, 2026

The Art of Efficient Reasoning: Data, Reward, and Optimization

Feb 25, 2026

MemSifter: Offloading LLM Memory Retrieval via Outcome-Driven Proxy Reasoning

Mar 03, 2026

Where Relevance Emerges: A Layer-Wise Study of Internal Attention for Zero-Shot Re-Ranking

Feb 26, 2026

Beyond Scalar Scores: Reinforcement Learning for Error-Aware Quality Estimation of Machine Translation

Feb 09, 2026

Quantized Evolution Strategies: High-precision Fine-tuning of Quantized LLMs at Low-precision Cost

Feb 03, 2026

TACLer: Tailored Curriculum Reinforcement Learning for Efficient Reasoning

Jan 29, 2026

Knowledge is Not Enough: Injecting RL Skills for Continual Adaptation

Jan 16, 2026

Seed-Prover 1.5: Mastering Undergraduate-Level Theorem Proving via Learning from Experience

Dec 19, 2025