Taco Cohen

Efficient RL Training for LLMs with Experience Replay
Apr 09, 2026

A Deep Dive into Scaling RL for Code Generation with Synthetic Data and Curricula
Mar 25, 2026

What Does It Take to Be a Good AI Research Agent? Studying the Role of Ideation Diversity
Nov 19, 2025

RL-finetuning LLMs from on- and off-policy data with a single algorithm
Mar 25, 2025

The KoLMogorov Test: Compression by Code Generation
Mar 18, 2025

Soft Policy Optimization: Online Off-Policy RL for Sequence Models
Mar 07, 2025

Does equivariance matter at scale?
Oct 30, 2024

What Makes Large Language Models Reason in (Multi-Turn) Code Generation?
Oct 10, 2024

RLEF: Grounding Code LLMs in Execution Feedback with Reinforcement Learning
Oct 02, 2024

Information-driven Affordance Discovery for Efficient Robotic Manipulation
May 06, 2024