
Ruslan Salakhutdinov

Shammie

IsoCompute Playbook: Optimally Scaling Sampling Compute for LLM RL

Mar 12, 2026
Via arXiv

Retrospective In-Context Learning for Temporal Credit Assignment with Large Language Models

Feb 19, 2026
Via arXiv

Maximum Likelihood Reinforcement Learning

Feb 02, 2026
Via arXiv

POPE: Learning to Reason on Hard Problems via Privileged On-Policy Exploration

Jan 26, 2026
Via arXiv

Tuning-free Visual Effect Transfer across Videos

Jan 13, 2026
Via arXiv

RLAD: Training LLMs to Discover Abstractions for Solving Reasoning Problems

Oct 02, 2025
Via arXiv

Accelerating Diffusion Models in Offline RL via Reward-Aware Consistency Trajectory Distillation

Jun 09, 2025
Via arXiv

Can Large Reasoning Models Self-Train?

May 27, 2025
Via arXiv

AgentDAM: Privacy Leakage Evaluation for Autonomous Web Agents

Mar 12, 2025
Via arXiv

Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning

Mar 10, 2025
Via arXiv