Alexander Rakhlin

Outcome-Based Online Reinforcement Learning: Algorithms and Fundamental Limits

May 26, 2025

Trajectory Bellman Residual Minimization: A Simple Value-Based Method for LLM Reasoning

May 21, 2025

Near-Optimal Private Learning in Linear Contextual Bandits

Feb 18, 2025

Decision Making in Changing Environments: Robustness, Query-Based Learning, and Differential Privacy

Jan 24, 2025

Refined Risk Bounds for Unbounded Losses via Transductive Priors

Oct 29, 2024

How Does Variance Shape the Regret in Contextual Bandits?

Oct 16, 2024

Assouad, Fano, and Le Cam with Interaction: A Unifying Lower Bound Framework and Characterization for Bandit Learnability

Oct 07, 2024

Random Latent Exploration for Deep Reinforcement Learning

Jul 18, 2024

Near-Optimal Learning and Planning in Separated Latent MDPs

Jun 12, 2024

Exploratory Preference Optimization: Harnessing Implicit Q*-Approximation for Sample-Efficient RLHF

May 31, 2024