Mark Rowland

Nash Learning from Human Feedback

Dec 06, 2023

A General Theoretical Paradigm to Understand Learning from Human Preferences

Oct 18, 2023

Bootstrapped Representations in Reinforcement Learning

Jun 16, 2023

DoMo-AC: Doubly Multi-step Off-policy Actor-Critic Algorithm

May 29, 2023

VA-learning as a more efficient alternative to Q-learning

May 29, 2023

The Statistical Benefits of Quantile Temporal-Difference Learning for Value Estimation

May 28, 2023

An Analysis of Quantile Temporal-Difference Learning

Jan 11, 2023

A Novel Stochastic Gradient Descent Algorithm for Learning Principal Subspaces

Dec 08, 2022

Understanding Self-Predictive Learning for Reinforcement Learning

Dec 06, 2022

Optimistic Posterior Sampling for Reinforcement Learning with Few Samples and Tight Guarantees

Sep 28, 2022