Bernardo Ávila Pires

Off-policy Distributional Q(λ): Distributional RL without Importance Sampling

Feb 08, 2024
Yunhao Tang, Mark Rowland, Rémi Munos, Bernardo Ávila Pires, Will Dabney

Generalized Preference Optimization: A Unified Approach to Offline Alignment

Feb 08, 2024
Yunhao Tang, Zhaohan Daniel Guo, Zeyu Zheng, Daniele Calandriello, Rémi Munos, Mark Rowland, Pierre Harvey Richemond, Michal Valko, Bernardo Ávila Pires, Bilal Piot

DoMo-AC: Doubly Multi-step Off-policy Actor-Critic Algorithm

May 29, 2023
Yunhao Tang, Tadashi Kozuno, Mark Rowland, Anna Harutyunyan, Rémi Munos, Bernardo Ávila Pires, Michal Valko

Understanding Self-Predictive Learning for Reinforcement Learning

Dec 06, 2022
Yunhao Tang, Zhaohan Daniel Guo, Pierre Harvey Richemond, Bernardo Ávila Pires, Yash Chandak, Rémi Munos, Mark Rowland, Mohammad Gheshlaghi Azar, Charline Le Lan, Clare Lyle, András György, Shantanu Thakoor, Will Dabney, Bilal Piot, Daniele Calandriello, Michal Valko

The Nature of Temporal Difference Errors in Multi-step Distributional Reinforcement Learning

Jul 15, 2022
Yunhao Tang, Mark Rowland, Rémi Munos, Bernardo Ávila Pires, Will Dabney, Marc G. Bellemare

Multiclass Classification Calibration Functions

Sep 20, 2016
Bernardo Ávila Pires, Csaba Szepesvári

Policy Error Bounds for Model-Based Reinforcement Learning with Factored Linear Models

Sep 20, 2016
Bernardo Ávila Pires, Csaba Szepesvári
