Marcello Restelli

"So, Tell Me About Your Policy...": Distillation of interpretable policies from Deep Reinforcement Learning agents

Add code
Jul 10, 2025
Viaarxiv icon

Enhancing Diversity in Parallel Agents: A Maximum State Entropy Exploration Story
May 02, 2025

Towards Principled Multi-Agent Task Agnostic Exploration
Feb 12, 2025

Achieving $\widetilde{\mathcal{O}}(\sqrt{T})$ Regret in Average-Reward POMDPs with Known Observation Models
Jan 30, 2025

A parametric algorithm is optimal for non-parametric regression of smooth functions
Dec 19, 2024

Statistical Analysis of Policy Space Compression Problem
Nov 15, 2024

A Retrospective on the Robot Air Hockey Challenge: Benchmarking Robust, Reliable, and Safe Learning Techniques for Real-world Robotics
Nov 08, 2024

Local Linearity: the Key for No-regret Reinforcement Learning in Continuous MDPs
Oct 31, 2024

Truncating Trajectories in Monte Carlo Policy Evaluation: an Adaptive Approach
Oct 17, 2024

Efficient Learning of POMDPs with Known Observation Model in Average-Reward Setting
Oct 02, 2024