
Marcello Restelli

Unsupervised Behavioral Compression: Learning Low-Dimensional Policy Manifolds through State-Occupancy Matching

Apr 02, 2026

How Log-Barrier Helps Exploration in Policy Optimization

Mar 16, 2026

Learning in Markov Decision Processes with Exogenous Dynamics

Mar 04, 2026

K-Myriad: Jump-starting reinforcement learning with unsupervised parallel agents

Jan 26, 2026

From Parameters to Behavior: Unsupervised Compression of the Policy Space

Sep 26, 2025

Limitations of Physics-Informed Neural Networks: a Study on Smart Grid Surrogation

Aug 29, 2025

"So, Tell Me About Your Policy...": Distillation of interpretable policies from Deep Reinforcement Learning agents

Jul 10, 2025

Enhancing Diversity in Parallel Agents: A Maximum State Entropy Exploration Story

May 02, 2025

Towards Principled Multi-Agent Task Agnostic Exploration

Feb 12, 2025

Achieving $\widetilde{\mathcal{O}}(\sqrt{T})$ Regret in Average-Reward POMDPs with Known Observation Models

Jan 30, 2025