Csaba Szepesvari

Optimistic Actor-Critic with Parametric Policies for Linear Markov Decision Processes

Apr 01, 2026

Rethinking the Global Convergence of Softmax Policy Gradient with Linear Function Approximation

May 06, 2025

Ordering-based Conditions for Global Convergence of Policy Gradient Methods

Apr 02, 2025

Small steps no more: Global convergence of stochastic gradient bandits for arbitrary learning rates

Feb 11, 2025

Stochastic Gradient Succeeds for Bandits

Feb 27, 2024

Sample Efficient Deep Reinforcement Learning via Local Planning

Jan 29, 2023

The Role of Baselines in Policy Gradient Optimization

Jan 16, 2023

Optimistic MLE -- A Generic Model-based Algorithm for Partially Observable Sequential Decision Making

Sep 29, 2022

Towards Painless Policy Optimization for Constrained MDPs

Apr 11, 2022

Understanding the Effect of Stochasticity in Policy Optimization

Oct 29, 2021