Shivaram Kalyanakrishnan

Efficient Computation of Blackwell Optimal Policies using Rational Functions

Aug 25, 2025

Howard's Policy Iteration is Subexponential for Deterministic Markov Decision Problems with Rewards of Fixed Bit-size and Arbitrary Discount Factor

May 01, 2025

A New Interpretation of the Certainty-Equivalence Approach for PAC Reinforcement Learning with a Generative Model

Jan 05, 2025

Artificial Intelligence and Life in 2030: The One Hundred Year Study on Artificial Intelligence

Oct 31, 2022

PAC Mode Estimation using PPR Martingale Confidence Sequences

Sep 10, 2021

An Analysis of Frame-skipping in Reinforcement Learning

Feb 07, 2021

Lower Bounds for Policy Iteration on Multi-action MDPs

Sep 16, 2020

Regret Minimisation in Multi-Armed Bandits Using Bounded Arm Memory

Jan 24, 2019

PAC Identification of Many Good Arms in Stochastic Multi-Armed Bandits

Jan 24, 2019