R. Srikant

Achieving Small Test Error in Mildly Overparameterized Neural Networks

Apr 24, 2021
Via arXiv

Sample Complexity and Overparameterization Bounds for Projection-Free Neural TD Learning

Mar 02, 2021
Via arXiv

Optimistic Policy Iteration for MDPs with Acyclic Transient State Structure

Feb 13, 2021
Via arXiv

One-bit feedback is sufficient for upper confidence bound policies

Dec 04, 2020
Via arXiv

Combining Reinforcement Learning with Model Predictive Control for On-Ramp Merging

Nov 17, 2020
Via arXiv

On the Consistency of Maximum Likelihood Estimators for Causal Network Identification

Oct 17, 2020
Via arXiv

Hellinger KL-UCB based Bandit Algorithms for Markovian and i.i.d. Settings

Sep 14, 2020
Via arXiv

Provably-Efficient Double Q-Learning

Jul 09, 2020
Via arXiv

Robust Multi-Agent Multi-Armed Bandits

Jul 07, 2020
Via arXiv

Continuous-Time Multi-Armed Bandits with Controlled Restarts

Jun 30, 2020
Via arXiv