Qinbo Bai

Learning General Parameterized Policies for Infinite Horizon Average Reward Constrained MDPs via Primal-Dual Policy Gradient Algorithm

Feb 03, 2024
Qinbo Bai, Washim Uddin Mondal, Vaneet Aggarwal

Via arXiv

Regret Analysis of Policy Gradient Algorithm for Infinite Horizon Average Reward Markov Decision Processes

Sep 05, 2023
Qinbo Bai, Washim Uddin Mondal, Vaneet Aggarwal

Via arXiv

Achieving Zero Constraint Violation for Constrained Reinforcement Learning via Conservative Natural Policy Gradient Primal-Dual Algorithm

Jun 12, 2022
Qinbo Bai, Amrit Singh Bedi, Vaneet Aggarwal

Via arXiv

Achieving Zero Constraint Violation for Constrained Reinforcement Learning via Primal-Dual Approach

Sep 13, 2021
Qinbo Bai, Amrit Singh Bedi, Mridul Agarwal, Alec Koppel, Vaneet Aggarwal

Via arXiv

Concave Utility Reinforcement Learning with Zero-Constraint Violations

Sep 12, 2021
Mridul Agarwal, Qinbo Bai, Vaneet Aggarwal

Via arXiv

Markov Decision Processes with Long-Term Average Constraints

Jun 12, 2021
Mridul Agarwal, Qinbo Bai, Vaneet Aggarwal

Via arXiv

Joint Optimization of Multi-Objective Reinforcement Learning with Policy Gradient Based Algorithm

May 28, 2021
Qinbo Bai, Mridul Agarwal, Vaneet Aggarwal

Via arXiv

Model-Free Algorithm and Regret Analysis for MDPs with Long-Term Constraints

Jun 10, 2020
Qinbo Bai, Vaneet Aggarwal, Ather Gattami

Via arXiv

Model-Free Algorithm and Regret Analysis for MDPs with Peak Constraints

Mar 11, 2020
Qinbo Bai, Ather Gattami, Vaneet Aggarwal

Via arXiv

Escaping Saddle Points for Zeroth-order Nonconvex Optimization using Estimated Gradient Descent

Oct 03, 2019
Qinbo Bai, Mridul Agarwal, Vaneet Aggarwal

Via arXiv