Dhawal Gupta

From Past to Future: Rethinking Eligibility Traces

Dec 20, 2023
Dhawal Gupta, Scott M. Jordan, Shreyas Chaudhari, Bo Liu, Philip S. Thomas, Bruno Castro da Silva

Behavior Alignment via Reward Function Optimization

Oct 31, 2023
Dhawal Gupta, Yash Chandak, Scott M. Jordan, Philip S. Thomas, Bruno Castro da Silva

Exploring the impact of low-rank adaptation on the performance, efficiency, and regularization of RLHF

Sep 16, 2023
Simeng Sun, Dhawal Gupta, Mohit Iyyer

Coagent Networks: Generalized and Scaled

May 16, 2023
James E. Kostas, Scott M. Jordan, Yash Chandak, Georgios Theocharous, Dhawal Gupta, Martha White, Bruno Castro da Silva, Philip S. Thomas

Offline Reinforcement Learning for Mixture-of-Expert Dialogue Management

Feb 21, 2023
Dhawal Gupta, Yinlam Chow, Mohammad Ghavamzadeh, Craig Boutilier

Gradient Temporal-Difference Learning with Regularized Corrections

Jul 07, 2020
Sina Ghiassian, Andrew Patterson, Shivam Garg, Dhawal Gupta, Adam White, Martha White
