Bruno Castro da Silva

RLHF Deciphered: A Critical Analysis of Reinforcement Learning from Human Feedback for LLMs

Apr 16, 2024
Shreyas Chaudhari, Pranjal Aggarwal, Vishvak Murahari, Tanmay Rajpurohit, Ashwin Kalyan, Karthik Narasimhan, Ameet Deshpande, Bruno Castro da Silva

From Past to Future: Rethinking Eligibility Traces

Dec 20, 2023
Dhawal Gupta, Scott M. Jordan, Shreyas Chaudhari, Bo Liu, Philip S. Thomas, Bruno Castro da Silva

Behavior Alignment via Reward Function Optimization

Oct 31, 2023
Dhawal Gupta, Yash Chandak, Scott M. Jordan, Philip S. Thomas, Bruno Castro da Silva

Coagent Networks: Generalized and Scaled

May 16, 2023
James E. Kostas, Scott M. Jordan, Yash Chandak, Georgios Theocharous, Dhawal Gupta, Martha White, Bruno Castro da Silva, Philip S. Thomas

Off-Policy Evaluation for Action-Dependent Non-Stationary Environments

Jan 24, 2023
Yash Chandak, Shiv Shankar, Nathaniel D. Bastian, Bruno Castro da Silva, Emma Brunskill, Philip S. Thomas

Model-Based Reinforcement Learning with SINDy

Aug 30, 2022
Rushiv Arora, Bruno Castro da Silva, Eliot Moss

Enforcing Delayed-Impact Fairness Guarantees

Aug 24, 2022
Aline Weber, Blossom Metevier, Yuriy Brun, Philip S. Thomas, Bruno Castro da Silva

Universal Off-Policy Evaluation

Apr 26, 2021
Yash Chandak, Scott Niekum, Bruno Castro da Silva, Erik Learned-Miller, Emma Brunskill, Philip S. Thomas
