Sina Ghiassian

Soft Preference Optimization: Aligning Language Models to Expert Distributions

Apr 30, 2024

On the Importance of Uncertainty in Decision-Making with Large Language Models

Apr 03, 2024 (4 figures)

In-context Exploration-Exploitation for Reinforcement Learning

Mar 11, 2024 (3 figures)

Auxiliary task discovery through generate-and-test

Oct 25, 2022 (4 figures)

Importance Sampling Placement in Off-Policy Temporal-Difference Methods

Mar 18, 2022 (2 figures)

An Empirical Comparison of Off-policy Prediction Learning Algorithms in the Four Rooms Environment

Sep 10, 2021 (4 figures)

An Empirical Comparison of Off-policy Prediction Learning Algorithms on the Collision Task

Jun 11, 2021 (4 figures)

A Generalized Projected Bellman Error for Off-policy Value Estimation in Reinforcement Learning

Apr 28, 2021 (4 figures)

Does Standard Backpropagation Forget Less Catastrophically Than Adam?

Feb 20, 2021 (4 figures)

Gradient Temporal-Difference Learning with Regularized Corrections

Jul 07, 2020 (4 figures)