Lior Shani

Offline Regularised Reinforcement Learning for Large Language Models Alignment

May 29, 2024

Embedding-Aligned Language Models

May 24, 2024

Multi-turn Reinforcement Learning from Preference Human Feedback

May 23, 2024

Demystifying Embedding Spaces using Large Language Models

Oct 06, 2023

Factually Consistent Summarization via Reinforcement Learning with Textual Entailment Feedback

May 31, 2023

Reinforcement Learning with History-Dependent Dynamic Contexts

Feb 04, 2023

Reinforcement Learning with a Terminator

May 30, 2022

Online Apprenticeship Learning

Feb 13, 2021

Mirror Descent Policy Optimization

Jun 09, 2020

Optimistic Policy Optimization with Bandit Feedback

Feb 19, 2020