Rishabh Joshi

Offline Regularised Reinforcement Learning for Large Language Models Alignment

May 29, 2024
Via arXiv

Human Alignment of Large Language Models through Online Preference Optimisation

Mar 13, 2024
Via arXiv

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

Mar 08, 2024
Via arXiv

LiPO: Listwise Preference Optimization through Learning-to-Rank

Feb 02, 2024
Via arXiv

Gemini: A Family of Highly Capable Multimodal Models

Dec 19, 2023
Via arXiv

Calibrating Likelihoods towards Consistency in Summarization Models

Oct 12, 2023
Via arXiv

Statistical Rejection Sampling Improves Preference Optimization

Sep 13, 2023
Via arXiv

SLiC-HF: Sequence Likelihood Calibration with Human Feedback

May 17, 2023
Via arXiv

Calibrating Sequence likelihood Improves Conditional Language Generation

Sep 30, 2022
Via arXiv

Unsupervised Keyphrase Extraction via Interpretable Neural Networks

Mar 15, 2022
Via arXiv