
Nathan Lambert

WildGuard: Open One-Stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs
Jun 26, 2024

Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback
Jun 13, 2024

D2PO: Discriminator-Guided DPO with Response Evaluation Models
May 02, 2024

Social Choice for AI Alignment: Dealing with Diverse Human Feedback
Apr 16, 2024

RewardBench: Evaluating Reward Models for Language Modeling
Mar 20, 2024

A Survey on Data Selection for Language Models
Mar 08, 2024

OLMo: Accelerating the Science of Language Models
Feb 07, 2024

Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research
Jan 31, 2024

Camels in a Changing Climate: Enhancing LM Adaptation with Tulu 2
Nov 20, 2023

The Alignment Ceiling: Objective Mismatch in Reinforcement Learning from Human Feedback
Oct 31, 2023