Jacob Eisenstein

Transforming and Combining Rewards for Aligning Large Language Models

Feb 01, 2024
Via arXiv

Theoretical guarantees on the best-of-n alignment policy

Jan 03, 2024
Via arXiv

Helping or Herding? Reward Model Ensembles Mitigate but do not Eliminate Reward Hacking

Dec 21, 2023
Via arXiv

Selectively Answering Ambiguous Questions

May 24, 2023
Via arXiv

MD3: The Multi-Dialect Dataset of Dialogues

May 19, 2023
Via arXiv

Attributed Question Answering: Evaluation and Modeling for Attributed Large Language Models

Dec 15, 2022
Via arXiv

Dialect-robust Evaluation of Generated Text

Nov 02, 2022
Via arXiv

Predicting Long-Term Citations from Short-Term Linguistic Influence

Oct 24, 2022
Via arXiv

Honest Students from Untrusted Teachers: Learning an Interpretable Question-Answering Pipeline from a Pretrained Language Model

Oct 05, 2022
Via arXiv

Uninformative Input Features and Counterfactual Invariance: Two Perspectives on Spurious Correlations in Natural Language

Apr 09, 2022
Via arXiv