Alex Beutel

Steve

The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions

Apr 19, 2024
Via arXiv

Generalized People Diversity: Learning a Human Perception-Aligned Diversity Representation for People Images

Jan 25, 2024
Via arXiv

Multi-Group Fairness Evaluation via Conditional Value-at-Risk Testing

Dec 06, 2023
Via arXiv

Improving Few-shot Generalization of Safety Classifiers via Data Augmented Parameter-Efficient Fine-Tuning

Oct 25, 2023
Via arXiv

Controlled Decoding from Language Models

Oct 25, 2023
Via arXiv

Break it, Imitate it, Fix it: Robustness by Generating Human-Like Attacks

Oct 25, 2023
Via arXiv

Improving Diversity of Demographic Representation in Large Language Models via Collective-Critiques and Self-Voting

Oct 25, 2023
Via arXiv

Learning from Negative User Feedback and Measuring Responsiveness for Sequential Recommenders

Aug 23, 2023
Via arXiv

Towards A Scalable Solution for Improving Multi-Group Fairness in Compositional Classification

Jul 11, 2023
Via arXiv

Let's Do a Thought Experiment: Using Counterfactuals to Improve Moral Reasoning

Jun 25, 2023
Via arXiv