Ahmad Beirami

Gradient-Based Language Model Red Teaming

Jan 30, 2024
Nevan Wichers, Carson Denison, Ahmad Beirami

Theoretical guarantees on the best-of-n alignment policy

Jan 03, 2024
Ahmad Beirami, Alekh Agarwal, Jonathan Berant, Alexander D'Amour, Jacob Eisenstein, Chirag Nagpal, Ananda Theertha Suresh

Helping or Herding? Reward Model Ensembles Mitigate but do not Eliminate Reward Hacking

Dec 21, 2023
Jacob Eisenstein, Chirag Nagpal, Alekh Agarwal, Ahmad Beirami, Alex D'Amour, DJ Dvijotham, Adam Fisch, Katherine Heller, Stephen Pfohl, Deepak Ramachandran, Peter Shaw, Jonathan Berant

Multi-Group Fairness Evaluation via Conditional Value-at-Risk Testing

Dec 06, 2023
Lucas Monteiro Paes, Ananda Theertha Suresh, Alex Beutel, Flavio P. Calmon, Ahmad Beirami

FRAPPÉ: A Post-Processing Framework for Group Fairness Regularization

Dec 05, 2023
Alexandru Ţifrea, Preethi Lahoti, Ben Packer, Yoni Halpern, Ahmad Beirami, Flavien Prost

Improving Robustness via Tilted Exponential Layer: A Communication-Theoretic Perspective

Nov 02, 2023
Bhagyashree Puranik, Ahmad Beirami, Yao Qin, Upamanyu Madhow

Controlled Decoding from Language Models

Oct 25, 2023
Sidharth Mudgal, Jong Lee, Harish Ganapathy, YaGuang Li, Tao Wang, Yanping Huang, Zhifeng Chen, Heng-Tze Cheng, Michael Collins, Trevor Strohman, Jilin Chen, Alex Beutel, Ahmad Beirami

Improving Few-shot Generalization of Safety Classifiers via Data Augmented Parameter-Efficient Fine-Tuning

Oct 25, 2023
Ananth Balashankar, Xiao Ma, Aradhana Sinha, Ahmad Beirami, Yao Qin, Jilin Chen, Alex Beutel

Break it, Imitate it, Fix it: Robustness by Generating Human-Like Attacks

Oct 25, 2023
Aradhana Sinha, Ananth Balashankar, Ahmad Beirami, Thi Avrahami, Jilin Chen, Alex Beutel

Improving Diversity of Demographic Representation in Large Language Models via Collective-Critiques and Self-Voting

Oct 25, 2023
Preethi Lahoti, Nicholas Blumm, Xiao Ma, Raghavendra Kotikalapudi, Sahitya Potluri, Qijun Tan, Hansa Srinivasan, Ben Packer, Ahmad Beirami, Alex Beutel, Jilin Chen
