Ahmad Beirami

Inducing Group Fairness in LLM-Based Decisions

Jun 24, 2024
Via arXiv

Safety Alignment Should Be Made More Than Just a Few Tokens Deep

Jun 10, 2024
Via arXiv

Robust Preference Optimization through Reward Model Distillation

May 29, 2024
Via arXiv

Mitigating Object Hallucination via Data Augmented Contrastive Tuning

May 28, 2024
Via arXiv

Reuse Your Rewards: Reward Model Transfer for Zero-Shot Cross-Lingual Alignment

Apr 18, 2024
Via arXiv

Asymptotics of Language Model Alignment

Apr 02, 2024
Via arXiv

Optimal Block-Level Draft Verification for Accelerating Speculative Decoding

Mar 15, 2024
Via arXiv

Gradient-Based Language Model Red Teaming

Jan 30, 2024
Via arXiv

Theoretical guarantees on the best-of-n alignment policy

Jan 03, 2024
Via arXiv

Helping or Herding? Reward Model Ensembles Mitigate but do not Eliminate Reward Hacking

Dec 21, 2023
Via arXiv