Nouha Dziri

Rel-A.I.: An Interaction-Centered Approach To Measuring Human-LM Reliance

Jul 10, 2024

WildGuard: Open One-Stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs

Jun 26, 2024

WildTeaming at Scale: From In-the-Wild Jailbreaks to (Adversarially) Safer Language Models

Jun 26, 2024

WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild

Jun 07, 2024

CULTURE-GEN: Revealing Global Cultural Perception in Language Models through Natural Language Prompting

Apr 16, 2024

RewardBench: Evaluating Reward Models for Language Modeling

Mar 20, 2024

A Roadmap to Pluralistic Alignment

Feb 07, 2024

The Unlocking Spell on Base LLMs: Rethinking Alignment via In-Context Learning

Dec 04, 2023

What Makes it Ok to Set a Fire? Iterative Self-distillation of Contexts and Rationales for Disambiguating Defeasible Social and Moral Situations

Nov 01, 2023

The Generative AI Paradox: "What It Can Create, It May Not Understand"

Oct 31, 2023