Ninareh Mehrabi

Prompt Perturbation Consistency Learning for Robust Language Models

Feb 24, 2024
Via arXiv

Are you talking to ['xem'] or ['x', 'em']? On Tokenization and Addressing Misgendering in LLMs with Pronoun Tokenization Parity

Dec 21, 2023
Via arXiv

JAB: Joint Adversarial Prompting and Belief Augmentation

Nov 16, 2023
Via arXiv

On the steerability of large language models toward data-driven personas

Nov 08, 2023
Via arXiv

FLIRT: Feedback Loop In-context Red Teaming

Aug 08, 2023
Via arXiv

Is the Elephant Flying? Resolving Ambiguities in Text-to-Image Generative Models

Nov 17, 2022
Via arXiv

Robust Conversational Agents against Imperceptible Toxicity Triggers

May 05, 2022
Via arXiv

Towards Multi-Objective Statistically Fair Federated Learning

Jan 24, 2022
Via arXiv

Attributing Fair Decisions with Attention Interventions

Sep 08, 2021
Via arXiv

Lawyers are Dishonest? Quantifying Representational Harms in Commonsense Knowledge Resources

Mar 21, 2021
Via arXiv