Michael Backes

Generating Less Certain Adversarial Examples Improves Robust Generalization (Oct 06, 2023)

"Do Anything Now": Characterizing and Evaluating In-The-Wild Jailbreak Prompts on Large Language Models

Add code
Aug 07, 2023
Figure 1 for "Do Anything Now": Characterizing and Evaluating In-The-Wild Jailbreak Prompts on Large Language Models
Figure 2 for "Do Anything Now": Characterizing and Evaluating In-The-Wild Jailbreak Prompts on Large Language Models
Figure 3 for "Do Anything Now": Characterizing and Evaluating In-The-Wild Jailbreak Prompts on Large Language Models
Figure 4 for "Do Anything Now": Characterizing and Evaluating In-The-Wild Jailbreak Prompts on Large Language Models
Viaarxiv icon

Mondrian: Prompt Abstraction Attack Against Large Language Models for Cheaper API Pricing (Aug 07, 2023)

Generative Watermarking Against Unauthorized Subject-Driven Image Synthesis (Jun 13, 2023)

Generated Graph Detection (Jun 13, 2023)

Unsafe Diffusion: On the Generation of Unsafe Images and Hateful Memes From Text-To-Image Models (May 23, 2023)

Two-in-One: A Model Hijacking Attack Against Text Generation Models (May 12, 2023)

In ChatGPT We Trust? Measuring and Characterizing the Reliability of ChatGPT (Apr 18, 2023)

FACE-AUDITOR: Data Auditing in Facial Recognition Systems (Apr 05, 2023)

MGTBench: Benchmarking Machine-Generated Text Detection (Mar 26, 2023)