Dmitriy Bespalov

TurboFuzzLLM: Turbocharging Mutation-based Fuzzing for Effectively Jailbreaking Large Language Models in Practice

Feb 21, 2025
Via arXiv

Graph of Attacks with Pruning: Optimizing Stealthy Jailbreak Prompt Generation for Enhanced LLM Content Moderation

Jan 28, 2025
Via arXiv

TaeBench: Improving Quality of Toxic Adversarial Examples

Oct 08, 2024
Via arXiv

Towards Building a Robust Toxicity Predictor

Apr 09, 2024
Via arXiv

Latent Skill Discovery for Chain-of-Thought Reasoning

Dec 07, 2023
Via arXiv