Adel Bibi

A Fragile Guardrail: Diffusion LLM's Safety Blessing and Its Failure Mode

Jan 30, 2026

The Alignment Curse: Cross-Modality Jailbreak Transfer in Omni-Models

Jan 30, 2026

It's a TRAP! Task-Redirecting Agent Persuasion Benchmark for Web Agents

Dec 29, 2025

Unforgotten Safety: Preserving Safety Alignment of Large Language Models with Continual Learning

Dec 10, 2025

Rethinking Safety in LLM Fine-tuning: An Optimization Perspective

Aug 17, 2025

Attacking Multimodal OS Agents with Malicious Image Patches

Mar 13, 2025

Shh, don't say that! Domain Certification in LLMs

Feb 26, 2025

On the Coexistence and Ensembling of Watermarks

Jan 29, 2025

Open Problems in Machine Unlearning for AI Safety

Jan 09, 2025

Detecting LLM Hallucination Through Layer-wise Information Deficiency: Analysis of Unanswerable Questions and Ambiguous Prompts

Dec 13, 2024