Adel Bibi

It's a TRAP! Task-Redirecting Agent Persuasion Benchmark for Web Agents

Dec 29, 2025
Via arXiv

Unforgotten Safety: Preserving Safety Alignment of Large Language Models with Continual Learning

Dec 10, 2025
Via arXiv

Rethinking Safety in LLM Fine-tuning: An Optimization Perspective

Aug 17, 2025
Via arXiv

Attacking Multimodal OS Agents with Malicious Image Patches

Mar 13, 2025
Via arXiv

Shh, don't say that! Domain Certification in LLMs

Feb 26, 2025
Via arXiv

On the Coexistence and Ensembling of Watermarks

Jan 29, 2025
Via arXiv

Open Problems in Machine Unlearning for AI Safety

Jan 09, 2025
Via arXiv

Detecting LLM Hallucination Through Layer-wise Information Deficiency: Analysis of Unanswerable Questions and Ambiguous Prompts

Dec 13, 2024
Via arXiv

Bi-Factorial Preference Optimization: Balancing Safety-Helpfulness in Language Models

Aug 27, 2024
Via arXiv

FedMedICL: Towards Holistic Evaluation of Distribution Shifts in Federated Medical Imaging

Jul 11, 2024
Via arXiv