Gelei Deng

RSafe: Incentivizing proactive reasoning to build robust and adaptive LLM safeguards

Jun 09, 2025

Holmes: Automated Fact Check with Large Language Models

May 06, 2025

A Rusty Link in the AI Supply Chain: Detecting Evil Configurations in Model Repositories

May 02, 2025

Enhancing Model Defense Against Jailbreaks with Proactive Safety Reasoning

Jan 31, 2025

Indiana Jones: There Are Always Some Useful Ancient Relics

Jan 27, 2025

Safe + Safe = Unsafe? Exploring How Safe Images Can Be Exploited to Jailbreak Large Vision-Language Models

Nov 19, 2024

Fine-Grained Verifiers: Preference Modeling as Next-token Prediction in Vision-Language Alignment

Oct 18, 2024

GenderCARE: A Comprehensive Framework for Assessing and Reducing Gender Bias in Large Language Models

Aug 22, 2024

Efficient Detection of Toxic Prompts in Large Language Models

Aug 21, 2024

Image-Based Geolocation Using Large Vision-Language Models

Aug 18, 2024