Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Jian Ruan

Understanding and Defending VLM Jailbreaks via Jailbreak-Related Representation Shift

Mar 18, 2026

Zhihua Wei, Qiang Li, Jian Ruan, Zhenxin Qin, Leilei Wen, Dongrui Liu, Wen Shen

Abstract:Large vision-language models (VLMs) often exhibit weakened safety alignment with the integration of the visual modality. Even when text prompts contain explicit harmful intent, adding an image can substantially increase jailbreak success rates. In this paper, we observe that VLMs can clearly distinguish benign inputs from harmful ones in their representation space. Moreover, even among harmful inputs, jailbreak samples form a distinct internal state that is separable from refusal samples. These observations suggest that jailbreaks do not arise from a failure to recognize harmful intent. Instead, the visual modality shifts representations toward a specific jailbreak state, thereby leading to a failure to trigger refusal. To quantify this transition, we identify a jailbreak direction and define the jailbreak-related shift as the component of the image-induced representation shift along this direction. Our analysis shows that the jailbreak-related shift reliably characterizes jailbreak behavior, providing a unified explanation for diverse jailbreak scenarios. Finally, we propose a defense method that enhances VLM safety by removing the jailbreak-related shift (JRS-Rem) at inference time. Experiments show that JRS-Rem provides strong defense across multiple scenarios while preserving performance on benign tasks.

Via

Access Paper or Ask Questions

Texture Classification Network Integrating Adaptive Wavelet Transform

Apr 08, 2024

Su-Xi Yu, Jing-Yuan He, Yi Wang, Yu-Jiao Cai, Jun Yang, Bo Lin, Wei-Bin Yang, Jian Ruan

Figure 1 for Texture Classification Network Integrating Adaptive Wavelet Transform

Figure 2 for Texture Classification Network Integrating Adaptive Wavelet Transform

Figure 3 for Texture Classification Network Integrating Adaptive Wavelet Transform

Figure 4 for Texture Classification Network Integrating Adaptive Wavelet Transform

Abstract:Graves' disease is a common condition that is diagnosed clinically by determining the smoothness of the thyroid texture and its morphology in ultrasound images. Currently, the most widely used approach for the automated diagnosis of Graves' disease utilizes Convolutional Neural Networks (CNNs) for both feature extraction and classification. However, these methods demonstrate limited efficacy in capturing texture features. Given the high capacity of wavelets in describing texture features, this research integrates learnable wavelet modules utilizing the Lifting Scheme into CNNs and incorporates a parallel wavelet branch into the ResNet18 model to enhance texture feature extraction. Our model can analyze texture features in spatial and frequency domains simultaneously, leading to optimized classification accuracy. We conducted experiments on collected ultrasound datasets and publicly available natural image texture datasets, our proposed network achieved 97.27% accuracy and 95.60% recall on ultrasound datasets, 60.765% accuracy on natural image texture datasets, surpassing the accuracy of ResNet and conrming the effectiveness of our approach.

Via

Access Paper or Ask Questions