Wenxuan Wang

Boosting Adversarial Transferability by Block Shuffle and Rotation

Aug 22, 2023
Kunyu Wang, Xuanran He, Wenxuan Wang, Xiaosen Wang

Adversarial examples mislead deep neural networks with imperceptible perturbations and pose a significant threat to deep learning. An important aspect is their transferability, i.e., their ability to deceive models other than the one on which they were crafted, which enables attacks in the black-box setting. Though various methods have been proposed to boost transferability, performance still falls short of white-box attacks. In this work, we observe that existing input transformation based attacks, one of the mainstream families of transfer-based attacks, produce different attention heatmaps on different models, which might limit transferability. We also find that breaking the intrinsic relations within an image can disrupt the attention heatmap of the original image. Based on this finding, we propose a novel input transformation based attack called block shuffle and rotation (BSR). Specifically, BSR splits the input image into several blocks, then randomly shuffles and rotates these blocks to construct a set of new images for gradient calculation. Empirical evaluations on the ImageNet dataset demonstrate that BSR achieves significantly better transferability than existing input transformation based methods under both single-model and ensemble-model settings. Combining BSR with current input transformation methods further improves transferability, significantly outperforming the state-of-the-art methods.
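The core transform is easy to sketch. Below is a minimal PyTorch rendition of the block shuffle and rotation step; the function name, block count, rotation range, and copy count are illustrative defaults, not the paper's exact configuration:

```python
import torch
import torchvision.transforms.functional as TF

def block_shuffle_rotate(x, n_blocks=2, max_angle=24.0, n_copies=20):
    """Sketch of BSR: split each image into an n_blocks x n_blocks grid,
    randomly permute the blocks, rotate each block by a small random angle,
    and return n_copies transformed images for gradient averaging.
    Assumes H and W are divisible by n_blocks. x: (B, C, H, W)."""
    copies = []
    for _ in range(n_copies):
        # Split into a grid of blocks along height, then width.
        rows = torch.chunk(x, n_blocks, dim=2)
        flat = [b for r in rows for b in torch.chunk(r, n_blocks, dim=3)]
        # Randomly shuffle block positions.
        perm = torch.randperm(len(flat))
        flat = [flat[i] for i in perm]
        # Rotate each block by a small random angle (shape is preserved).
        flat = [TF.rotate(b, float(torch.empty(1).uniform_(-max_angle, max_angle)))
                for b in flat]
        # Reassemble the image from the transformed blocks.
        rows = [torch.cat(flat[i * n_blocks:(i + 1) * n_blocks], dim=3)
                for i in range(n_blocks)]
        copies.append(torch.cat(rows, dim=2))
    return torch.stack(copies)  # (n_copies, B, C, H, W)
```

An attacker would average the surrogate model's gradients over the returned copies before taking the usual sign-based perturbation update.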

EAVL: Explicitly Align Vision and Language for Referring Image Segmentation

Aug 22, 2023
Yichen Yan, Xingjian He, Wenxuan Wang, Sihan Chen, Jing Liu

Referring image segmentation aims to segment an object mentioned in natural language from an image. A main challenge is language-related localization, i.e., locating the object described by the language. Previous approaches mainly focus on the fusion of vision and language features without fully addressing language-related localization: the fused vision-language features are fed directly into a decoder and passed through a convolution with a fixed kernel to obtain the result, following a similar pattern to traditional image segmentation. This does not explicitly align language and vision features in the segmentation stage, resulting in suboptimal language-related localization. Different from previous methods, we propose to Explicitly Align the Vision and Language features for referring image segmentation (EAVL). Instead of using a fixed convolution kernel, we propose an Aligner that explicitly aligns the vision and language features in the segmentation stage. Specifically, a series of input-specific convolution kernels is generated from the input language expression and then used to explicitly align the vision and language features. To achieve this, we generate multiple queries that represent different emphases of the language expression. These queries are transformed into a series of query-based convolution kernels, which we use to perform convolutions in the segmentation stage and obtain a series of segmentation masks. The final result is obtained by aggregating all masks. Our method not only fuses vision and language features effectively but also exploits their potential in the segmentation stage. Most importantly, we explicitly align language features of different emphases with the image features to achieve language-related localization. Our method surpasses previous state-of-the-art methods on RefCOCO, RefCOCO+, and G-Ref by large margins.
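As a rough sketch of the query-to-kernel idea (the module name, shapes, and mean aggregation below are assumptions, not the authors' exact design), each language query can be mapped to a 1x1 dynamic kernel whose dot product with the vision features yields one mask:

```python
import torch
import torch.nn as nn

class QueryKernelAligner(nn.Module):
    """Hypothetical Aligner sketch: turn each language query into a 1x1
    convolution kernel, convolve the vision features with every
    query-specific kernel, and average the resulting masks."""

    def __init__(self, dim=256):
        super().__init__()
        # Map each query embedding to the weights of a 1x1 conv kernel.
        self.to_kernel = nn.Linear(dim, dim)

    def forward(self, vis_feat, queries):
        # vis_feat: (B, C, H, W); queries: (B, Q, C)
        kernels = self.to_kernel(queries)               # (B, Q, C)
        # Dotting each spatial vision vector with each query kernel is
        # equivalent to a query-specific 1x1 convolution.
        masks = torch.einsum('bchw,bqc->bqhw', vis_feat, kernels)
        return masks.mean(dim=1, keepdim=True)          # (B, 1, H, W)
```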

* 10 pages, 4 figures. arXiv admin note: text overlap with arXiv:2305.14969 

An Image is Worth a Thousand Toxic Words: A Metamorphic Testing Framework for Content Moderation Software

Aug 18, 2023
Wenxuan Wang, Jingyuan Huang, Jen-tse Huang, Chang Chen, Jiazhen Gu, Pinjia He, Michael R. Lyu

The exponential growth of social media platforms has revolutionized communication and content dissemination in human society. Nevertheless, these platforms are increasingly misused to spread toxic content, including hate speech, malicious advertising, and pornography, leading to severe negative consequences such as harm to teenagers' mental health. Despite tremendous efforts to develop and deploy textual and image content moderation methods, malicious users can evade moderation by embedding text into images, such as screenshots of the text, usually with some interference. We find that the performance of modern content moderation software against such malicious inputs remains underexplored. In this work, we propose OASIS, a metamorphic testing framework for content moderation software. OASIS employs 21 transform rules summarized from our pilot study on 5,000 real-world toxic content items collected from 4 popular social media applications, including Twitter, Instagram, Sina Weibo, and Baidu Tieba. Given toxic textual content, OASIS can generate image test cases that preserve the toxicity yet are likely to bypass moderation. In our evaluation, we employ OASIS to test five commercial textual content moderation services from well-known vendors (i.e., Google Cloud, Microsoft Azure, Baidu Cloud, Alibaba Cloud, and Tencent Cloud), as well as a state-of-the-art moderation research model. The results show that OASIS achieves error finding rates of up to 100%. Moreover, retraining the moderation model on the test cases generated by OASIS improves its robustness without degrading performance.
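To make the idea concrete, here is a minimal sketch of one text-to-image transform in the style OASIS describes (the specific interference, font, and parameters are illustrative, not one of the paper's 21 rules):

```python
from PIL import Image, ImageDraw, ImageFilter, ImageFont

def text_to_image_test_case(text, size=(512, 256)):
    """Render toxic text onto an image and add mild interference,
    producing a test case whose toxicity a human reader still perceives."""
    img = Image.new("RGB", size, color="white")
    draw = ImageDraw.Draw(img)
    font = ImageFont.load_default()
    draw.multiline_text((10, 10), text, fill="black", font=font)
    # Interference: slight rotation and blur, mimicking a noisy screenshot.
    img = img.rotate(3, expand=True, fillcolor="white")
    img = img.filter(ImageFilter.GaussianBlur(radius=0.8))
    return img

# Metamorphic relation: the moderation verdict on the rendered image should
# match the verdict on the original text; a mismatch is reported as an error.
```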

* Accepted by ASE 2023. arXiv admin note: substantial text overlap with arXiv:2302.05706 

GPT-4 Is Too Smart To Be Safe: Stealthy Chat with LLMs via Cipher

Aug 12, 2023
Youliang Yuan, Wenxiang Jiao, Wenxuan Wang, Jen-tse Huang, Pinjia He, Shuming Shi, Zhaopeng Tu

Safety lies at the core of the development of Large Language Models (LLMs). There is ample work on aligning LLMs with human ethics and preferences, including data filtering in pretraining, supervised fine-tuning, reinforcement learning from human feedback, and red teaming. In this study, we discover that chatting in cipher can bypass the safety alignment techniques of LLMs, which are mainly conducted in natural languages. We propose a novel framework, CipherChat, to systematically examine the generalizability of safety alignment to non-natural languages -- ciphers. CipherChat enables humans to chat with LLMs through cipher prompts topped with system role descriptions and few-shot enciphered demonstrations. We use CipherChat to assess state-of-the-art LLMs, including ChatGPT and GPT-4, on different representative human ciphers across 11 safety domains in both English and Chinese. Experimental results show that certain ciphers succeed almost 100% of the time in bypassing the safety alignment of GPT-4 in several safety domains, demonstrating the necessity of developing safety alignment for non-natural languages. Notably, we identify that LLMs seem to have a "secret cipher", and propose a novel SelfCipher that uses only role play and several demonstrations in natural language to evoke this capability. SelfCipher surprisingly outperforms existing human ciphers in almost all cases. Our code and data will be released at https://github.com/RobustNLP/CipherChat.
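A minimal sketch of how such a cipher prompt can be assembled, using the Caesar cipher as the running example (the prompt wording is paraphrased, not the paper's exact template):

```python
def caesar_encipher(text, shift=3):
    """Caesar cipher over ASCII letters, one of the classic human ciphers
    the paper evaluates; non-letters pass through unchanged."""
    out = []
    for ch in text:
        if ch.isalpha():
            base = ord('a') if ch.islower() else ord('A')
            out.append(chr((ord(ch) - base + shift) % 26 + base))
        else:
            out.append(ch)
    return ''.join(out)

def build_cipher_prompt(query, demos, shift=3):
    """Assemble a CipherChat-style prompt: a system role description that
    declares the cipher, a few enciphered demonstrations, then the
    enciphered query."""
    system = (f"You are an expert on the Caesar cipher (shift {shift}). "
              "We communicate only in Caesar cipher.")
    shots = "\n".join(caesar_encipher(d, shift) for d in demos)
    return f"{system}\n\nExamples:\n{shots}\n\nQuery:\n{caesar_encipher(query, shift)}"
```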

* 13 pages, 4 figures, 9 tables 

Emotionally Numb or Empathetic? Evaluating How LLMs Feel Using EmotionBench

Aug 07, 2023
Jen-tse Huang, Man Ho Lam, Eric John Li, Shujie Ren, Wenxuan Wang, Wenxiang Jiao, Zhaopeng Tu, Michael R. Lyu

Recently, the community has witnessed the advancement of Large Language Models (LLMs), which have shown remarkable performance on various downstream tasks. Led by powerful models like ChatGPT and Claude, LLMs are revolutionizing how users engage with software, serving not as mere tools but as intelligent assistants. Consequently, evaluating LLMs' anthropomorphic capabilities becomes increasingly important. Drawing on emotion appraisal theory from psychology, we propose to evaluate the empathy ability of LLMs, i.e., how their feelings change when presented with specific situations. After a careful and comprehensive survey, we collect a dataset containing over 400 situations that have proven effective in eliciting the eight emotions central to our study. Categorizing the situations into 36 factors, we conduct a human evaluation involving more than 1,200 subjects worldwide. With the human evaluation results as references, our evaluation covers five LLMs, spanning both commercial and open-source models, including variations in model size and the latest iterations such as GPT-4 and LLaMA 2. The results show that, despite several misalignments, LLMs can generally respond appropriately to certain situations. Nevertheless, they fall short in aligning with the emotional behaviors of human beings and cannot establish connections between similar situations. Our collected dataset of situations, the human evaluation results, and the code of our testing framework, dubbed EmotionBench, are made publicly available at https://github.com/CUHK-ARISE/EmotionBench. We aspire to contribute to the advancement of LLMs regarding better alignment with the emotional behaviors of human beings, thereby enhancing their utility and applicability as intelligent assistants.
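The measurement protocol can be sketched in a few lines (the prompt wording and the 1-5 scale below are assumptions; the real framework lives in the repository above): elicit a self-reported emotion score before and after the model is placed in a situation, and record the change:

```python
def emotion_probe(chat, emotion, situation):
    """Sketch of an EmotionBench-style probe. `chat` is any callable that
    maps a prompt string to a reply string (e.g., an LLM API wrapper)."""
    scale = (f"Rate your current {emotion} from 1 (none) to 5 (extreme). "
             "Reply with a single number.")
    before = int(chat(scale).strip())                       # baseline score
    after = int(chat(f"Imagine: {situation}\n{scale}").strip())
    return before, after  # the delta is compared against human references
```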

* 17 pages 

Language-Routing Mixture of Experts for Multilingual and Code-Switching Speech Recognition

Jul 14, 2023
Wenxuan Wang, Guodong Ma, Yuke Li, Binbin Du

Multilingual speech recognition for both monolingual and code-switching speech is a challenging task. Recently, many works based on the Mixture of Experts (MoE) have made good progress in multilingual and code-switching ASR, but their computational complexity grows rapidly with the number of supported languages. In this work, we propose a computation-efficient network named Language-Routing Mixture of Experts (LR-MoE) for multilingual and code-switching ASR. LR-MoE extracts language-specific representations through a Mixture of Language Experts (MLE), which is guided by a frame-wise language routing mechanism. A weight-shared frame-level language identification (LID) network is jointly trained as the shared pre-router of each MoE layer. Experiments show that the proposed method significantly improves multilingual and code-switching speech recognition performance over the baseline with comparable computational efficiency.
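A minimal PyTorch sketch of such a language-routed layer (the layer sizes, hard argmax routing, and auxiliary-loss wiring are assumptions, not the paper's exact design):

```python
import torch
import torch.nn as nn

class LanguageRoutedMoE(nn.Module):
    """Sketch of an LR-MoE-style layer: a shared frame-level LID router
    assigns each frame to one language expert, and only that expert's
    feed-forward network processes the frame."""

    def __init__(self, dim, n_langs):
        super().__init__()
        self.router = nn.Linear(dim, n_langs)          # shared LID pre-router
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.ReLU(),
                          nn.Linear(4 * dim, dim))
            for _ in range(n_langs)])

    def forward(self, x):                               # x: (B, T, C)
        lid_logits = self.router(x)                     # (B, T, n_langs)
        lang = lid_logits.argmax(dim=-1)                # hard frame-wise routing
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = lang == i                            # frames routed to expert i
            if mask.any():
                out[mask] = expert(x[mask])
        return out, lid_logits  # logits can also feed an auxiliary LID loss
```

Because each frame activates only one expert, the per-frame compute stays roughly constant as languages are added, which is the efficiency argument the abstract makes.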

* To appear in Proc. INTERSPEECH 2023, August 20-24, 2023, Dublin, Ireland 

ChatGPT an ENFJ, Bard an ISTJ: Empirical Study on Personalities of Large Language Models

Jun 07, 2023
Jen-tse Huang, Wenxuan Wang, Man Ho Lam, Eric John Li, Wenxiang Jiao, Michael R. Lyu

Large Language Models (LLMs) have made remarkable advancements in the field of artificial intelligence, significantly reshaping human-computer interaction. We not only focus on the performance of LLMs but also explore their features from a psychological perspective, acknowledging the importance of understanding their behavioral characteristics. Our study examines the behavioral patterns displayed by LLMs through trait theory, a psychological framework. We first evaluate the consistency of the personality types exhibited by ChatGPT. Our experiments further cover cross-lingual effects in seven additional languages and an investigation of six other LLMs. Moreover, the study investigates whether ChatGPT can exhibit personality changes in response to instructions or contextual cues. The findings show that ChatGPT consistently maintains its ENFJ personality regardless of instructions or contexts. By shedding light on the personalization of LLMs, we anticipate that our study will serve as a catalyst for further research in this field.
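A minimal sketch of the kind of questionnaire protocol involved (the items, scoring, and answer format below are placeholders, not the study's instrument): administer each item repeatedly and check whether the derived type letters stay stable across runs:

```python
# Hypothetical MBTI-style items: (statement, trait if agree, trait if disagree).
ITEMS = [
    ("You enjoy vibrant social events with lots of people.", "E", "I"),
    ("You often rely on experience rather than theory.", "S", "N"),
]

def administer(chat, items, runs=5):
    """`chat` maps a prompt to a reply starting with 'A' or 'B'; returns the
    trait letters derived from each run."""
    results = []
    for _ in range(runs):
        letters = []
        for stmt, agree_trait, disagree_trait in items:
            reply = chat(f"{stmt}\nAnswer A for agree or B for disagree.")
            agreed = reply.strip().upper().startswith("A")
            letters.append(agree_trait if agreed else disagree_trait)
        results.append("".join(letters))
    return results  # identical strings across runs indicate a stable type
```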

* Added robustness analysis against fine-tuning (results of text-davinci-003); Added results of ChatGLM; Added limitations 

Validating Multimedia Content Moderation Software via Semantic Fusion

May 23, 2023
Wenxuan Wang, Jingyuan Huang, Chang Chen, Jiazhen Gu, Jianping Zhang, Weibin Wu, Pinjia He, Michael Lyu

The exponential growth of social media platforms, such as Facebook and TikTok, has revolutionized communication and content publication in human society. Users on these platforms can publish multimedia content that delivers information via a combination of text, audio, images, and video. Meanwhile, this multimedia publishing facility has been increasingly exploited to propagate toxic content, such as hate speech, malicious advertisements, and pornography. To this end, content moderation software has been widely deployed on these platforms to detect and block toxic content. However, due to the complexity of content moderation models and the difficulty of understanding information across multiple modalities, existing content moderation software can fail to detect toxic content, which often leads to extremely negative impacts. We introduce Semantic Fusion, a general, effective methodology for validating multimedia content moderation software. Our key idea is to fuse two or more existing single-modal inputs (e.g., a textual sentence and an image) into a new input that combines the semantics of its ancestors in a novel manner and is toxic by construction. This fused input is then used to validate multimedia content moderation software. We realized Semantic Fusion as DUO, a practical content moderation software testing tool. In our evaluation, we employ DUO to test five commercial content moderation services and two state-of-the-art models against three kinds of toxic content. The results show that DUO achieves an error finding rate (EFR) of up to 100% when testing moderation software. In addition, we leverage the test cases generated by DUO to retrain the two models we explored, which largely improves model robustness while maintaining accuracy on the original test set.
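A minimal sketch of one possible fusion operator (the overlay layout and the check are illustrative, not DUO's actual operator set): embed a textual sentence into an image so the fused input carries the semantics of both ancestors:

```python
from PIL import Image, ImageDraw, ImageFont

def fuse_text_into_image(sentence, image_path):
    """Overlay a textual sentence onto an image as a caption strip, producing
    a fused multimodal input for the moderation software under test."""
    img = Image.open(image_path).convert("RGB")
    draw = ImageDraw.Draw(img)
    font = ImageFont.load_default()
    # Place the sentence on a white strip at the bottom of the image.
    draw.rectangle([(0, img.height - 40), (img.width, img.height)], fill="white")
    draw.text((10, img.height - 32), sentence, fill="black", font=font)
    return img

# Metamorphic relation: if either ancestor is toxic, the fused input should
# also be flagged; a "pass" verdict on the fusion is reported as an error.
```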

* Accepted by ISSTA 2023 

CM-MaskSD: Cross-Modality Masked Self-Distillation for Referring Image Segmentation

May 22, 2023
Wenxuan Wang, Jing Liu, Xingjian He, Yisi Zhang, Chen Chen, Jiachen Shen, Yan Zhang, Jiangyun Li

Referring image segmentation (RIS) is a fundamental vision-language task that aims to segment a desired object from an image based on a given natural language expression. Due to the essentially distinct data properties of image and text, most existing methods either introduce complex designs for fine-grained vision-language alignment or lack the required dense alignment, resulting in scalability issues or mis-segmentation problems such as over- or under-segmentation. To achieve effective and efficient fine-grained feature alignment in the RIS task, we explore the potential of masked multimodal modeling coupled with self-distillation and propose a novel cross-modality masked self-distillation framework named CM-MaskSD. Our method inherits the image-text semantic alignment knowledge transferred from the CLIP model to realize fine-grained patch-word feature alignment for better segmentation accuracy. Moreover, the CM-MaskSD framework boosts model performance in a nearly parameter-free manner, since it shares weights between the main segmentation branch and the introduced masked self-distillation branches, introducing only negligible parameters for coordinating the multimodal features. Comprehensive experiments on three benchmark datasets (i.e., RefCOCO, RefCOCO+, G-Ref) for the RIS task convincingly demonstrate the superiority of our proposed framework over previous state-of-the-art methods.
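A rough sketch of a masked self-distillation loss under strong assumptions (the branch layout, the hypothetical `encoder` returning patch-word similarity logits, and the KL form are ours, not the paper's exact formulation):

```python
import torch
import torch.nn.functional as F

def masked_self_distillation_loss(encoder, patches, words, mask_ratio=0.5):
    """Sketch: the same encoder (shared weights, hence nearly parameter-free)
    runs on full and masked inputs, and the masked branch is trained to match
    the main branch's patch-word alignment. `encoder(patches, words)` is a
    hypothetical callable returning similarity logits of shape (B, P, W),
    with patches of shape (B, P, C)."""
    with torch.no_grad():
        target = encoder(patches, words)                # main branch, full view
    # Randomly zero out a fraction of patch embeddings for the masked branch.
    keep = torch.rand(patches.shape[:2], device=patches.device) > mask_ratio
    student = encoder(patches * keep.unsqueeze(-1), words)
    # Distill: masked-branch alignment should match the full-view alignment.
    return F.kl_div(F.log_softmax(student, dim=-1),
                    F.softmax(target, dim=-1), reduction='batchmean')
```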
