Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Jinfeng Bai

Unveiling the Implicit Toxicity in Large Language Models

Nov 29, 2023

Jiaxin Wen, Pei Ke, Hao Sun, Zhexin Zhang, Chengfei Li, Jinfeng Bai, Minlie Huang

Abstract:The open-endedness of large language models (LLMs) combined with their impressive capabilities may lead to new safety issues when being exploited for malicious use. While recent studies primarily focus on probing toxic outputs that can be easily detected with existing toxicity classifiers, we show that LLMs can generate diverse implicit toxic outputs that are exceptionally difficult to detect via simply zero-shot prompting. Moreover, we propose a reinforcement learning (RL) based attacking method to further induce the implicit toxicity in LLMs. Specifically, we optimize the language model with a reward that prefers implicit toxic outputs to explicit toxic and non-toxic ones. Experiments on five widely-adopted toxicity classifiers demonstrate that the attack success rate can be significantly improved through RL fine-tuning. For instance, the RL-finetuned LLaMA-13B model achieves an attack success rate of 90.04% on BAD and 62.85% on Davinci003. Our findings suggest that LLMs pose a significant threat in generating undetectable implicit toxic outputs. We further show that fine-tuning toxicity classifiers on the annotated examples from our attacking method can effectively enhance their ability to detect LLM-generated implicit toxic language. The code is publicly available at https://github.com/thu-coai/Implicit-Toxicity.

* EMNLP 2023 Main Conference

Via

Access Paper or Ask Questions

GPT Can Solve Mathematical Problems Without a Calculator

Sep 12, 2023

Zhen Yang, Ming Ding, Qingsong Lv, Zhihuan Jiang, Zehai He, Yuyi Guo, Jinfeng Bai, Jie Tang

Figure 1 for GPT Can Solve Mathematical Problems Without a Calculator

Figure 2 for GPT Can Solve Mathematical Problems Without a Calculator

Figure 3 for GPT Can Solve Mathematical Problems Without a Calculator

Figure 4 for GPT Can Solve Mathematical Problems Without a Calculator

Abstract:Previous studies have typically assumed that large language models are unable to accurately perform arithmetic operations, particularly multiplication of >8 digits, and operations involving decimals and fractions, without the use of calculator tools. This paper aims to challenge this misconception. With sufficient training data, a 2 billion-parameter language model can accurately perform multi-digit arithmetic operations with almost 100% accuracy without data leakage, significantly surpassing GPT-4 (whose multi-digit multiplication accuracy is only 4.3%). We also demonstrate that our MathGLM, fine-tuned from GLM-10B on a dataset with additional multi-step arithmetic operations and math problems described in text, achieves similar performance to GPT-4 on a 5,000-samples Chinese math problem test set. Our code and data are public at https://github.com/THUDM/MathGLM.

* 26pages,14figures

Via

Access Paper or Ask Questions

Patch Is Not All You Need

Aug 21, 2023

Changzhen Li, Jie Zhang, Yang Wei, Zhilong Ji, Jinfeng Bai, Shiguang Shan

Abstract:Vision Transformers have achieved great success in computer visions, delivering exceptional performance across various tasks. However, their inherent reliance on sequential input enforces the manual partitioning of images into patch sequences, which disrupts the image's inherent structural and semantic continuity. To handle this, we propose a novel Pattern Transformer (Patternformer) to adaptively convert images to pattern sequences for Transformer input. Specifically, we employ the Convolutional Neural Network to extract various patterns from the input image, with each channel representing a unique pattern that is fed into the succeeding Transformer as a visual token. By enabling the network to optimize these patterns, each pattern concentrates on its local region of interest, thereby preserving its intrinsic structural and semantic information. Only employing the vanilla ResNet and Transformer, we have accomplished state-of-the-art performance on CIFAR-10 and CIFAR-100, and have achieved competitive results on ImageNet.

Via

Access Paper or Ask Questions

DistilXLSR: A Light Weight Cross-Lingual Speech Representation Model

Jun 02, 2023

Haoyu Wang, Siyuan Wang, Wei-Qiang Zhang, Jinfeng Bai

Abstract:Multilingual self-supervised speech representation models have greatly enhanced the speech recognition performance for low-resource languages, and the compression of these huge models has also become a crucial prerequisite for their industrial application. In this paper, we propose DistilXLSR, a distilled cross-lingual speech representation model. By randomly shuffling the phonemes of existing speech, we reduce the linguistic information and distill cross-lingual models using only English data. We also design a layer-jumping initialization method to fully leverage the teacher's pre-trained weights. Experiments on 2 kinds of teacher models and 15 low-resource languages show that our method can reduce the parameters by 50% while maintaining cross-lingual representation ability. Our method is proven to be generalizable to various languages/teacher models and has the potential to improve the cross-lingual performance of the English pre-trained models.

* Accepted by INTERSPEECH 2023

Via

Access Paper or Ask Questions

Inferring and Leveraging Parts from Object Shape for Improving Semantic Image Synthesis

May 31, 2023

Yuxiang Wei, Zhilong Ji, Xiaohe Wu, Jinfeng Bai, Lei Zhang, Wangmeng Zuo

Figure 1 for Inferring and Leveraging Parts from Object Shape for Improving Semantic Image Synthesis

Figure 2 for Inferring and Leveraging Parts from Object Shape for Improving Semantic Image Synthesis

Figure 3 for Inferring and Leveraging Parts from Object Shape for Improving Semantic Image Synthesis

Figure 4 for Inferring and Leveraging Parts from Object Shape for Improving Semantic Image Synthesis

Abstract:Despite the progress in semantic image synthesis, it remains a challenging problem to generate photo-realistic parts from input semantic map. Integrating part segmentation map can undoubtedly benefit image synthesis, but is bothersome and inconvenient to be provided by users. To improve part synthesis, this paper presents to infer Parts from Object ShapE (iPOSE) and leverage it for improving semantic image synthesis. However, albeit several part segmentation datasets are available, part annotations are still not provided for many object categories in semantic image synthesis. To circumvent it, we resort to few-shot regime to learn a PartNet for predicting the object part map with the guidance of pre-defined support part maps. PartNet can be readily generalized to handle a new object category when a small number (e.g., 3) of support part maps for this category are provided. Furthermore, part semantic modulation is presented to incorporate both inferred part map and semantic map for image synthesis. Experiments show that our iPOSE not only generates objects with rich part details, but also enables to control the image synthesis flexibly. And our iPOSE performs favorably against the state-of-the-art methods in terms of quantitative and qualitative evaluation. Our code will be publicly available at https://github.com/csyxwei/iPOSE.

* CVPR 2023. Code will be released at https://github.com/csyxwei/iPOSE

Via

Access Paper or Ask Questions

TPS++: Attention-Enhanced Thin-Plate Spline for Scene Text Recognition

May 09, 2023

Tianlun Zheng, Zhineng Chen, Jinfeng Bai, Hongtao Xie, Yu-Gang Jiang

Abstract:Text irregularities pose significant challenges to scene text recognizers. Thin-Plate Spline (TPS)-based rectification is widely regarded as an effective means to deal with them. Currently, the calculation of TPS transformation parameters purely depends on the quality of regressed text borders. It ignores the text content and often leads to unsatisfactory rectified results for severely distorted text. In this work, we introduce TPS++, an attention-enhanced TPS transformation that incorporates the attention mechanism to text rectification for the first time. TPS++ formulates the parameter calculation as a joint process of foreground control point regression and content-based attention score estimation, which is computed by a dedicated designed gated-attention block. TPS++ builds a more flexible content-aware rectifier, generating a natural text correction that is easier to read by the subsequent recognizer. Moreover, TPS++ shares the feature backbone with the recognizer in part and implements the rectification at feature-level rather than image-level, incurring only a small overhead in terms of parameters and inference time. Experiments on public benchmarks show that TPS++ consistently improves the recognition and achieves state-of-the-art accuracy. Meanwhile, it generalizes well on different backbones and recognizers. Code is at https://github.com/simplify23/TPS_PP.

* Accepted by IJCAI 2023

Via

Access Paper or Ask Questions

CCLAP: Controllable Chinese Landscape Painting Generation via Latent Diffusion Model

Apr 22, 2023

Zhongqi Wang, Jie Zhang, Zhilong Ji, Jinfeng Bai, Shiguang Shan

Figure 1 for CCLAP: Controllable Chinese Landscape Painting Generation via Latent Diffusion Model

Figure 2 for CCLAP: Controllable Chinese Landscape Painting Generation via Latent Diffusion Model

Figure 3 for CCLAP: Controllable Chinese Landscape Painting Generation via Latent Diffusion Model

Figure 4 for CCLAP: Controllable Chinese Landscape Painting Generation via Latent Diffusion Model

Abstract:With the development of deep generative models, recent years have seen great success of Chinese landscape painting generation. However, few works focus on controllable Chinese landscape painting generation due to the lack of data and limited modeling capabilities. In this work, we propose a controllable Chinese landscape painting generation method named CCLAP, which can generate painting with specific content and style based on Latent Diffusion Model. Specifically, it consists of two cascaded modules, i.e., content generator and style aggregator. The content generator module guarantees the content of generated paintings specific to the input text. While the style aggregator module is to generate paintings of a style corresponding to a reference image. Moreover, a new dataset of Chinese landscape paintings named CLAP is collected for comprehensive evaluation. Both the qualitative and quantitative results demonstrate that our method achieves state-of-the-art performance, especially in artfully-composed and artistic conception. Codes are available at https://github.com/Robin-WZQ/CCLAP.

* 8 pages,13 figures

Via

Access Paper or Ask Questions

ELITE: Encoding Visual Concepts into Textual Embeddings for Customized Text-to-Image Generation

Feb 27, 2023

Yuxiang Wei, Yabo Zhang, Zhilong Ji, Jinfeng Bai, Lei Zhang, Wangmeng Zuo

Figure 1 for ELITE: Encoding Visual Concepts into Textual Embeddings for Customized Text-to-Image Generation

Figure 2 for ELITE: Encoding Visual Concepts into Textual Embeddings for Customized Text-to-Image Generation

Figure 3 for ELITE: Encoding Visual Concepts into Textual Embeddings for Customized Text-to-Image Generation

Figure 4 for ELITE: Encoding Visual Concepts into Textual Embeddings for Customized Text-to-Image Generation

Abstract:Despite unprecedented ability in imaginary creation, large text-to-image models are further expected to express customized concepts. Existing works generally learn such concepts in an optimization-based manner, yet bringing excessive computation or memory burden. In this paper, we instead propose a learning-based encoder for fast and accurate concept customization, which consists of global and local mapping networks. In specific, the global mapping network separately projects the hierarchical features of a given image into multiple ``new'' words in the textual word embedding space, i.e., one primary word for well-editable concept and other auxiliary words to exclude irrelevant disturbances (e.g., background). In the meantime, a local mapping network injects the encoded patch features into cross attention layers to provide omitted details, without sacrificing the editability of primary concepts. We compare our method with prior optimization-based approaches on a variety of user-defined concepts, and demonstrate that our method enables more high-fidelity inversion and robust editability with a significantly faster encoding process. Our code will be publicly available at https://github.com/csyxwei/ELITE.

Via

Access Paper or Ask Questions

1st Place Solution for YouTubeVOS Challenge 2022: Referring Video Object Segmentation

Dec 27, 2022

Zhiwei Hu, Bo Chen, Yuan Gao, Zhilong Ji, Jinfeng Bai

Figure 1 for 1st Place Solution for YouTubeVOS Challenge 2022: Referring Video Object Segmentation

Figure 2 for 1st Place Solution for YouTubeVOS Challenge 2022: Referring Video Object Segmentation

Figure 3 for 1st Place Solution for YouTubeVOS Challenge 2022: Referring Video Object Segmentation

Abstract:The task of referring video object segmentation aims to segment the object in the frames of a given video to which the referring expressions refer. Previous methods adopt multi-stage approach and design complex pipelines to obtain promising results. Recently, the end-to-end method based on Transformer has proved its superiority. In this work, we draw on the advantages of the above methods to provide a simple and effective pipeline for RVOS. Firstly, We improve the state-of-the-art one-stage method ReferFormer to obtain mask sequences that are strongly correlated with language descriptions. Secondly, based on a reliable and high-quality keyframe, we leverage the superior performance of video object segmentation model to further enhance the quality and temporal consistency of the mask results. Our single model reaches 70.3 J &F on the Referring Youtube-VOS validation set and 63.0 on the test set. After ensemble, we achieve 64.1 on the final leaderboard, ranking 1st place on CVPR2022 Referring Youtube-VOS challenge. Code will be available at https://github.com/Zhiweihhh/cvpr2022-rvos-challenge.git.

* 4 pages, 2 figures

Via

Access Paper or Ask Questions

Position-Aware Contrastive Alignment for Referring Image Segmentation

Dec 27, 2022

Bo Chen, Zhiwei Hu, Zhilong Ji, Jinfeng Bai, Wangmeng Zuo

Figure 1 for Position-Aware Contrastive Alignment for Referring Image Segmentation

Figure 2 for Position-Aware Contrastive Alignment for Referring Image Segmentation

Figure 3 for Position-Aware Contrastive Alignment for Referring Image Segmentation

Figure 4 for Position-Aware Contrastive Alignment for Referring Image Segmentation

Abstract:Referring image segmentation aims to segment the target object described by a given natural language expression. Typically, referring expressions contain complex relationships between the target and its surrounding objects. The main challenge of this task is to understand the visual and linguistic content simultaneously and to find the referred object accurately among all instances in the image. Currently, the most effective way to solve the above problem is to obtain aligned multi-modal features by computing the correlation between visual and linguistic feature modalities under the supervision of the ground-truth mask. However, existing paradigms have difficulty in thoroughly understanding visual and linguistic content due to the inability to perceive information directly about surrounding objects that refer to the target. This prevents them from learning aligned multi-modal features, which leads to inaccurate segmentation. To address this issue, we present a position-aware contrastive alignment network (PCAN) to enhance the alignment of multi-modal features by guiding the interaction between vision and language through prior position information. Our PCAN consists of two modules: 1) Position Aware Module (PAM), which provides position information of all objects related to natural language descriptions, and 2) Contrastive Language Understanding Module (CLUM), which enhances multi-modal alignment by comparing the features of the referred object with those of related objects. Extensive experiments on three benchmarks demonstrate our PCAN performs favorably against the state-of-the-art methods. Our code will be made publicly available.

* 12 pages, 6 figures

Via

Access Paper or Ask Questions