Piji Li

DECIDER: A Rule-Controllable Decoding Strategy for Language Generation by Imitating Dual-System Cognitive Theory

Mar 04, 2024

Chen Xu, Tian Lan, Changlong Yu, Wei Wang, Jun Gao, Yu Ji, Qunxi Dong, Kun Qian, Piji Li, Wei Bi(+1 more)

Abstract: Lexicon-based constrained decoding approaches aim to control the meaning or style of the generated text through certain target concepts. Existing approaches over-focus on the targets themselves, leading to a lack of high-level reasoning about how to achieve them. However, humans usually tackle tasks by following certain rules that focus not only on the targets but also on semantically relevant concepts that induce the occurrence of targets. In this work, we present DECIDER, a rule-controllable decoding strategy for constrained language generation inspired by dual-system cognitive theory. Specifically, in DECIDER, a pre-trained language model (PLM) is equipped with a logic reasoner that takes high-level rules as input. DECIDER then allows rule signals to flow into the PLM at each decoding step. Extensive experimental results demonstrate that DECIDER can effectively follow given rules to guide generation direction toward the targets in a more human-like manner.

* Submitted to IEEE TKDE, 12 pages, 6 figures
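
To make "rule signals flow into the PLM at each decoding step" concrete, here is a minimal sketch of one generic way to bias a next-token distribution with an external score. It is not the DECIDER algorithm: the additive bias, the `strength` parameter, the toy vocabulary, and the `rule_scores` values are all assumptions made for illustration.

```python
import math

def softmax(logits):
    """Turn a {token: logit} dict into a {token: probability} dict."""
    m = max(logits.values())
    exps = {tok: math.exp(v - m) for tok, v in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

def apply_rule_signal(logits, rule_scores, strength=2.0):
    """Add a rule-derived bias to each token's logit (hypothetical scheme)."""
    return {tok: v + strength * rule_scores.get(tok, 0.0)
            for tok, v in logits.items()}

# Toy next-token logits from a language model (made-up numbers).
plm_logits = {"the": 2.0, "dog": 1.0, "ball": 0.5, "sky": 0.5}
# Hypothetical scores from a rule such as "prefer concepts related to play".
rule_scores = {"ball": 1.0, "dog": 0.5}

before = softmax(plm_logits)
after = softmax(apply_rule_signal(plm_logits, rule_scores))
```

In this toy setting the bias shifts probability mass toward "ball" and "dog" while the result stays a valid distribution; a real system would recompute the rule scores at every step from the partial output.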

An Empirical Investigation of Domain Adaptation Ability for Chinese Spelling Check Models

Jan 26, 2024

Xi Wang, Ruoqing Zhao, Hongliang Dai, Piji Li

Abstract: Chinese Spelling Check (CSC) is a meaningful task in the area of Natural Language Processing (NLP) which aims at detecting spelling errors in Chinese texts and then correcting these errors. However, CSC models are based on pretrained language models, which are trained on a general corpus. Consequently, their performance may drop when confronted with downstream tasks involving domain-specific terms. In this paper, we conduct a thorough evaluation of the domain adaptation ability of various typical CSC models by building three new datasets encompassing rich domain-specific terms from the financial, medical, and legal domains. Then we conduct empirical investigations on the corresponding domain-specific test datasets to ascertain the cross-domain adaptation ability of several typical CSC models. We also test the performance of the popular large language model ChatGPT. As shown in our experiments, the performance of the CSC models drops significantly in the new domains.

* ICASSP 2024

Punctuation Matters! Stealthy Backdoor Attack for Language Models

Dec 26, 2023

Xuan Sheng, Zhicheng Li, Zhaoyang Han, Xiangmao Chang, Piji Li

Abstract: Recent studies have pointed out that natural language processing (NLP) models are vulnerable to backdoor attacks. A backdoored model produces normal outputs on clean samples while behaving improperly on texts containing triggers injected by the adversary. However, previous studies on textual backdoor attacks pay little attention to stealthiness. Moreover, some attack methods even cause grammatical issues or change the semantic meaning of the original texts. Therefore, they can easily be detected by humans or defense systems. In this paper, we propose a novel stealthy backdoor attack method against textual models, called PuncAttack. It leverages combinations of punctuation marks as the trigger and strategically chooses proper locations at which to replace them. Through extensive experiments, we demonstrate that the proposed method can effectively compromise multiple models in various tasks. Meanwhile, we conduct automatic evaluation and human inspection, which indicate that the proposed method is stealthy, neither introducing grammatical issues nor altering the meaning of sentences.

* NLPCC 2023

Medical Report Generation based on Segment-Enhanced Contrastive Representation Learning

Dec 26, 2023

Ruoqing Zhao, Xi Wang, Hongliang Dai, Pan Gao, Piji Li

Abstract: Automated radiology report generation has the potential to improve radiology reporting and alleviate the workload of radiologists. However, the medical report generation task poses unique challenges due to the limited availability of medical data and the presence of data bias. To maximize the utility of available data and reduce data bias, we propose MSCL (Medical image Segmentation with Contrastive Learning), a framework that uses the Segment Anything Model (SAM) to segment organs, abnormalities, bones, etc., so that the model pays more attention to the meaningful ROIs in the image and obtains better visual representations. Then we introduce a supervised contrastive loss that, during training, assigns more weight to reports that are semantically similar to the target. The design of this loss function aims to mitigate the impact of data bias and encourage the model to capture the essential features of a medical image and generate high-quality reports. Experimental results demonstrate the effectiveness of our proposed model, which achieves state-of-the-art performance on the IU X-Ray public dataset.

* NLPCC 2023

Data Contamination Issues in Brain-to-Text Decoding

Dec 26, 2023

Congchi Yin, Qian Yu, Zhiwei Fang, Jie He, Changping Peng, Zhangang Lin, Jingping Shao, Piji Li

Abstract: Decoding non-invasive cognitive signals to natural language has long been the goal of building practical brain-computer interfaces (BCIs). Recent major milestones have successfully decoded cognitive signals like functional Magnetic Resonance Imaging (fMRI) and electroencephalogram (EEG) into text under an open-vocabulary setting. However, how to split the datasets for training, validation, and testing in cognitive signal decoding tasks remains controversial. In this paper, we conduct a systematic analysis of current dataset splitting methods and find that data contamination greatly exaggerates model performance. Specifically, first we find that the leakage of test subjects' cognitive signals corrupts the training of a robust encoder. Second, we prove that the leakage of text stimuli causes the auto-regressive decoder to memorize information in the test set. As a result, the decoder generates highly accurate text without truly understanding the cognitive signals. To eliminate the influence of data contamination and fairly evaluate different models' generalization ability, we propose a new splitting method for different types of cognitive datasets (e.g. fMRI, EEG). We also test the performance of SOTA Brain-to-Text decoding models under the proposed dataset splitting paradigm as baselines for further research.

* 12 pages, 4 figures
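
The two leakage paths named in the abstract above (test subjects' signals and test text stimuli appearing in training) can be illustrated with a minimal sketch. The abstract does not specify the proposed splitting method, so this is only one plausible leakage-free split under stated assumptions: the sample layout with `subject` and `text` keys, the function name, and the choice to drop mixed samples are all hypothetical.

```python
def leakage_free_split(samples, test_subjects, test_texts):
    """Split samples so that no test subject and no test text reaches training.

    Each sample is a dict with hypothetical keys "subject" and "text".
    Samples that involve only one held-out factor are dropped, because
    placing them in training would leak either a subject or a stimulus.
    """
    train, test = [], []
    for sample in samples:
        subject_held_out = sample["subject"] in test_subjects
        text_held_out = sample["text"] in test_texts
        if subject_held_out and text_held_out:
            test.append(sample)
        elif not subject_held_out and not text_held_out:
            train.append(sample)
    return train, test

samples = [
    {"subject": "s1", "text": "a cat sat"},
    {"subject": "s1", "text": "dogs bark"},
    {"subject": "s2", "text": "a cat sat"},
    {"subject": "s2", "text": "dogs bark"},
]
train, test = leakage_free_split(samples, test_subjects={"s2"}, test_texts={"dogs bark"})
```

Dropping the mixed samples costs data, which is the trade-off of guaranteeing that neither the encoder nor the decoder has seen anything from the test set.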

InfoDiffusion: Information Entropy Aware Diffusion Process for Non-Autoregressive Text Generation

Oct 18, 2023

Renzhi Wang, Jing Li, Piji Li

Abstract: Diffusion models have garnered considerable interest in the field of text generation. Several studies have explored text diffusion models with different structures and applied them to various tasks, including named entity recognition and summarization. However, there exists a notable disparity between the "easy-first" text generation process of current diffusion models and the "keyword-first" natural text generation process of humans, which has received limited attention. To bridge this gap, we propose InfoDiffusion, a non-autoregressive text diffusion model. Our approach introduces a "keyinfo-first" generation strategy and incorporates a noise schedule based on the amount of text information. In addition, InfoDiffusion combines self-conditioning with a newly proposed partially noising model structure. Experimental results show that InfoDiffusion outperforms the baseline model in terms of generation quality and diversity, and exhibits higher sampling efficiency.

* EMNLP 2023 Findings

Topic-Guided Self-Introduction Generation for Social Media Users

May 24, 2023

Chunpu Xu, Jing Li, Piji Li, Min Yang

Abstract: Millions of users are active on social media. To allow users to better showcase themselves and network with others, we explore the auto-generation of social media self-introductions, short sentences outlining a user's personal interests. While most prior work profiles users with tags (e.g., ages), we investigate sentence-level self-introductions to provide a more natural and engaging way for users to get to know each other. Here we exploit a user's tweeting history to generate their self-introduction. The task is non-trivial because the history content may be lengthy, noisy, and exhibit various personal interests. To address this challenge, we propose a novel unified topic-guided encoder-decoder (UTGED) framework. It models latent topics to reflect salient user interests; the topic mixture then guides the encoding of a user's history, and topic words control the decoding of their self-introduction. For experiments, we collect a large-scale Twitter dataset, and extensive results show the superiority of our UTGED over advanced encoder-decoder models without topic modeling.

Unified Text Structuralization with Instruction-tuned Language Models

Mar 30, 2023

Xuanfan Ni, Piji Li, Huayang Li

Abstract: Text structuralization is an important field of natural language processing (NLP) that consists of information extraction (IE) and structure formalization. However, current studies of text structuralization suffer from a shortage of manually annotated high-quality datasets from different domains and languages, which require specialized professional knowledge. In addition, most IE methods are designed for a specific type of structured data, e.g., entities, relations, and events, making them hard to generalize to others. In this work, we propose a simple and efficient approach to instruct a large language model (LLM) to extract a variety of structures from texts. More concretely, we add a prefix and a suffix instruction to indicate the desired IE task and structure type, respectively, before feeding the text into an LLM. Experiments on two LLMs show that this approach enables language models to perform comparably to other state-of-the-art methods on datasets covering a variety of languages and knowledge, and to generalize to other IE sub-tasks by changing the content of the instruction. Another benefit of our approach is that it can help researchers build datasets in low-resource and domain-specific scenarios, e.g., fields in finance and law, at low cost.

* 13 pages, 5 figures
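
The prefix/suffix scheme described in the abstract above can be sketched as a small prompt builder: a prefix names the IE task, a suffix names the desired structure type, and the input text sits between them. The function name and the instruction wording below are assumptions for illustration, not the prompts used in the paper.

```python
def build_prompt(text, task_instruction, structure_instruction):
    """Wrap the input text with a task prefix and a structure-type suffix."""
    return f"{task_instruction}\n{text}\n{structure_instruction}"

# Hypothetical instruction wording; the paper's actual prompts may differ.
prompt = build_prompt(
    text="Marie Curie was born in Warsaw in 1867.",
    task_instruction="Extract all named entities from the following text.",
    structure_instruction="Return the result as a list of (entity, type) pairs.",
)
```

Switching to another IE sub-task then only requires changing the two instruction strings, which matches the generalization argument made in the abstract.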

Ancient Chinese Word Segmentation and Part-of-Speech Tagging Using Distant Supervision

Mar 06, 2023

Shuo Feng, Piji Li

Abstract: Ancient Chinese word segmentation (WSG) and part-of-speech tagging (POS) are important for the study of ancient Chinese, but ancient Chinese WSG and POS tagging data are still scarce. In this paper, we propose a novel augmentation method for ancient Chinese WSG and POS tagging data using distant supervision over a parallel corpus. However, distant supervision inevitably leaves some ancient Chinese words mislabeled or unlabeled. To address this problem, we take advantage of the memorization effects of deep neural networks and a small amount of annotated data to get a model with much knowledge and a little noise, and then we use this model to relabel the ancient Chinese sentences in the parallel corpus. Experiments show that the model trained over the relabeled data outperforms the model trained over the data generated from distant supervision and the annotated data. Our code is available at https://github.com/farlit/ACDS.

* Accepted by ICASSP 2023

CTRLStruct: Dialogue Structure Learning for Open-Domain Response Generation

Mar 02, 2023

Congchi Yin, Piji Li, Zhaochun Ren

Abstract: Dialogue structure discovery is essential in dialogue generation. Well-structured topic flow can leverage background information and predict future topics to help generate controllable and explainable responses. However, most previous work focused on dialogue structure learning in task-oriented dialogue rather than open-domain dialogue, which is more complicated and challenging. In this paper, we present a new framework, CTRLStruct, for dialogue structure learning to effectively explore topic-level dialogue clusters as well as their transitions with unlabelled information. Specifically, dialogue utterances encoded by a bidirectional Transformer are further trained through a specially designed contrastive learning task to improve representation. Then we perform clustering on utterance-level representations and form topic-level clusters that can be considered as vertices in the dialogue structure graph. The edges in the graph, which indicate the transition probability between vertices, are calculated by mimicking expert behavior in datasets. Finally, the dialogue structure graph is integrated into the dialogue model to perform controlled response generation. Experiments on two popular open-domain dialogue datasets show that our model can generate more coherent responses compared to strong dialogue models, as well as outperform some typical sentence embedding methods in dialogue utterance representation. Code is available on GitHub.

* 12 pages, to be published in The Web Conference 2023
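
The graph described in the abstract above has topic clusters as vertices and transition probabilities as edges. The sketch below shows the data structure only, using a deliberately simpler estimator than the paper's: it normalizes observed transition counts instead of mimicking expert behavior. The cluster IDs and dialogues are made-up toy data, and the function name is hypothetical.

```python
from collections import Counter, defaultdict

def transition_probabilities(dialogues):
    """Estimate P(next topic | current topic) from sequences of cluster IDs.

    Simple count normalization, swapped in for illustration; the abstract
    describes learning edges by mimicking expert behavior instead.
    """
    counts = defaultdict(Counter)
    for topics in dialogues:
        for src, dst in zip(topics, topics[1:]):
            counts[src][dst] += 1
    return {
        src: {dst: n / sum(outgoing.values()) for dst, n in outgoing.items()}
        for src, outgoing in counts.items()
    }

# Each dialogue is the sequence of topic-cluster IDs of its utterances (toy data).
dialogues = [[0, 1, 2], [0, 1, 1], [0, 2, 1]]
graph = transition_probabilities(dialogues)
```

Each row of `graph` sums to one, so it can be read as the outgoing edge weights of one vertex in the dialogue structure graph.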
