Ruifeng Xu

A Benchmark for Text Expansion: Datasets, Metrics, and Baselines

Sep 17, 2023
Yi Chen, Haiyun Jiang, Wei Bi, Rui Wang, Longyue Wang, Shuming Shi, Ruifeng Xu

This work presents a new task of Text Expansion (TE), which aims to insert fine-grained modifiers at appropriate locations in plain text to concretize or vivify human writing. Unlike existing insertion-based writing-assistance tasks, TE requires the model to be more flexible in both locating insertion points and generating content, and more cautious about preserving the basic semantics. We leverage four complementary approaches to construct a dataset with 12 million automatically generated instances and 2K human-annotated references for both English and Chinese. To facilitate automatic evaluation, we design metrics from multiple perspectives. In particular, we propose Info-Gain to effectively measure the informativeness of expansions, an important quality dimension in TE. On top of a pre-trained text-infilling model, we build both pipelined and joint Locate&Infill models, which demonstrate superiority over the Text2Text baselines, especially in expansion informativeness. Experiments verify the feasibility of the TE task and point out potential directions for future research toward better automatic text expansion.
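
As a rough illustration of the pipelined Locate&Infill idea (not the authors' released model), one can insert a mask token at a chosen location and let an off-the-shelf text-infilling model propose a modifier there; the model name and the single-token expansion below are simplifying assumptions.

```python
# Minimal sketch of a pipelined "Locate & Infill" text expansion, assuming a
# single-token modifier and an off-the-shelf masked LM. This is an
# illustration, not the paper's Locate&Infill model.
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")

def expand(text: str, insert_before_word: int) -> str:
    """Locate: pick a word index; Infill: let the MLM propose a modifier there."""
    words = text.split()
    # Locate step (here given externally; the paper learns it, pipelined or jointly).
    candidate = (words[:insert_before_word]
                 + [fill.tokenizer.mask_token]
                 + words[insert_before_word:])
    # Infill step: take the most probable single-token insertion.
    best = fill(" ".join(candidate), top_k=1)[0]
    return best["sequence"]

print(expand("The cat sat on the mat.", insert_before_word=1))
# e.g. "the black cat sat on the mat." (model-dependent)
```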

Improving Few-shot and Zero-shot Entity Linking with Coarse-to-Fine Lexicon-based Retriever

Aug 13, 2023
Shijue Huang, Bingbing Wang, Libo Qin, Qin Zhao, Ruifeng Xu

Few-shot and zero-shot entity linking focus on tail and emerging entities, which are more challenging but closer to real-world scenarios. The mainstream approach is the two-stage "retrieve and rerank" framework. In this paper, we propose a coarse-to-fine lexicon-based retriever that retrieves entity candidates effectively in two layers. The first layer retrieves coarse-grained candidates by leveraging entity names, while the second layer narrows the search to fine-grained candidates within the coarse-grained set. This second layer also utilizes entity descriptions to effectively disambiguate tail or new entities that share names with existing popular entities. Experimental results indicate that our approach obtains superior performance without extensive finetuning in the retrieval stage. Notably, our approach ranks first in NLPCC 2023 Shared Task 6 on Chinese Few-shot and Zero-shot Entity Linking.

* Accepted to NLPCC2023 
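
The two-layer retrieval can be pictured with a minimal sketch, assuming a toy knowledge base of (name, description) entries; the string-similarity scoring is an illustrative placeholder, not the shared-task system.

```python
# Sketch of coarse-to-fine lexicon-based retrieval over a toy KB.
from difflib import SequenceMatcher

KB = [
    {"name": "Apple Inc.", "desc": "American technology company"},
    {"name": "Apple (fruit)", "desc": "edible fruit of the apple tree"},
]

def sim(a: str, b: str) -> float:
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def retrieve(mention: str, context: str, k_coarse: int = 10, k_fine: int = 3):
    # Layer 1 (coarse): match the mention against entity names only.
    coarse = sorted(KB, key=lambda e: sim(mention, e["name"]), reverse=True)[:k_coarse]
    # Layer 2 (fine): rescore within the coarse set using descriptions, which
    # helps disambiguate tail entities that share a popular name.
    fine = sorted(coarse, key=lambda e: sim(context, e["desc"]), reverse=True)[:k_fine]
    return fine

print(retrieve("apple", "I ate an apple after lunch"))
```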

DocDeshadower: Frequency-aware Transformer for Document Shadow Removal

Jul 28, 2023
Shenghong Luo, Ruifeng Xu, Xuhang Chen, Zinuo Li, Chi-Man Pun, Shuqiang Wang

The presence of shadows significantly degrades the visual quality of scanned documents. However, existing traditional techniques and deep learning methods for shadow removal have several limitations: they either rely heavily on heuristics, resulting in suboptimal performance, or require large datasets to learn shadow-related features. In this study, we propose DocDeshadower, a multi-frequency Transformer-based model built on the Laplacian pyramid. DocDeshadower is designed to remove shadows at different frequencies in a coarse-to-fine manner. To achieve this, we decompose the shadow image into different frequency bands using the Laplacian pyramid. In addition, we introduce two novel components: the Attention-Aggregation Network and the Gated Multi-scale Fusion Transformer. The Attention-Aggregation Network removes shadows in the low-frequency part of the image, whereas the Gated Multi-scale Fusion Transformer refines the entire image at a global scale with its large receptive field. Extensive experiments demonstrate that DocDeshadower outperforms current state-of-the-art methods both qualitatively and quantitatively.
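
A minimal sketch of the Laplacian-pyramid decomposition that DocDeshadower builds on, using OpenCV; the blur applied to the low-frequency band is a stand-in for the learned Attention-Aggregation Network, and the image path is a placeholder.

```python
# Decompose a document image into Laplacian-pyramid frequency bands, process
# the low-frequency residual, and reconstruct. Illustration only.
import cv2
import numpy as np

def laplacian_pyramid(img: np.ndarray, levels: int = 3):
    gauss = [img]
    for _ in range(levels):
        gauss.append(cv2.pyrDown(gauss[-1]))
    lap = [gauss[i] - cv2.pyrUp(gauss[i + 1], dstsize=gauss[i].shape[1::-1])
           for i in range(levels)]
    return lap, gauss[-1]  # high-frequency bands + low-frequency residual

def reconstruct(lap, low):
    img = low
    for band in reversed(lap):
        img = cv2.pyrUp(img, dstsize=band.shape[1::-1]) + band
    return img

img = cv2.imread("document.png").astype(np.float32)  # placeholder path
lap, low = laplacian_pyramid(img)
low = cv2.blur(low, (5, 5))  # placeholder: the paper uses a learned network here
out = reconstruct(lap, low)
```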

MMSD2.0: Towards a Reliable Multi-modal Sarcasm Detection System

Jul 14, 2023
Libo Qin, Shijue Huang, Qiguang Chen, Chenran Cai, Yudi Zhang, Bin Liang, Wanxiang Che, Ruifeng Xu

Multi-modal sarcasm detection has attracted much recent attention. Nevertheless, the existing benchmark (MMSD) has shortcomings that hinder the development of reliable multi-modal sarcasm detection systems: (1) there are spurious cues in MMSD, which lead models to learn biases; and (2) the negative samples in MMSD are not always reasonable. To solve these issues, we introduce MMSD2.0, a corrected dataset that fixes the shortcomings of MMSD by removing the spurious cues and re-annotating the unreasonable samples. We also present a novel framework called multi-view CLIP that is capable of leveraging multi-grained cues from multiple perspectives (i.e., the text, image, and text-image interaction views) for multi-modal sarcasm detection. Extensive experiments show that MMSD2.0 is a valuable benchmark for building reliable multi-modal sarcasm detection systems and that multi-view CLIP significantly outperforms the previous best baselines.

* Accepted by ACL2023 Findings 
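
A minimal sketch of the three-view idea, assuming CLIP embeddings from the Hugging Face transformers library; the element-wise interaction and the linear fusion head are illustrative simplifications, not the multi-view CLIP architecture itself.

```python
# Combine text, image, and a crude text-image interaction view for a
# two-way (sarcastic / not) classification head. Illustration only.
import torch
from transformers import CLIPModel, CLIPProcessor
from PIL import Image

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

text = "what a lovely rainy day"
image = Image.open("post.jpg")  # placeholder path
inputs = processor(text=[text], images=image, return_tensors="pt", padding=True)

with torch.no_grad():
    out = model(**inputs)
t = out.text_embeds   # text view
v = out.image_embeds  # image view
x = t * v             # simplistic text-image interaction view

head = torch.nn.Linear(t.shape[-1] * 3, 2)  # untrained head, for shape only
logits = head(torch.cat([t, v, x], dim=-1))
```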

A Comprehensive Survey on Deep Learning for Relation Extraction: Recent Advances and New Frontiers

Jun 06, 2023
Xiaoyan Zhao, Yang Deng, Min Yang, Lingzhi Wang, Rui Zhang, Hong Cheng, Wai Lam, Ying Shen, Ruifeng Xu

Relation extraction (RE) involves identifying the relations between entities in unstructured text. RE serves as the foundation for many natural language processing (NLP) applications, such as knowledge graph completion, question answering, and information retrieval. In recent years, deep neural networks have dominated the field of RE and made noticeable progress; more recently, large pre-trained language models (PLMs) have raised the state of the art in RE to a new level. This survey provides a comprehensive review of existing deep learning techniques for RE. First, we introduce RE resources, including RE datasets and evaluation metrics. Second, we propose a new taxonomy that categorizes existing works from three perspectives (text representation, context encoding, and triplet prediction). Third, we discuss several important challenges faced by RE and summarize potential techniques to tackle them. Finally, we outline promising future directions and prospects in this field. This survey is expected to facilitate researchers' collaborative efforts to tackle the challenges of real-life RE systems.
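
The taxonomy's three perspectives can be mapped onto a minimal RE model, sketched below under assumed choices (entity-marker strings, a BERT encoder, a linear head); this illustrates the pipeline shape, not any specific surveyed system.

```python
# Minimal RE skeleton: (1) text representation, (2) context encoding,
# (3) triplet prediction. Illustration only.
import torch
from transformers import AutoTokenizer, AutoModel

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
enc = AutoModel.from_pretrained("bert-base-uncased")

# 1) Text representation: mark the two entities in the input.
sent = "[E1] Marie Curie [/E1] was born in [E2] Warsaw [/E2] ."
inputs = tok(sent, return_tensors="pt")

# 2) Context encoding: contextual embeddings from a PLM.
with torch.no_grad():
    h = enc(**inputs).last_hidden_state  # (1, seq_len, hidden)

# 3) Triplet prediction: classify the relation from the pooled context.
num_relations = 10  # assumed label-set size
clf = torch.nn.Linear(h.shape[-1], num_relations)
relation_logits = clf(h[:, 0])  # use the [CLS] position
```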

A Diffusion Model for Event Skeleton Generation

May 27, 2023
Fangqi Zhu, Lin Zhang, Jun Gao, Bing Qin, Ruifeng Xu, Haiqin Yang

Event skeleton generation, which aims to induce an event-schema skeleton graph with abstracted event nodes and their temporal relations from a set of event instance graphs, is a critical step in the temporal complex event schema induction task. Existing methods effectively address this task from a graph-generation perspective but suffer from noise sensitivity and error accumulation, e.g., the inability to correct errors while generating the schema. We therefore propose a novel Diffusion Event Graph Model (DEGM) to address these issues. Our DEGM is the first workable diffusion model for event skeleton generation: embedding and rounding techniques with a custom edge-based loss are introduced to transform a discrete event graph into a learnable latent representation. Furthermore, we propose a denoising training process to maintain the model's robustness. DEGM then derives the final schema, where error correction is guaranteed by iteratively refining the latent representation during schema generation. Experimental results on three IED bombing datasets demonstrate that DEGM achieves better results than other state-of-the-art baselines. Our code and data are available at https://github.com/zhufq00/EventSkeletonGeneration.
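
A minimal sketch of the embed-noise-denoise-round loop that underlies embedding-and-rounding diffusion on discrete tokens; the tiny denoiser, the single fixed noising step, and the flat token sequence are simplifying assumptions, not DEGM's architecture or its edge-based loss.

```python
# Embed discrete event ids, noise them, denoise, and round back to the
# nearest embedding, recovering discrete ids. Illustration only.
import torch

vocab, dim = 50, 16
embed = torch.nn.Embedding(vocab, dim)
denoiser = torch.nn.Sequential(torch.nn.Linear(dim, dim), torch.nn.Tanh(),
                               torch.nn.Linear(dim, dim))

event_ids = torch.tensor([3, 17, 8])  # a toy discrete event sequence
x0 = embed(event_ids)                 # embed: discrete -> continuous
noise = torch.randn_like(x0)
xt = 0.5 * x0 + 0.5 * noise           # one fixed noising step, for brevity

x0_hat = denoiser(xt)                 # denoise back toward x0

# Round: snap each denoised vector to its nearest embedding (recover ids).
# Iterating this refinement is where earlier errors can be corrected.
dists = torch.cdist(x0_hat, embed.weight)  # (3, vocab)
recovered = dists.argmin(dim=-1)
print(recovered)
```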

Self-Critique Prompting with Large Language Models for Inductive Instructions

May 23, 2023
Rui Wang, Hongru Wang, Fei Mi, Yi Chen, Ruifeng Xu, Kam-Fai Wong

Numerous works have been proposed to improve or evaluate the ability of large language models (LLMs) to fulfill user instructions. However, they neglect the possibility that user inputs may inherently contain incorrect information due to users' false beliefs or malicious intent, and blindly adhering to users' false content will cause deception and harm. To address this problem, we propose a challenging benchmark of Inductive Instructions (INDust) to evaluate whether LLMs can resist such instructions. INDust includes 15K instructions across three categories: Fact-Checking Instructions, Questions based on False Premises, and Creative Instructions based on False Premises. Our experiments on several strong LLMs reveal that current LLMs can easily be deceived by INDust into generating misleading and malicious statements. Hence, we employ Self-Critique prompting to encourage LLMs to critique not only themselves, as in previous work, but also the users, which shows remarkable improvements in handling inductive instructions under both zero-shot and few-shot settings.
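
A minimal sketch of the Self-Critique prompting loop, assuming a hypothetical llm(prompt) completion function; the prompt wording is illustrative, not the paper's templates.

```python
# Draft -> critique (of both the draft AND the user's premise) -> revise.
def llm(prompt: str) -> str:
    raise NotImplementedError("plug in any chat/completion API here")

def self_critique_respond(instruction: str) -> str:
    draft = llm(instruction)
    # Critique the model's own draft AND the user's premise, so false
    # premises in the instruction itself can be challenged.
    critique = llm(
        "Instruction: " + instruction + "\nDraft answer: " + draft +
        "\nCritique the draft AND check whether the instruction itself "
        "rests on a false premise."
    )
    return llm(
        "Instruction: " + instruction + "\nDraft: " + draft +
        "\nCritique: " + critique +
        "\nRewrite the answer; refuse or correct the premise if it is false."
    )
```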

Chain-of-thought prompting for responding to in-depth dialogue questions with LLM

May 19, 2023
Hongru Wang, Rui Wang, Fei Mi, Zezhong Wang, Ruifeng Xu, Kam-Fai Wong

The way and content in which users ask questions can provide insight into their current status, including their personality, emotions, and psychology. Instead of directly prompting large language models (LLMs), we explore how chain-of-thought prompting helps in this scenario to perform reasoning and planning according to user status, aiming to provide a more personalized and engaging experience for the user query. To this end, we first construct a benchmark of 6 dialogue or question-answering datasets in both English and Chinese, covering 3 different aspects of user status (personality, emotion, and psychology). We then prompt the LLMs to generate a response conditioned on the user status as an intermediate reasoning step. We also propose a novel demonstration-selection strategy that uses the semantic similarity of the intermediate reasoning instead of the test queries. To evaluate the effectiveness and robustness of our approach, we conduct extensive experiments with 7 LLMs under zero-shot and one-shot settings. The experimental results show that our approach consistently outperforms standard prompting in terms of both helpfulness and acceptness across all datasets, regardless of the LLM used. The code and dataset can be found at https://github.com/ruleGreen/Dialogue_CoT.git.
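
A minimal sketch of the demonstration-selection strategy, assuming sentence-transformers embeddings; the pool of intermediate "reasonings" is invented for illustration.

```python
# Pick a one-shot demonstration by the semantic similarity of intermediate
# reasoning (user-status analyses), not of the raw queries. Illustration only.
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")

demo_pool = [
    {"reasoning": "user sounds anxious about exams", "demo": "..."},
    {"reasoning": "user is cheerful and wants small talk", "demo": "..."},
]

test_reasoning = "user seems stressed about an upcoming deadline"
q = encoder.encode(test_reasoning, convert_to_tensor=True)
cands = encoder.encode([d["reasoning"] for d in demo_pool], convert_to_tensor=True)
best = util.cos_sim(q, cands).argmax().item()
print(demo_pool[best]["demo"])  # chosen demonstration for the prompt
```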

SI-LSTM: Speaker Hybrid Long-short Term Memory and Cross Modal Attention for Emotion Recognition in Conversation

May 04, 2023
Xingwei Liang, You Zou, Ruifeng Xu

Emotion Recognition in Conversation (ERC) across modalities is of vital importance for a variety of applications, including intelligent healthcare, conversational AI, and opinion mining over chat history. The crux of ERC is to model both cross-modality and cross-time interactions throughout a conversation. Previous methods have made progress in learning the time-series information of a conversation but lack the ability to track the distinct emotional states of each speaker. In this paper, we propose a recurrent structure called Speaker Information Enhanced Long-Short Term Memory (SI-LSTM) for the ERC task, in which the emotional state of each distinct speaker is tracked sequentially to enhance the learning of emotion in conversation. Further, to improve the learning of multimodal features in ERC, we utilize a cross-modal attention component to fuse features across modalities and to model the interaction of important information from different modalities. Experimental results on two benchmark datasets demonstrate the superiority of the proposed SI-LSTM over state-of-the-art baseline methods on multimodal ERC.
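
A minimal sketch of the two ideas behind SI-LSTM, per-speaker recurrent state and cross-modal attention; the dimensions, label set, and fusion below are simplifying assumptions, not the paper's exact architecture.

```python
# Keep a separate LSTM state per speaker; fuse text and audio with attention.
import torch

dim = 32
speaker_lstm = torch.nn.LSTMCell(dim, dim)  # one shared cell; state kept per speaker
xattn = torch.nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
clf = torch.nn.Linear(dim, 6)               # assumed six-emotion label set

# Toy conversation: (speaker_id, text_feature, audio_features) per turn.
turns = [(0, torch.randn(1, dim), torch.randn(1, 1, dim)),
         (1, torch.randn(1, dim), torch.randn(1, 1, dim)),
         (0, torch.randn(1, dim), torch.randn(1, 1, dim))]

states = {}  # speaker_id -> (h, c): each speaker's emotional state evolves separately
for spk, text, audio in turns:
    h, c = states.get(spk, (torch.zeros(1, dim), torch.zeros(1, dim)))
    h, c = speaker_lstm(text, (h, c))       # update only the current speaker's state
    states[spk] = (h, c)
    # Cross-modal attention: the text-side state attends over the audio features.
    fused, _ = xattn(h.unsqueeze(1), audio, audio)
    emotion_logits = clf(fused.squeeze(1))
```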
