Ying Shen

Towards Real-World Writing Assistance: A Chinese Character Checking Benchmark with Faked and Misspelled Characters

Nov 19, 2023
Yinghui Li, Zishan Xu, Shaoshen Chen, Haojing Huang, Yangning Li, Yong Jiang, Zhongli Li, Qingyu Zhou, Hai-Tao Zheng, Ying Shen

Writing assistance is an application closely related to human life and is also a fundamental Natural Language Processing (NLP) research field. Its aim is to improve the correctness and quality of input texts, with character checking being crucial in detecting and correcting wrong characters. In real-world settings, where handwriting accounts for the vast majority of writing, the characters that humans get wrong include faked characters (i.e., untrue characters created due to writing errors) and misspelled characters (i.e., true characters used incorrectly due to spelling errors). However, existing datasets and related studies focus only on misspelled characters, mainly caused by phonological or visual confusion, thereby ignoring faked characters, which are more common and more difficult. To address this gap, we present Visual-C$^3$, a human-annotated Visual Chinese Character Checking dataset with faked and misspelled Chinese characters. To the best of our knowledge, Visual-C$^3$ is the first real-world visual dataset and the largest human-crafted dataset for the Chinese character checking scenario. We also propose and evaluate novel baseline methods on Visual-C$^3$. Extensive empirical results and analyses show that Visual-C$^3$ is high-quality yet challenging. The Visual-C$^3$ dataset and the baseline methods will be publicly available to facilitate further research in the community.

* Work in progress 

X-Eval: Generalizable Multi-aspect Text Evaluation via Augmented Instruction Tuning with Auxiliary Evaluation Aspects

Nov 15, 2023
Minqian Liu, Ying Shen, Zhiyang Xu, Yixin Cao, Eunah Cho, Vaibhav Kumar, Reza Ghanadan, Lifu Huang

Natural Language Generation (NLG) typically involves evaluating the generated text in various aspects (e.g., consistency and naturalness) to obtain a comprehensive assessment. However, multi-aspect evaluation remains challenging as it may require the evaluator to generalize to any given evaluation aspect even if it's absent during training. In this paper, we introduce X-Eval, a two-stage instruction tuning framework to evaluate the text in both seen and unseen aspects customized by end users. X-Eval consists of two learning stages: the vanilla instruction tuning stage that improves the model's ability to follow evaluation instructions, and an enhanced instruction tuning stage that exploits the connections between fine-grained evaluation aspects to better assess text quality. To support the training of X-Eval, we collect AspectInstruct, the first instruction tuning dataset tailored for multi-aspect NLG evaluation spanning 27 diverse evaluation aspects with 65 tasks. To enhance task diversity, we devise an augmentation strategy that converts human rating annotations into diverse forms of NLG evaluation tasks, including scoring, comparison, ranking, and Boolean question answering. Extensive experiments across three essential categories of NLG tasks: dialogue generation, summarization, and data-to-text coupled with 21 aspects in meta-evaluation, demonstrate that our X-Eval enables even a lightweight language model to achieve a comparable if not higher correlation with human judgments compared to the state-of-the-art NLG evaluators, such as GPT-4.

* 17 pages, 5 figures, 14 tables 
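
As a rough illustration of the augmentation strategy described in the X-Eval abstract, the sketch below turns human rating annotations for one evaluation aspect into the four task forms the abstract names (scoring, comparison, ranking, and Boolean question answering). The function name `augment`, the record layout, the prompt wording, and the `threshold` parameter are all assumptions made for this example; they are not the AspectInstruct format or the authors' code.

```python
from itertools import combinations


def augment(aspect, rated_outputs, threshold=3.0):
    """Convert (text, human_score) pairs into several evaluation task forms.

    `threshold` is a hypothetical cut-off used only to derive yes/no labels.
    """
    tasks = []
    for text, score in rated_outputs:
        # Scoring: predict the human rating directly.
        tasks.append({"type": "scoring",
                      "prompt": f"Rate the {aspect} of: {text}",
                      "answer": score})
        # Boolean QA: binarize the rating with the assumed threshold.
        tasks.append({"type": "boolean_qa",
                      "prompt": f"Is this text acceptable in terms of {aspect}? {text}",
                      "answer": "yes" if score >= threshold else "no"})
    # Comparison: one task per pair of outputs with different ratings.
    for (a, score_a), (b, score_b) in combinations(rated_outputs, 2):
        if score_a != score_b:
            tasks.append({"type": "comparison",
                          "prompt": f"Which is better in {aspect}? A: {a} B: {b}",
                          "answer": "A" if score_a > score_b else "B"})
    # Ranking: order all outputs from best to worst rating.
    ranked = [text for text, _ in
              sorted(rated_outputs, key=lambda pair: pair[1], reverse=True)]
    tasks.append({"type": "ranking",
                  "prompt": f"Rank these texts by {aspect}.",
                  "answer": ranked})
    return tasks
```

A single set of ratings thus yields several differently phrased training tasks, which is the kind of task diversity the abstract refers to.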

Tunable Soft Prompts are Messengers in Federated Learning

Nov 12, 2023
Chenhe Dong, Yuexiang Xie, Bolin Ding, Ying Shen, Yaliang Li

Federated learning (FL) enables multiple participants to collaboratively train machine learning models using decentralized data sources, alleviating privacy concerns that arise from directly sharing local data. However, the lack of model privacy protection in FL becomes a non-negligible challenge, especially when participants want to fine-tune models in a federated manner based on a proprietary large language model. In this study, we propose a novel FL training approach that accomplishes information exchange among participants via tunable soft prompts. These soft prompts, updated and transmitted between the server and clients, assume the role of the global model parameters and serve as messengers to deliver useful knowledge from the local data and global model. As the global model itself is not required to be shared and the local training is conducted based on an auxiliary model with fewer parameters than the global model, the proposed approach provides protection for the global model while reducing communication and computation costs in FL. Extensive experiments show the effectiveness of the proposed approach compared to several baselines. We have released the source code at https://github.com/alibaba/FederatedScope/tree/fedsp/federatedscope/nlp/fedsp.

* Accepted by EMNLP-23 

MULTISCRIPT: Multimodal Script Learning for Supporting Open Domain Everyday Tasks

Oct 08, 2023
Jingyuan Qi, Minqian Liu, Ying Shen, Zhiyang Xu, Lifu Huang

Automatically generating scripts (i.e., sequences of key steps described in text) from video demonstrations and reasoning about the subsequent steps are crucial for modern AI virtual assistants that guide humans through everyday tasks, especially unfamiliar ones. However, current methods for generative script learning rely heavily on well-structured preceding steps described in text and/or images or are limited to a certain domain, resulting in a disparity with real-world user scenarios. To address these limitations, we present a new benchmark challenge, MultiScript, with two new tasks on task-oriented multimodal script learning: (1) multimodal script generation, and (2) subsequent step prediction. For both tasks, the input consists of a target task name and a video illustrating what has been done to complete the target task, and the expected output is (1) a sequence of structured step descriptions in text based on the demonstration video, and (2) a single text description for the subsequent step, respectively. Built from WikiHow, MultiScript covers multimodal scripts in videos and text descriptions for over 6,655 human everyday tasks across 19 diverse domains. To establish baseline performance on MultiScript, we propose two knowledge-guided multimodal generative frameworks that incorporate the task-related knowledge prompted from large language models such as Vicuna. Experimental results show that our proposed approaches significantly improve over the competitive baselines.

* 12 pages, 9 figures, 4 tables 

Bidirectional End-to-End Learning of Retriever-Reader Paradigm for Entity Linking

Jul 03, 2023
Yinghui Li, Yong Jiang, Shen Huang, Xingyu Lu, Yangning Li, Pengjun Xie, Fei Huang, Hai-Tao Zheng, Ying Shen

Entity Linking (EL) is a fundamental task for Information Extraction and Knowledge Graphs. The general form of EL (i.e., end-to-end EL) aims to first find mentions in the given input document and then link the mentions to corresponding entities in a specific knowledge base. Recently, the retriever-reader paradigm has advanced end-to-end EL, benefiting from the advantages of dense entity retrieval and machine reading comprehension. However, existing work trains the retriever and the reader separately in a pipeline manner, which ignores the benefit that interaction between the retriever and the reader can bring to the task. To make the retriever-reader paradigm perform better on end-to-end EL, we propose BEER$^2$, a Bidirectional End-to-End training framework for Retriever and Reader. Through our designed bidirectional end-to-end training, BEER$^2$ guides the retriever and the reader to learn from each other, make progress together, and ultimately improve EL performance. Extensive experiments on benchmarks of multiple domains demonstrate the effectiveness of our proposed BEER$^2$.

* This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible 

Progressive Multi-task Learning Framework for Chinese Text Error Correction

Jul 03, 2023
Shirong Ma, Yinghui Li, Haojing Huang, Shulin Huang, Yangning Li, Hai-Tao Zheng, Ying Shen

Chinese Text Error Correction (CTEC) aims to detect and correct errors in the input text, which benefits people's daily lives and various downstream tasks. Recent approaches mainly employ Pre-trained Language Models (PLMs) to address the CTEC task and achieve tremendous success. However, previous approaches suffer from issues of over-correction and under-correction, and the former is especially conspicuous in the precision-critical CTEC task. To mitigate the issue of over-correction, we propose a novel model-agnostic progressive multi-task learning framework for CTEC, named ProTEC, which guides a CTEC model to learn the task from easy to difficult. We divide the CTEC task into three sub-tasks from easy to difficult: Error Detection, Error Type Identification, and Correction Result Generation. During the training process, ProTEC guides the model to learn text error correction progressively by incorporating these sub-tasks into a multi-task training objective. During the inference process, the model completes these sub-tasks in turn to generate the correction results. Extensive experiments and detailed analyses fully demonstrate the effectiveness and efficiency of our proposed framework.

* This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible 

LTCR: Long-Text Chinese Rumor Detection Dataset

Jun 13, 2023
Ziyang Ma, Mengsha Liu, Guian Fang, Ying Shen

False information can spread quickly on social media, negatively influencing citizens' behavior and responses to social events. To better detect fake news, especially long texts, which are harder to detect completely, a Long-Text Chinese Rumor detection dataset named LTCR is proposed. The LTCR dataset provides a valuable resource for accurately detecting misinformation, especially in the context of complex fake news related to COVID-19. The dataset consists of 1,729 and 500 pieces of real and fake news, respectively. The average lengths of real and fake news are approximately 230 and 152 characters, respectively. We also propose a Salience-aware Fake News Detection Model, which achieves the highest accuracy (95.85%), fake news recall (90.91%) and F-score (90.60%) on the dataset. (https://github.com/Enderfga/DoubleCheck)


A Comprehensive Survey on Deep Learning for Relation Extraction: Recent Advances and New Frontiers

Jun 06, 2023
Xiaoyan Zhao, Yang Deng, Min Yang, Lingzhi Wang, Rui Zhang, Hong Cheng, Wai Lam, Ying Shen, Ruifeng Xu

Relation extraction (RE) involves identifying the relations between entities from unstructured texts. RE serves as the foundation for many natural language processing (NLP) applications, such as knowledge graph completion, question answering, and information retrieval. In recent years, deep neural networks have dominated the field of RE and made noticeable progress. Subsequently, the large pre-trained language models (PLMs) have taken the state-of-the-art of RE to a new level. This survey provides a comprehensive review of existing deep learning techniques for RE. First, we introduce RE resources, including RE datasets and evaluation metrics. Second, we propose a new taxonomy to categorize existing works from three perspectives (text representation, context encoding, and triplet prediction). Third, we discuss several important challenges faced by RE and summarize potential techniques to tackle these challenges. Finally, we outline some promising future directions and prospects in this field. This survey is expected to facilitate researchers' collaborative efforts to tackle the challenges of real-life RE systems.


The Art of SOCRATIC QUESTIONING: Zero-shot Multimodal Reasoning with Recursive Thinking and Self-Questioning

May 24, 2023
Jingyuan Qi, Zhiyang Xu, Ying Shen, Minqian Liu, Di Jin, Qifan Wang, Lifu Huang

Chain-of-Thought prompting (CoT) enables large-scale language models to solve complex reasoning problems by decomposing the problem and tackling it step-by-step. However, Chain-of-Thought is a greedy thinking process that requires the language model to come up with a starting point and generate the next step solely based on previous steps. This thinking process is different from how humans approach a complex problem, e.g., we proactively raise sub-problems related to the original problem and recursively answer them. In this work, we propose Socratic Questioning, a divide-and-conquer style algorithm that simulates the self-questioning and recursive thinking process. Socratic Questioning is driven by a Self-Questioning module that employs a large-scale language model to propose sub-problems related to the original problem as intermediate steps. Socratic Questioning then recursively backtracks and answers the sub-problems until it reaches the original problem. We apply our proposed algorithm to the visual question-answering task as a case study and, by evaluating it on three public benchmark datasets, we observe a significant performance improvement over all baselines on (almost) all datasets. In addition, the qualitative analysis clearly demonstrates that the intermediate thinking steps elicited by Socratic Questioning are similar to humans' recursive thinking process for a complex reasoning problem.

* 15 pages, 12 figures, 2 algorithms 
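
The recursive self-questioning loop described in the Socratic Questioning abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: `propose_subquestions` and `answer_with_hints` are hypothetical callables standing in for language-model calls, and the `max_depth` cut-off is an assumption added so that the recursion terminates.

```python
def socratic_answer(question, propose_subquestions, answer_with_hints,
                    depth=0, max_depth=2):
    """Answer `question` by first raising and answering sub-questions.

    propose_subquestions(question) -> list of sub-question strings
    answer_with_hints(question, hints) -> answer string, where `hints`
    is a list of (sub_question, sub_answer) pairs.
    """
    hints = []
    if depth < max_depth:
        for sub in propose_subquestions(question):
            # Recurse: each sub-question may raise its own sub-questions.
            sub_answer = socratic_answer(sub, propose_subquestions,
                                         answer_with_hints,
                                         depth + 1, max_depth)
            hints.append((sub, sub_answer))
    # Backtrack: answer the current question using the collected hints.
    return answer_with_hints(question, hints)
```

In contrast to a left-to-right chain of steps, the answer to each question here is produced only after the sub-questions below it have been answered.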

CLEME: Debiasing Multi-reference Evaluation for Grammatical Error Correction

May 18, 2023
Jingheng Ye, Yinghui Li, Qingyu Zhou, Yangning Li, Shirong Ma, Hai-Tao Zheng, Ying Shen

It is intractable to evaluate the performance of Grammatical Error Correction (GEC) systems since GEC is a highly subjective task. Designing an evaluation metric that is as objective as possible is crucial to the development of the GEC task. Previous mainstream evaluation metrics, i.e., reference-based metrics, introduce bias into the multi-reference evaluation because they extract edits without considering the presence of multiple references. To overcome this problem, we propose Chunk-LEvel Multi-reference Evaluation (CLEME), designed to evaluate GEC systems in multi-reference settings. First, CLEME builds chunk sequences with consistent boundaries for the source, the hypothesis and all the references, thus eliminating the bias caused by inconsistent edit boundaries. Then, based on the discovery that there exist boundaries between different grammatical errors, we automatically determine the grammatical error boundaries and compute F$_{0.5}$ scores in a novel way. Our proposed CLEME approach consistently and substantially outperforms existing reference-based GEC metrics on multiple reference sets in both corpus-level and sentence-level settings. Extensive experiments and detailed analyses demonstrate the correctness of our discovery and the effectiveness of our designed evaluation metric.

* Rejected by ACL 2023 with Soundness 4/4/4 and Excitement 4/3.5/3.5 :( 
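
For readers unfamiliar with the F$_{0.5}$ score mentioned in the CLEME abstract, the sketch below computes the standard F$_\beta$ measure from true-positive, false-positive, and false-negative counts, with $\beta = 0.5$ weighting precision more heavily than recall. It shows only the generic formula $F_\beta = (1+\beta^2)PR / (\beta^2 P + R)$; it does not reproduce CLEME's chunk-level counting, and returning 0.0 for empty denominators is an assumption of this example.

```python
def f_beta(tp, fp, fn, beta=0.5):
    """Standard F-beta score from edit counts (generic, not CLEME-specific)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)
```

For example, 6 correct edits, 2 spurious edits, and 6 missed edits give a precision of 0.75 and a recall of 0.5, so `f_beta(6, 2, 6)` is 15/22, about 0.682, which sits closer to the precision than the balanced F$_1$ of 0.6 does.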